Bug ID: JDK-6214519 3.8: Identifier equivalence does not consider Character.isIdentifierIgnorable

Type: Bug
Component: specification
Sub-Component: language
Affected Version: 6

Priority: P5
Status: Closed
Resolution: Fixed

Submitted: 2005-01-06
Updated: 2018-08-03
Resolved: 2017-11-21

Fix Version: JDK 10 (Fixed)

The specification of java.lang.Character.isIdentifierIgnorable disagrees
with the JLS.  This is from isIdentifierIgnorable:

"Determines if the specified character should be regarded as an ignorable
character in a Java identifier ... "

This is from JLS:

"Two identifiers are the same only if they are identical, that is, have 
the same Unicode character for each letter or digit."

Also see http://forum.java.sun.com/thread.jspa?forumID=316&threadID=583420

###@###.### 2005-1-06 01:28:12 GMT

3.8 should say: "Two identifiers are the same only if, *****after ignoring characters that are ignorable,***** the identifiers have the same Unicode character for each letter or digit. *****An ignorable character is a character for which the method Character.isIdentifierIgnorable(int) returns true.***** Identifiers that have the same external appearance may yet be different."
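The proposed wording can be sketched as a comparison that first drops every code point for which Character.isIdentifierIgnorable(int) returns true, then compares what remains character by character. This is a minimal illustration of the rule as worded above, assuming identifiers are given as Java Strings. It is not javac's implementation, and the class and method names (IdentifierEquivalence, stripIgnorable, sameIdentifier) are hypothetical.

```java
// Hypothetical sketch of the proposed JLS 3.8 rule: two identifiers are the
// same if they match after removing every identifier-ignorable code point.
// Class and method names are illustrative only, not part of any JDK API.
public final class IdentifierEquivalence {

    private IdentifierEquivalence() {}

    /** Returns the identifier with all identifier-ignorable code points removed. */
    public static String stripIgnorable(String identifier) {
        StringBuilder sb = new StringBuilder(identifier.length());
        identifier.codePoints()
                  .filter(cp -> !Character.isIdentifierIgnorable(cp))
                  .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** True if both identifiers name the same thing under the proposed rule. */
    public static boolean sameIdentifier(String a, String b) {
        return stripIgnorable(a).equals(stripIgnorable(b));
    }

    public static void main(String[] args) {
        // "c" followed by U+0000, and "c" followed by U+001A, both reduce to "c".
        System.out.println(sameIdentifier("c\u0000", "c")); // true
        System.out.println(sameIdentifier("c\u001A", "c")); // true
        System.out.println(sameIdentifier("c", "d"));       // false
    }
}
```

Comparing the stripped strings with equals() keeps the "same Unicode character for each letter or digit" part of the original wording and changes only which characters take part in the comparison.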

21-11-2017

The Character API spec for isIdentifierIgnorable() says "Determines if the specified character should be regarded as an ignorable character in a Java identifier or a Unicode identifier". It never says that ignorable characters are not allowed in Java identifiers. On the other hand, the JLS never mentions ignorable characters and simply says "Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit".

This JLS statement contradicts the actual behaviour. I tried a few cases:

    char c\u0000; char c;        // gives error: variable 'c' is already defined
    char c\u001a = 'a'; char c;  // gives error: variable 'c' is already defined

In these cases the identifiers "c\u0000" and "c\u001a" reduce to "c" (\u0000 and \u001a being ignorable characters), so another declaration of "char c;" gives a compiler error. The two identifiers (in "char c\u0000;" and "char c;") consist of different Unicode code points, yet they are treated as the same because of the ignorable code point, which contradicts the JLS statement above. Also, I think the issue is not related to combining marks.

I think the JLS needs to add a clarification about ignorable characters, something like this:

------------------------------
Ignorable characters are the Unicode characters that are ignored in Java identifiers. Two Java identifiers are compared ignoring the characters that fall under the ignorable category, i.e. characters for which the method Character.isIdentifierIgnorable(int) returns true.
------------------------------
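The two characters used in the examples, U+0000 and U+001A, can be checked directly against the Character API: both are accepted as identifier parts, and both are identifier-ignorable, which is why appending either one to "c" can collide with a plain "c". The class name IgnorableCheck and its describe helper are hypothetical; only Character.isJavaIdentifierPart(int) and Character.isIdentifierIgnorable(int) are JDK methods.

```java
// Hypothetical helper: reports, for a code point, whether it may appear in a
// Java identifier and whether Character considers it identifier-ignorable.
public final class IgnorableCheck {

    private IgnorableCheck() {}

    /** Formats the two properties of a code point as a single line. */
    public static String describe(int codePoint) {
        return String.format("U+%04X part=%b ignorable=%b",
                codePoint,
                Character.isJavaIdentifierPart(codePoint),
                Character.isIdentifierIgnorable(codePoint));
    }

    public static void main(String[] args) {
        // U+0000 and U+001A come from the examples; 'c' is a control case.
        for (int cp : new int[] {0x0000, 0x001A, 'c'}) {
            System.out.println(describe(cp));
        }
        // prints:
        // U+0000 part=true ignorable=true
        // U+001A part=true ignorable=true
        // U+0063 part=true ignorable=false
    }
}
```

This matches the observation that the Character spec allows ignorable characters inside identifiers rather than forbidding them.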

26-09-2016

EVALUATION It seems that isIdentifierIgnorable was part of a dream where Java identifiers would align with Unicode text comparison, where different char sequences may express the same text through composition or decomposition (e.g., "é" <-> "e\u0301"), reordering of combining marks, or removal of ignorable characters. In reality, this never got implemented, and a Java identifier is simply a sequence of chars. If two char sequences are different, they're different identifiers, as documented in JLS section 3.8. We should clarify that ignorable characters are not in fact ignored, but are simply one of several sets of characters that are allowed to be present in Java identifiers. ###@###.###
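The composition example can be made concrete with java.text.Normalizer. "é" as the single code point U+00E9 and "e" followed by U+0301 COMBINING ACUTE ACCENT are canonically equivalent text, so they are equal after NFC normalization. As char sequences they still differ, which is the only comparison the evaluation says identifiers get. This is a sketch for illustration: the class CompositionCheck and both helper methods are hypothetical, and nothing here implies that the JLS applies any normalization.

```java
import java.text.Normalizer;

// Hypothetical helper contrasting the two notions of "same" from the evaluation:
// plain char-sequence equality versus Unicode canonical equivalence (NFC).
public final class CompositionCheck {

    private CompositionCheck() {}

    /** Identifier-style comparison: the char sequences must match exactly. */
    public static boolean sameAsIdentifiers(String a, String b) {
        return a.equals(b);
    }

    /** Text-style comparison: equal after canonical composition (NFC). */
    public static boolean sameAsNfcText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

        System.out.println(sameAsIdentifiers(precomposed, decomposed)); // false
        System.out.println(sameAsNfcText(precomposed, decomposed));     // true
    }
}
```

U+0301 is a non-spacing mark, so it is a legal identifier part but not an ignorable character; the two spellings of "é" therefore stay distinct identifiers even when ignorable characters are removed before comparison.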

27-07-2005