JDK-6185419 : Unicode behavior change in Character.isLetter() post mantis
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0,6
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • OS: solaris_9
  • CPU: generic,sparc
  • Submitted: 2004-10-27
  • Updated: 2010-04-02
  • Resolved: 2004-10-28
Description
A major Sun customer cannot move to Tiger because of a change in
the behavior of Character.isLetter().

Using the code snippet provided by the customer, the behavior of
isLetter() changes post-Mantis as follows:

tmarble@fred 38% pwd
/home/tmarble/javaperf/2004/TLR/Mantis
tmarble@fred 39% /usr/java/j2sdk1.4.2_07/bin/javac UnicodeTest.java
tmarble@fred 40% /usr/java/j2sdk1.4.2_07/bin/java UnicodeTest
is letter: false
is digit: false
tmarble@fred 41% cd ../Tiger
/home/tmarble/javaperf/2004/TLR/Tiger
tmarble@fred 42% /usr/java/jdk1.5.0_02/bin/javac UnicodeTest.java 
tmarble@fred 43% /usr/java/jdk1.5.0_02/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 44% cd ../Mustang
/home/tmarble/javaperf/2004/TLR/Mustang
tmarble@fred 45% /usr/java/jdk1.6.0/bin/javac UnicodeTest.java
tmarble@fred 46% /usr/java/jdk1.6.0/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 47% 

This is a bug because the behavior changed.
As the character in question is a modifier character, I'm not
sure what the "right" behavior is, but I suspect that this may
relate to correct interpretation of Unicode.  For further information, see:

   I found the relevant Unicode map here:
     http://www.unicode.org/charts/U02B0.pdf
   There is further discussion of that specific character here:
     http://www.tachyonsoft.com/uc0002.htm#U02C6
     http://www.fileformat.info/info/unicode/char/02c6/index.htm
   There is also a discussion of Unicode version 4:
     http://www.unicode.org/versions/Unicode4.0.1/

Please note that, according to bug 5034599, Unicode 4.0.1
will be delayed until Mustang.  However, it is not clear that
this is a Unicode 4 issue.

And correct behavior for this one Unicode character may not
indicate correctness for the universe of possible letters
(correctness of isLetter() must be reviewed in the general case).

--Tom
###@###.### 10/27/04 20:07 GMT
###@###.### 10/27/04 22:30 GMT
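
A review of the general case could start from a dump of every character that isLetter() accepts, run once per JDK and then diffed. The following is only a sketch of that idea: the class name LetterDump and its dump() helper are made up for illustration, it covers the BMP only, and it deliberately avoids post-1.4 language features so that the same source should compile on Mantis as well as Tiger.

```java
import java.io.PrintStream;

// Hypothetical helper, not part of the customer's test case.
public class LetterDump {

    // Writes one line per BMP character accepted by isLetter():
    // the code unit in hex, then its Character.getType() value.
    // Returns the number of lines written.
    static int dump(PrintStream out) {
        int count = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (Character.isLetter(c)) {
                out.println(Integer.toHexString(i) + " " + Character.getType(c));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int count = dump(System.out);
        // The total goes to stderr so it does not end up in the diffed output.
        System.err.println(count + " letters");
    }
}
```

Redirecting stdout to a file under each JDK and diffing the files would list every character whose isLetter() result or type value differs between the releases.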


The test source code as provided by ###@###.###:

public class UnicodeTest {

    public static void main(String[] args) {
        // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
        char myUnicodeCharacter = (char) Integer.parseInt("2C6", 16);

        System.out.println("is letter: " + Character.isLetter(myUnicodeCharacter));
        System.out.println("is digit: " + Character.isDigit(myUnicodeCharacter));
    }
}
###@###.### 10/28/04 17:40 GMT

Comments
EVALUATION

Can you please provide the source code of the test case?
###@###.### 10/27/04 22:30 GMT

This is not a bug. The Character class follows the Unicode standard in most of its functionality, including the isLetter methods. As the class description says, it's based on Unicode 3.0 for J2SE 1.4, on Unicode 4.0 for J2SE 5. The Unicode standard changed the classification of U+02C6 from category "Sk" (modifier symbol) to category "Lm" (modifier letter).
###@###.### 10/28/04 17:40 GMT
27-10-2004
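
The category change described in the evaluation can be observed directly with Character.getType(). The sketch below is illustrative only: the class name CategoryCheck and the categoryName() helper are invented here, and the helper names just the two categories mentioned in this report.

```java
// Hypothetical helper, not part of the customer's test case.
public class CategoryCheck {

    // Maps the two general categories discussed in this report to their
    // Unicode abbreviations; anything else is reported by numeric type.
    static String categoryName(char c) {
        switch (Character.getType(c)) {
            case Character.MODIFIER_LETTER: return "Lm";
            case Character.MODIFIER_SYMBOL: return "Sk";
            default: return "type " + Character.getType(c);
        }
    }

    public static void main(String[] args) {
        char c = (char) 0x02C6;
        System.out.println("category:  " + categoryName(c));
        System.out.println("is letter: " + Character.isLetter(c));
    }
}
```

Per the evaluation, a Unicode 4.0-based runtime (J2SE 5 and later) should report "Lm" and true, while a Unicode 3.0-based runtime (J2SE 1.4) should report "Sk" and false. Code that needs a classification that stays the same across releases would have to test for the specific characters or categories it cares about rather than rely on isLetter().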