Bug ID: JDK-6185419 Unicode behavior change in Character.isLetter() post mantis

Type: Bug
Component: core-libs
Sub-Component: java.lang
Affected Version: 5.0,6

Priority: P3
Status: Closed
Resolution: Not an Issue
OS: solaris_9
CPU: generic,sparc

Submitted: 2004-10-27
Updated: 2010-04-02
Resolved: 2004-10-28

A major Sun customer cannot move to Tiger because of a change in the
behavior of Character.isLetter().

Using the code snippet provided by the customer the behavior of
isLetter() changes post Mantis as follows:

tmarble@fred 38% pwd
/home/tmarble/javaperf/2004/TLR/Mantis
tmarble@fred 39% /usr/java/j2sdk1.4.2_07/bin/javac UnicodeTest.java
tmarble@fred 40% /usr/java/j2sdk1.4.2_07/bin/java UnicodeTest
is letter: false
is digit: false
tmarble@fred 41% cd ../Tiger
/home/tmarble/javaperf/2004/TLR/Tiger
tmarble@fred 42% /usr/java/jdk1.5.0_02/bin/javac UnicodeTest.java 
tmarble@fred 43% /usr/java/jdk1.5.0_02/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 44% cd ../Mustang
/home/tmarble/javaperf/2004/TLR/Mustang
tmarble@fred 45% /usr/java/jdk1.6.0/bin/javac UnicodeTest.java
tmarble@fred 46% /usr/java/jdk1.6.0/bin/java UnicodeTest
is letter: true
is digit: false
tmarble@fred 47% 

This is a bug because the behavior changed.
As the character in question is a modifier character, I'm not
sure what the "right" behavior is, but I suspect that this may
relate to correct interpretation of Unicode.  For further information, see:

   I found the relevant Unicode map here:
     http://www.unicode.org/charts/U02B0.pdf
   There is further discussion of that specific character here:
     http://www.tachyonsoft.com/uc0002.htm#U02C6
     http://www.fileformat.info/info/unicode/char/02c6/index.htm
   There is also a discussion of Unicode version 4:
     http://www.unicode.org/versions/Unicode4.0.1/

Please note that according to bug 5034599 Unicode 4.0.1
will be delayed until Mustang.  HOWEVER it is not clear that
this is a Unicode 4 issue.

And correct behavior for this one Unicode character may not
indicate correctness for the universe of possible letters
(correctness of isLetter() must be reviewed in the general case).

--Tom
###@###.### 10/27/04 20:07 GMT
###@###.### 10/27/04 22:30 GMT


The test source code as provided by ###@###.###:

public class UnicodeTest {

  public static void main(String[] argv) {
    // U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT), parsed from its hex value
    char myUnicodeCharacter = (char) Integer.parseInt("2C6", 16);

    System.out.println("is letter: " +
        Character.isLetter(myUnicodeCharacter));
    System.out.println("is digit: " +
        Character.isDigit(myUnicodeCharacter));
  }
}
###@###.### 10/28/04 17:40 GMT
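
The description asks that isLetter() be reviewed in the general case, not
only for U+02C6. A minimal sketch of one way to do that follows. It is not
part of the customer's test; the class name LetterDump and the output format
are made up for illustration. It prints the isLetter() result and the raw
Character.getType() value for every character in the Spacing Modifier Letters
block (U+02B0 to U+02FF, the block covered by the chart linked above), so the
output of two JDK releases can be compared with diff. It sticks to APIs that
should also be available on 1.4.2, but that has not been verified here.

```java
public class LetterDump {

    // Format one character as "HHHH letter=<boolean> type=<int>".
    // The type value is the raw int returned by Character.getType(char).
    static String describe(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        while (hex.length() < 4) {
            hex = "0" + hex;          // left-pad to four hex digits
        }
        return hex + " letter=" + Character.isLetter(c)
                   + " type=" + Character.getType(c);
    }

    public static void main(String[] args) {
        // Range is an assumption: the Spacing Modifier Letters block only.
        for (int cp = 0x02B0; cp <= 0x02FF; cp++) {
            System.out.println(describe((char) cp));
        }
    }
}
```

Running this under each release and diffing the two outputs would show every
character in the block whose classification differs, rather than just the one
the customer happened to hit. Widening the loop bounds extends the comparison
to other blocks.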

EVALUATION

Can you please provide the source code of the test case?
###@###.### 10/27/04 22:30 GMT

This is not a bug. The Character class follows the Unicode standard in most of its functionality, including the isLetter methods. As the class description says, it's based on Unicode 3.0 for J2SE 1.4, on Unicode 4.0 for J2SE 5. The Unicode standard changed the classification of U+02C6 from category "Sk" (modifier symbol) to category "Lm" (modifier letter).
###@###.### 10/28/04 17:40 GMT
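
The category change described in the evaluation can be observed directly with
Character.getType(), which returns the general category the running JDK
assigns to a character. The sketch below is illustrative only: the class name
CategoryCheck and the helper categoryName are hypothetical, and the helper
names just the two categories mentioned above.

```java
public class CategoryCheck {

    // Translate the int returned by Character.getType into the two-letter
    // Unicode abbreviation for the two categories relevant to this report.
    // Any other category is reported by its numeric value.
    static String categoryName(int type) {
        switch (type) {
            case Character.MODIFIER_LETTER: return "Lm";
            case Character.MODIFIER_SYMBOL: return "Sk";
            default: return "other (" + type + ")";
        }
    }

    public static void main(String[] args) {
        char c = '\u02C6';
        System.out.println("category: " + categoryName(Character.getType(c)));
        System.out.println("is letter: " + Character.isLetter(c));
    }
}
```

Per the evaluation, a release based on Unicode 3.0 would be expected to report
"Sk" with isLetter() false, and a release based on Unicode 4.0 or later "Lm"
with isLetter() true, which matches the transcripts in the description.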

27-10-2004

Duplicate :	JDK-6212048 - REGRESSION: Difference between java 1.4/5.0 unicode letter & digit recognition
Relates :	JDK-5034599 - RFE: Upgrade to Unicode 4.1