JDK-6212048 : REGRESSION: Difference between java 1.4/5.0 unicode letter & digit recognition
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_9
  • CPU: sparc
  • Submitted: 2004-12-23
  • Updated: 2010-04-02
  • Resolved: 2004-12-23
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode, sharing)
&
java version "1.4.2_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
SunOS butte 5.8 Generic_108528-18 sun4u sparc SUNW,Ultra-Enterprise
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
Under the 1.5 JVM, Character.isLetter() and Character.isDigit() return true for some Unicode characters for which the same calls return false under the 1.4 JVM.
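
A quick way to probe individual characters before running the full comparison is sketched below. It is not part of the original report: the class name ClassifyProbe, the helper describe, and the sample characters are illustrative assumptions. It avoids generics and other 5.0-only syntax so that it should compile under both 1.4 and 1.5.

```java
class ClassifyProbe {

    // Formats one BMP character as "HEX-isLetter-isDigit", for example "0041-true-false".
    static String describe(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        // Pad to four hex digits to match the code point column of UnicodeData.txt.
        while (hex.length() < 4) {
            hex = "0" + hex;
        }
        return hex + "-" + Character.isLetter(c) + "-" + Character.isDigit(c);
    }

    public static void main(String[] args) {
        // Hypothetical sample set; replace with the characters under investigation.
        char[] samples = { 'A', '5', '\u00E9', '\u0660' };
        for (int i = 0; i < samples.length; i++) {
            // Character.getType shows the general category the running JVM assigns.
            System.out.println(describe(samples[i]) + " type=" + Character.getType(samples[i]));
        }
    }
}
```

Running the same class under each JVM and comparing the printed lines shows whether a given character is classified differently.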


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test program below against the Unicode data file found at: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Run the code under a 1.4 JVM and then under a 1.5 JVM.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I would expect the Character.isLetter and Character.isDigit calls to return the same values regardless of the JVM version.
ACTUAL -
Below are the actual differences between the two JVMs.

Here is a list of Unicode characters that return different values from the Character.isLetter() API call.  (The isDigit differences follow the isLetter differences.)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
The following code creates a file listing every Unicode character with its Character.isLetter value.  It must be compiled and run under 1.4 and then under 1.5.  To generate the Character.isDigit values, replace the Character.isLetter() call with Character.isDigit().  You also need to download the UnicodeData.txt file from the website described above and place it on a local file system.

Note that you must also change the name of the generated file between runs, or you will overwrite the previous output.

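import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

public class ParseFile {
	
	public static void main(String[] args) {
		
		try {
			BufferedReader br = new BufferedReader(new FileReader(new File("c:\\UnicodeData.txt")));
			// Change this name for each run (".14" / ".15") so earlier output is not overwritten.
			BufferedWriter bw = new BufferedWriter(new FileWriter(new File("c:\\UnicodeLetter.14.txt")));
			
			String input = br.readLine();
			while (input != null) {
				// The first ';'-separated field is the code point in hexadecimal.
				int index = input.indexOf(';');
				String tmp = input.substring(0,index);
				int codePoint = Integer.parseInt(tmp,16);
				// Only values up to 0xFFFF fit in a char; casting a larger value would
				// silently truncate it, so supplementary code points are skipped.
				if (codePoint <= 0xFFFF) {
					char myUnicodeCharacter = (char)codePoint;
					bw.write(tmp + "-" + Character.isLetter(myUnicodeCharacter) + "\n");
				}
				input = br.readLine();
			}
			br.close();
			bw.close();
			
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}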

The following code then generates a file that contains the differences between the 1.4 and 1.5 JVMs.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

public class FindDiff {
	
	public static void main(String[] args) {
		try {
			BufferedReader br14 = new BufferedReader(new FileReader(new File("c:\\UnicodeLetter.14.txt")));
			BufferedReader br15 = new BufferedReader(new FileReader(new File("c:\\UnicodeLetter.15.txt")));
			BufferedWriter bw = new BufferedWriter(new FileWriter(new File("c:\\UnicodeLetterDiff.txt")));
			
			String input14 = br14.readLine();
			String input15 = br15.readLine();
			
			// Compare the two files line by line until both are exhausted.
			while (input14 != null || input15 != null) {
				
				// A line missing from either file (null) is reported as a difference.
				if (input14 == null || !input14.equals(input15)) {
					bw.write("(1.4):" + input14 + " vs. (1.5):" + input15 + "\n");
				}
				
				input14 = br14.readLine();
				input15 = br15.readLine();
			}
			br14.close();
			br15.close();
			bw.close();
			
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}
---------- END SOURCE ----------
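
The manual rename between runs could be avoided by deriving the output name from the running JVM. The following is a minimal sketch and not part of the original report: the class name UnicodeDataLines and both helper names are hypothetical, and it assumes that the standard java.specification.version system property distinguishes the two runtimes (for example 1.4 versus 1.5).

```java
class UnicodeDataLines {

    // Returns the code point held in the first ';'-separated field of a
    // UnicodeData.txt line, or -1 if the line has no such field or it is not hex.
    static int codePointOf(String line) {
        int index = line.indexOf(';');
        if (index <= 0) {
            return -1;
        }
        try {
            return Integer.parseInt(line.substring(0, index), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Builds an output file name that embeds the specification version of the
    // running JVM, so separate runs write to separate files.
    static String outputName(String prefix) {
        return prefix + "." + System.getProperty("java.specification.version") + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(outputName("UnicodeLetter"));
        System.out.println(codePointOf("0041;LATIN CAPITAL LETTER A;Lu"));
    }
}
```

With helpers like these, ParseFile could write to outputName("UnicodeLetter") under each JVM, and use codePointOf to decide whether a line refers to a value that fits in a char.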

Release Regression From : 1.4.2
The above release value was the last known release where this 
bug was known to work. Since then there has been a regression.

Release Regression From : 5.0
The above release value was the last known release where this 
bug was known to work. Since then there has been a regression.
###@###.### 2004-12-23 02:16:38 GMT