Bug ID: JDK-6635133 Exception thrown when using a Unicode escape

Details
Type: Bug
Submit Date: 2007-11-28
Status: Closed
Updated Date: 2011-05-18
Project Name: JDK
Resolved Date: 2011-05-18
Component: core-libs
OS: windows_xp
Sub-Component: java.util.regex
CPU: x86
Priority: P4
Resolution: Fixed
Affected Versions: 6
Fixed Versions:

Related Reports

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b06)
Java HotSpot(TM) Client VM (build 1.6.0_02-b06, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
The following example works:
System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));

However, the following example throws a PatternSyntaxException:
System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the following.


import java.util.regex.Pattern;
public class Sample {
    // U+1D121 is \uD834\uDD21
    public static void main(String[] args) {
        System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));
        System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));
    }
}


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
true
true
ACTUAL -
true
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 19
[\uD834\uDD21-\uD834\uDD22]+
                   ^
        at java.util.regex.Pattern.error(Unknown Source)
        at java.util.regex.Pattern.range(Unknown Source)
        at java.util.regex.Pattern.clazz(Unknown Source)
        at java.util.regex.Pattern.sequence(Unknown Source)
        at java.util.regex.Pattern.expr(Unknown Source)
        at java.util.regex.Pattern.compile(Unknown Source)
        at java.util.regex.Pattern.<init>(Unknown Source)
        at java.util.regex.Pattern.compile(Unknown Source)
        at java.util.regex.Pattern.matches(Unknown Source)
        at java.lang.String.matches(Unknown Source)
        at Sample.main(Sample.java:10)


REPRODUCIBILITY :
This bug is always reproducible.

---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;
public class Sample {
    // U+1D121 is \uD834\uDD21
    public static void main(String[] args) {
        System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));
        System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
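Use the form shown in the first example: write the upper bound of the range as a source-level Unicode escape (\uD834\uDD24, a single backslash in the Java source) rather than as a regex escape (\\uD834\\uDD24).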
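
One way to keep regex-level \uXXXX escapes out of the pattern entirely is to build the character class from literal characters. The sketch below is not part of the original report. The class name SupplementaryRangeWorkaround and the helpers cp and rangeOf are hypothetical, and it assumes the regex engine reads a literal surrogate pair in the pattern as one code point, as the upper bound of the first example suggests.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SupplementaryRangeWorkaround {

    // Returns the UTF-16 string that encodes a single code point.
    public static String cp(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Builds "[lo-hi]+" with both bounds embedded as literal characters,
    // so the pattern contains no regex-level Unicode escapes.
    // Assumes neither bound is a regex metacharacter.
    public static Pattern rangeOf(int lo, int hi) {
        return Pattern.compile("[" + cp(lo) + "-" + cp(hi) + "]+");
    }

    public static void main(String[] args) {
        Pattern p = rangeOf(0x1D121, 0x1D122);
        System.out.println(p.matcher(cp(0x1D121)).matches()); // inside the range
        System.out.println(p.matcher(cp(0x1D123)).matches()); // outside the range
    }
}
```

Because the bounds are passed as int code points, the surrogate arithmetic stays inside Character.toChars and never reaches the pattern parser.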

                                    

Comments
EVALUATION

The utility method Pattern.u() does not understand surrogates.
                                     
2007-11-29
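
The evaluation above can be illustrated with plain arithmetic. The sketch below is not the JDK source. The class name SurrogateEscapeSketch and its helpers are hypothetical, and it assumes that a parser reading one \uXXXX escape at a time takes the first escape as a lone class member and the next two as the bounds of the range.

```java
// Hypothetical illustration; not taken from java.util.regex.Pattern.
final class SurrogateEscapeSketch {

    // Combines a high and a low surrogate into the code point they encode.
    public static int combine(int high, int low) {
        return Character.toCodePoint((char) high, (char) low);
    }

    // A character-class range is legal only when its lower bound
    // does not exceed its upper bound.
    public static boolean isLegalRange(int lower, int upper) {
        return lower <= upper;
    }

    public static void main(String[] args) {
        // Escapes read one code unit at a time: 0xD834 stands alone,
        // and the range becomes 0xDD21 to 0xD834, which is reversed.
        System.out.println(isLegalRange(0xDD21, 0xD834)); // false

        // Escapes read as surrogate pairs: the range is U+1D121 to U+1D122.
        int lower = combine(0xD834, 0xDD21);
        int upper = combine(0xD834, 0xDD22);
        System.out.println(Integer.toHexString(lower));   // 1d121
        System.out.println(isLegalRange(lower, upper));   // true
    }
}
```

Under that assumption, the reversed range 0xDD21 to 0xD834 is consistent with the "Illegal character range" error reported near index 19, the position of the second \uD834 escape.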


