JDK-6635133 : Exception thrown when using a Unicode escape
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 6
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2007-11-28
  • Updated: 2011-05-18
  • Resolved: 2011-05-18
JDK 7: Fixed in 7 b27
Description
FULL PRODUCT VERSION :
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b06)
Java HotSpot(TM) Client VM (build 1.6.0_02-b06, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
The following example works as expected:
System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));

However, the following example throws a PatternSyntaxException:
System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the following.


import java.util.regex.Pattern;
public class Sample {
    // U+1D121 is \uD834\uDD21
    public static void main(String[] args) {
        System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));
        System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));
    }
}


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
true
true
ACTUAL -
true
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 19
[\uD834\uDD21-\uD834\uDD22]+
                   ^
        at java.util.regex.Pattern.error(Unknown Source)
        at java.util.regex.Pattern.range(Unknown Source)
        at java.util.regex.Pattern.clazz(Unknown Source)
        at java.util.regex.Pattern.sequence(Unknown Source)
        at java.util.regex.Pattern.expr(Unknown Source)
        at java.util.regex.Pattern.compile(Unknown Source)
        at java.util.regex.Pattern.<init>(Unknown Source)
        at java.util.regex.Pattern.compile(Unknown Source)
        at java.util.regex.Pattern.matches(Unknown Source)
        at java.lang.String.matches(Unknown Source)
        at Sample.main(Sample.java:10)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;
public class Sample {
    // U+1D121 is \uD834\uDD21
    public static void main(String[] args) {
        System.out.println("\uD834\uDD22".matches("[\\uD834\\uDD21-\uD834\uDD24]+"));
        System.out.println("\uD834\uDD21".matches("[\\uD834\\uDD21-\\uD834\\uDD22]+"));
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
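See the first example above: write the upper bound of the range with Java source-level Unicode escapes (\uD834\uDD24) instead of regex escapes (\\uD834\\uDD24).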
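
A related workaround, sketched here as an assumption and not taken from the report, is to avoid regex-level \u escapes entirely and put the literal supplementary characters into the pattern for both bounds. The class name and the rangeClass helper are hypothetical, and the helper assumes neither bound is a regex metacharacter.

```java
import java.util.regex.Pattern;

public class CodePointRangeWorkaround {
    // Hypothetical helper: builds "[lo-hi]+" with the bounds inserted as
    // literal characters, so the regex parser never sees an escaped surrogate.
    public static String rangeClass(int lo, int hi) {
        return new StringBuilder("[")
                .appendCodePoint(lo)
                .append('-')
                .appendCodePoint(hi)
                .append("]+")
                .toString();
    }

    public static void main(String[] args) {
        // U+1D121 .. U+1D122, the range from the failing example.
        String regex = rangeClass(0x1D121, 0x1D122);
        String input = new String(Character.toChars(0x1D121));
        System.out.println(Pattern.matches(regex, input));
    }
}
```

Because the pattern string holds the characters themselves, the escape-parsing code is not involved when the character class is compiled.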

Comments
EVALUATION The utility method Pattern.u() does not understand surrogates, so each half of an escaped surrogate pair is treated as a separate character.
29-11-2007
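
The following is a minimal sketch of the arithmetic behind the evaluation, not the actual JDK fix. It assumes a hypothetical helper, combine, that merges two adjacent escape values into one code point when they form a surrogate pair. It also shows why the failing range looks inverted when the two halves are read separately: the range then runs from U+DD21 to U+D834.

```java
public class SurrogateEscapeSketch {
    // Hypothetical helper: if the two UTF-16 values form a high/low surrogate
    // pair, return the supplementary code point; otherwise return the first value.
    public static int combine(char first, char second) {
        if (Character.isHighSurrogate(first) && Character.isLowSurrogate(second)) {
            return Character.toCodePoint(first, second);
        }
        return first;
    }

    public static void main(String[] args) {
        char high = (char) 0xD834;
        char lowA = (char) 0xDD21;
        char lowB = (char) 0xDD22;

        // Read as pairs, the bounds are U+1D121 and U+1D122, a valid range.
        int lower = combine(high, lowA);
        int upper = combine(high, lowB);
        System.out.println(Integer.toHexString(lower) + "-" + Integer.toHexString(upper)); // 1d121-1d122

        // Read one escape at a time, the range around '-' is U+DD21 to U+D834,
        // whose start is greater than its end.
        System.out.println((int) lowA > (int) high); // true
    }
}
```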