JDK-4688586 : java.util.regex.Pattern doesn't know "Latin Extended-B" Unicode block
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86
  • Submitted: 2002-05-21
  • Updated: 2003-09-02
  • Resolved: 2003-09-02
Related Reports
Duplicate : JDK-4898036
Description

Name: rmT116609			Date: 05/21/2002


FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

FULL OPERATING SYSTEM VERSION :
Linux 2.4.17, SuSE 7.3, Windows 2000, Solaris 2.8

A DESCRIPTION OF THE PROBLEM :
Pattern.compile("[\\p{InLatinExtended-B}]*");
results in:
java.util.regex.PatternSyntaxException: Unknown character family {LatinExtended-B} near index 21

It can of course be worked around with:
Pattern.compile("[\u0180-\u024F]*");

but it's a bug! :)
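
A minimal sketch of the workaround in use, assuming the explicit range U+0180 to U+024F quoted above is the intended extent of the Latin Extended-B block. The class and method names are hypothetical and not part of the report. Unlike the report's literal, the backslashes are doubled here so that the regex engine, rather than the Java compiler, resolves the escapes; both spellings produce the same character class.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the explicit-range workaround.
final class LatinExtendedBWorkaround {
    // Explicit range standing in for the unrecognized block name.
    // The regex engine itself interprets the four-digit hex escapes.
    static final Pattern LATIN_EXTENDED_B = Pattern.compile("[\\u0180-\\u024F]*");

    // True if every character of s lies in U+0180..U+024F (the empty string also matches).
    static boolean isAllLatinExtendedB(String s) {
        return LATIN_EXTENDED_B.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllLatinExtendedB("\u0180\u01C5\u024F")); // true
        System.out.println(isAllLatinExtendedB("abc"));                // false
    }
}
```

The trade-off is that a hard-coded range does not follow later revisions of the Unicode block table, whereas a named block would.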


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Compile the test case (test.java).
2) Run it.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.util.regex.PatternSyntaxException: Unknown character family {LatinExtended-B} near index 21
[\p{InLatinExtended-B}]*
                     ^
        at java.util.regex.Pattern.error(Pattern.java:1472)
        at java.util.regex.Pattern.familyError(Pattern.java:2137)
        at java.util.regex.Pattern.retrieveFamilyNode(Pattern.java:2114)
        at java.util.regex.Pattern.family(Pattern.java:2096)
        at java.util.regex.Pattern.range(Pattern.java:2024)
        at java.util.regex.Pattern.clazz(Pattern.java:1991)
        at java.util.regex.Pattern.sequence(Pattern.java:1529)
        at java.util.regex.Pattern.expr(Pattern.java:1489)
        at java.util.regex.Pattern.compile(Pattern.java:1257)
        at java.util.regex.Pattern.<init>(Pattern.java:1013)
        at java.util.regex.Pattern.compile(Pattern.java:760)
        at test.main(test.java:38)



This bug can always be reproduced.
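
The fields visible in the trace above (description, index, and the offending pattern) can also be read programmatically from the exception. The following is a sketch only: the class name and the block name "NoSuchBlock" are made up for illustration, chosen so that compilation should fail regardless of which blocks a given JDK supports. The exact wording of the description varies between releases, so it is not reproduced here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper that summarizes a regex compilation failure.
final class SyntaxErrorReport {
    // Returns a one-line summary of why the regex failed to compile,
    // or null if it compiled successfully.
    static String describeFailure(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        // "NoSuchBlock" is an invented block name, not a real Unicode block.
        System.out.println(describeFailure("[\\p{InNoSuchBlock}]*"));
    }
}
```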

---------- BEGIN SOURCE ----------
import java.util.regex.*;
public class test {
	public static void main(String[] args) throws Throwable {
		// On 1.4.0 this throws PatternSyntaxException: the block name is not recognized.
		Pattern.compile("[\\p{InLatinExtended-B}]*");
	}
}


---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Pattern.compile("[\u0180-\u024F]*");
(Review ID: 146813) 
======================================================================

Comments
EVALUATION

There are some newer Unicode blocks that are currently unsupported. We may extend coverage of these blocks in a future release.
###@###.### 2002-05-21

In Tiger we will be relying on the Character class for the Unicode blocks and scripts, which will fix this bug.
###@###.### 2003-05-12

As expected, this bug has been fixed by the fix for 4898036. I am closing this bug as a duplicate of it.
###@###.### 2003-09-02
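
The evaluation says block lookup moves to the Character class. As a rough illustration, and assuming a runtime where `Character.UnicodeBlock.of(int)` is available (J2SE 5.0 or later), the sketch below checks block membership directly and compiles the reported pattern with a fallback to the submitter's explicit range. The class and method names are hypothetical; this is not the code of the actual fix.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper; names are illustrative only.
final class BlockLookup {
    // Asks the Character class which block a code point belongs to.
    static boolean inLatinExtendedB(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.LATIN_EXTENDED_B;
    }

    // Tries the named block first; falls back to the explicit range
    // from the report if the runtime does not recognize the name.
    static Pattern latinExtendedBPattern() {
        try {
            return Pattern.compile("[\\p{InLatinExtended-B}]*");
        } catch (PatternSyntaxException e) {
            return Pattern.compile("[\\u0180-\\u024F]*");
        }
    }

    public static void main(String[] args) {
        System.out.println(inLatinExtendedB(0x0180));
        System.out.println(latinExtendedBPattern().matcher("\u0180\u024F").matches());
    }
}
```

The fallback keeps the calling code identical on runtimes with and without the fix, at the cost of silently using the hard-coded range on older ones.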
12-05-2003