JDK-6487160 : Pattern.UNICODE_CASE makes character class ranges case insensitive
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 5.0
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2006-10-27
  • Updated: 2011-02-16
  • Resolved: 2006-12-01
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b01)
Java HotSpot(TM) Client VM (build 1.5.0_09-b01, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
Enabling the Pattern.UNICODE_CASE flag makes character class ranges case insensitive, even though Pattern.CASE_INSENSITIVE is not set.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the source code of the executable test case below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Both patterns should return identical results: all the lowercase letters of the string "dogFace" should be removed, and the uppercase F should remain in the result.
ACTUAL -
The first pattern returns an incorrect result: the uppercase F is removed as well, so an empty line is printed. The second pattern prints F.

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.util.regex.*;

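public class RegexTest
{
	public static void main(String[] args)
	{
		// Range [a-z] with UNICODE_CASE only: expected to print "F"
		// (on 1.5.0_09 the F is removed too and an empty line is printed)
		Pattern pattern = Pattern.compile("[a-z]", Pattern.UNICODE_CASE);
		System.err.println(pattern.matcher("dogFace").replaceAll(""));

		// The same 26 letters listed explicitly: prints "F"
		pattern = Pattern.compile("[abcdefghijklmnopqrstuvwxyz]", Pattern.UNICODE_CASE);
		System.err.println(pattern.matcher("dogFace").replaceAll(""));
	}
}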

---------- END SOURCE ----------
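The reproducer prints its results rather than checking them. As a sketch of a self-checking regression test (the class and method names are hypothetical), the two patterns can be compared against the value from the EXPECTED section: both should leave "F". On 1.5.0_09, per the ACTUAL section, the range version leaves an empty string, so this check would fail there.

```java
import java.util.regex.Pattern;

// Hypothetical regression check for the behavior described in this report.
final class RangeVsListCheck {

    // Removes every match of the given character class, compiled with UNICODE_CASE only
    // (CASE_INSENSITIVE is deliberately not set).
    static String stripWithUnicodeCase(String charClass, String input) {
        return Pattern.compile(charClass, Pattern.UNICODE_CASE)
                .matcher(input)
                .replaceAll("");
    }

    public static void main(String[] args) {
        String viaRange = stripWithUnicodeCase("[a-z]", "dogFace");
        String viaList = stripWithUnicodeCase("[abcdefghijklmnopqrstuvwxyz]", "dogFace");

        // Both forms should remove only the lowercase letters and keep the uppercase F.
        if (!viaRange.equals("F") || !viaList.equals("F")) {
            throw new AssertionError("range: \"" + viaRange + "\", list: \"" + viaList + "\"");
        }
        System.out.println("OK: both patterns leave \"F\"");
    }
}
```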

CUSTOMER SUBMITTED WORKAROUND :
Don't use ranges in combination with UNICODE_CASE.
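The workaround above avoids ranges altogether. Another option, assuming case-sensitive matching is what is actually wanted, follows from the Pattern documentation: UNICODE_CASE only changes how case-insensitive matching is done when CASE_INSENSITIVE is also enabled, so it can simply be omitted. The sketch below (helper names are hypothetical) contrasts the two intended flag combinations.

```java
import java.util.regex.Pattern;

// Hypothetical helpers showing the two documented ways to use the case flags.
final class CaseFlagUsage {

    // Case-sensitive: no case flags. UNICODE_CASE is omitted because it only
    // affects matching when CASE_INSENSITIVE is set as well.
    static final Pattern LOWER_ASCII = Pattern.compile("[a-z]");

    // Case-insensitive with Unicode-aware case folding: both flags together.
    static final Pattern ANY_CASE_LETTER =
            Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static String removeLowercase(String input) {
        return LOWER_ASCII.matcher(input).replaceAll("");
    }

    static String removeAnyCase(String input) {
        return ANY_CASE_LETTER.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + removeLowercase("dogFace") + "]"); // prints [F]
        System.out.println("[" + removeAnyCase("dogFace") + "]");   // prints []
    }
}
```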

Comments
EVALUATION
Here is a test case showing the current unsatisfactory situation:

import java.util.regex.*;

public class Bug {
    static void test(String lower, String regexTemplate, Object... args) {
        String regex = String.format(regexTemplate, args);
        System.out.printf("\\u%04x %7s => %5b %5b %5b%n",
                          (int)lower.charAt(0),
                          regexTemplate,
                          lower.matches("(?i)" + regex),
                          lower.matches("(?u)" + regex),
                          lower.matches("(?iu)" + regex));
    }

    static void test0(String lower, String description) {
        System.out.printf("%-17s %5s %5s %5s%n",
                          description, "(?i)", "(?u)", "(?iu)");
        String upper = lower.toUpperCase();
        test(lower, "%s", upper);
        test(lower, "[%s]", upper);
        test(lower, "[%s-%s]", upper, upper);
        System.out.println();
    }

    public static void main(String[] args) throws Throwable {
        test0("\u0065", "ASCII");
        test0("\u00e9", "Latin-1");
        test0("\u0435", "Cyrillic");
    }
}

==> javac -source 1.6 -Xlint:all Bug.java
==> java -esa -ea Bug
ASCII              (?i)  (?u) (?iu)
\u0065      %s =>  true  true  true
\u0065    [%s] =>  true false  true
\u0065 [%s-%s] =>  true  true  true

Latin-1            (?i)  (?u) (?iu)
\u00e9      %s => false  true  true
\u00e9    [%s] =>  true false  true
\u00e9 [%s-%s] =>  true  true  true

Cyrillic           (?i)  (?u) (?iu)
\u0435      %s => false  true  true
\u0435    [%s] => false  true  true
\u0435 [%s-%s] =>  true  true  true

Each row shows whether the lowercase character matches a regex built from its uppercase form (as a literal, a one-character class, and a one-character range) under (?i), (?u) and (?iu). Because UNICODE_CASE is documented to affect matching only when CASE_INSENSITIVE is also enabled, the (?u) column should read false throughout, and without UNICODE_CASE the (?i) column should be false for the non-ASCII characters. The [%s-%s] rows show the range problem reported here.
27-10-2006
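The matrix in the evaluation can also be turned into a self-check. The sketch below (all names are hypothetical) runs the same three regex forms under the same three flag settings, but compares each result with an expected value instead of printing a table. The expected value is an assumption based on the Pattern documentation: matching is case-insensitive only when CASE_INSENSITIVE is set, and without UNICODE_CASE only US-ASCII letters are folded. Against the 1.5/1.6 output in the evaluation, several cells would be reported as mismatches (for example every (?u) row for the range form); a JDK that follows the documented semantics should report none.

```java
import java.util.Locale;

// Hypothetical self-check built from the evaluation's test matrix.
final class CaseMatrixCheck {

    static final String[] FLAGS = {"(?i)", "(?u)", "(?iu)"};

    // Expected result when a lowercase char is matched against its uppercase form.
    // Assumption: only CASE_INSENSITIVE ("i") enables case folding, and without
    // UNICODE_CASE ("u") only US-ASCII letters are folded.
    static boolean expected(String flags, char lower) {
        boolean insensitive = flags.contains("i");
        boolean unicode = flags.contains("u");
        boolean ascii = lower < 0x80;
        return insensitive && (unicode || ascii);
    }

    // Counts (character, regex, flags) combinations whose result differs from expected().
    static int countMismatches(String... samples) {
        int mismatches = 0;
        for (String lower : samples) {
            String upper = lower.toUpperCase(Locale.ROOT);
            // Literal, one-character class, one-character range.
            String[] regexes = {upper, "[" + upper + "]", "[" + upper + "-" + upper + "]"};
            for (String regex : regexes) {
                for (String flags : FLAGS) {
                    boolean actual = lower.matches(flags + regex);
                    boolean want = expected(flags, lower.charAt(0));
                    if (actual != want) {
                        mismatches++;
                        System.out.printf("\\u%04x %-7s %-5s => %b (expected %b)%n",
                                (int) lower.charAt(0), regex, flags, actual, want);
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Same samples as the evaluation: ASCII, Latin-1 and Cyrillic lowercase letters.
        int n = countMismatches("e", "\u00e9", "\u0435");
        System.out.println(n + " mismatch(es)");
    }
}
```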