JDK-8214245 : Case insensitive matching doesn't work correctly for some character classes
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 11,12
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-11-21
  • Updated: 2020-03-24
  • Resolved: 2020-03-18
Fix Version: JDK 15, build b15 (Fixed)
Related Reports
CSR :  
Sub Tasks
JDK-8239887 :  
Description
ADDITIONAL SYSTEM INFORMATION :
$ java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

A DESCRIPTION OF THE PROBLEM :
When the CASE_INSENSITIVE flag is set, a POSIX character class and a literal character class that cover the same set of characters match differently.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See test program.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The pattern "[a-z]" should behave the same as "\\p{Lower}", which the documentation describes as US-ASCII only and equivalent to "[a-z]".
ACTUAL -
When running with the CASE_INSENSITIVE flag, "[a-z]" will match an uppercase letter, but "\\p{Lower}" will not. 

---------- BEGIN SOURCE ----------
// $ javac Test.java
// $ java -ea Test
// Exception in thread "main" java.lang.AssertionError
//      at Test.main(Test.java:8) 
import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
    Pattern p1 = Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE);
    Pattern p2 = Pattern.compile("\\p{Lower}", Pattern.CASE_INSENSITIVE);
    assert(p1.matcher("A").find() == p2.matcher("A").find());
  }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Avoid using POSIX character classes.
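
The workaround can be sketched as follows. This is only an illustration, not part of the original report: the class name LowerWorkaround and the method containsAsciiLetter are hypothetical, and the sketch relies only on the behavior described above, namely that the explicit range "[a-z]" honors CASE_INSENSITIVE on the affected releases.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the submitted workaround:
// spell out the ASCII range instead of writing \p{Lower}.
final class LowerWorkaround {
    // The explicit range is case-folded under CASE_INSENSITIVE,
    // so it matches both 'a'..'z' and 'A'..'Z'.
    static final Pattern ASCII_LETTER_CI =
        Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE);

    // Returns true if the input contains at least one ASCII letter.
    static boolean containsAsciiLetter(String s) {
        return ASCII_LETTER_CI.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAsciiLetter("A")); // true
        System.out.println(containsAsciiLetter("7")); // false
    }
}
```

The same substitution applies to other POSIX classes only where an equivalent explicit range exists, so it is a stopgap rather than a general fix.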

FREQUENCY : always



Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/5df90c29762d
User: igerasim
Date: 2020-03-18 08:04:55 +0000
18-03-2020

Here is what POSIX [1] states:

"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.
"""

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

So it appears that in case-insensitive mode, not only the character itself but also its upper/lower-case counterparts have to be matched against the character class. Using \p{Lower} in case-insensitive mode may seem strange, but it is still valid.
28-11-2018
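
The POSIX rule quoted above can be illustrated with a minimal sketch. It is not the java.util.regex implementation and not the change made in the linked changeset; the class CaseCounterpartSketch, the method matchesIgnoringCase, and the predicate asciiLower are all hypothetical names, and simple case mapping via Character.toUpperCase/toLowerCase is assumed to stand in for "case counterpart".

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of the POSIX rule: in case-insensitive mode,
// test the character and its case counterparts against the class.
final class CaseCounterpartSketch {
    // charClass models a character class as a predicate over code points.
    static boolean matchesIgnoringCase(IntPredicate charClass, int ch) {
        return charClass.test(ch)
            || charClass.test(Character.toUpperCase(ch))
            || charClass.test(Character.toLowerCase(ch));
    }

    public static void main(String[] args) {
        // Stand-in for \p{Lower} as documented: US-ASCII lowercase only.
        IntPredicate asciiLower = c -> c >= 'a' && c <= 'z';
        System.out.println(matchesIgnoringCase(asciiLower, 'A')); // true
        System.out.println(matchesIgnoringCase(asciiLower, '1')); // false
    }
}
```

Under this reading, 'A' is accepted by a lowercase-only class because its counterpart 'a' belongs to the class, which is the behavior the report expects from "\\p{Lower}" with CASE_INSENSITIVE.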

To reproduce the issue, run the attached test case.
JDK 11.0.1 - Fail
JDK 12-ea+21 - Fail
Output:
Exception in thread "main" java.lang.AssertionError
        at JI9058202.main(JI9058202.java:8)
23-11-2018