JDK-8238984 : Case insensitive matching doesn't work correctly for some character classes
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 15
  • Submitted: 2020-02-12
  • Updated: 2024-04-11
  • Resolved: 2020-02-25
Description
Summary
-------

Named regex character classes of the forms \p{name} and \P{name} must honor case-insensitive mode.

Problem
-------

In case-insensitive mode, matching a character of the input text against a character class requires checking not only the character itself but also its lower-case, upper-case, and title-case forms for membership in the class.
With the current implementation, this holds for single characters and for character classes written with brackets (for example, [a-z]), but not for named classes of the form \p{name} or \P{name}.
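
The discrepancy can be observed with a small program. This is a minimal sketch, not part of the original report: the class name `CaseInsensitiveDemo` and the helper `matchesIgnoringCase` are hypothetical, and only `java.util.regex.Pattern` with its `CASE_INSENSITIVE` flag is assumed. No output is stated for the named class, because it depends on whether the runtime includes the fix.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {
    // Returns true if the whole input matches the regex under CASE_INSENSITIVE.
    public static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String input = "A";
        // Single character and bracket class: the case counterpart is considered.
        System.out.println("a         -> " + matchesIgnoringCase("a", input));
        System.out.println("[a-z]     -> " + matchesIgnoringCase("[a-z]", input));
        // Named class: the result depends on whether the runtime includes the fix.
        System.out.println("\\p{Lower} -> " + matchesIgnoringCase("\\p{Lower}", input));
    }
}
```

Running it on a release before and after the fix version should show the first two lines unchanged and only the last line differing.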

In particular, this behavior goes against the POSIX standard, which states:

> **9.2 Regular Expression General Requirements** 
> ...
> When a standard utility or function that uses regular expressions specifies 
> that pattern matching shall be performed without regard to the case
> (uppercase or lowercase) of either data or patterns, then when each
> character in the string is matched against the pattern, not only the
> character, but also its case counterpart (if any), shall be matched.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

Solution
--------

The named character classes will be made aware of case-insensitive mode. In particular, in case-insensitive mode, the range classes [a-z] and [A-Z] should match the same set of characters as the named classes \p{Lower} and \p{Upper}.
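
One way to check this equivalence is to compare the classes character by character. The following is a sketch under stated assumptions: the class `ClassEquivalenceCheck` and the method `disagreements` are hypothetical names introduced for illustration, and the comparison is limited to the ASCII range, since \p{Lower} and \p{Upper} are ASCII-only by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClassEquivalenceCheck {
    // Collects the ASCII characters on which two regexes disagree
    // when both are compiled with CASE_INSENSITIVE.
    public static List<Character> disagreements(String regexA, String regexB) {
        Pattern a = Pattern.compile(regexA, Pattern.CASE_INSENSITIVE);
        Pattern b = Pattern.compile(regexB, Pattern.CASE_INSENSITIVE);
        List<Character> result = new ArrayList<>();
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two range classes are expected to agree with each other.
        System.out.println(disagreements("[a-z]", "[A-Z]"));
        // With the change applied, these lists are expected to be empty as well.
        System.out.println(disagreements("[a-z]", "\\p{Lower}"));
        System.out.println(disagreements("[A-Z]", "\\p{Upper}"));
    }
}
```

A non-empty list for either of the last two comparisons would indicate that the named class is not taking case-insensitive mode into account on that runtime.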

Specification
-------------

No specification changes are necessary.

Comments
Moving to Approved; I see a release note has already been created.
25-02-2020