JDK-8262279 : Regex intersection character class applies to whole enclosing character class
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 8,11,15,17
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2021-02-20
  • Updated: 2024-03-12
  • Resolved: 2024-03-12
Related Reports
Duplicate :  
Description
A DESCRIPTION OF THE PROBLEM :
The documentation of java.util.regex.Pattern suggests that a nested intersection character class applies to an "operand":
> The intersection operator denotes a class that contains every character that is in both of its operand classes. 

However, that does not appear to be the case: the intersection applies to the whole enclosing character class rather than to the "operand" immediately in front of it.

For example, the pattern "[a-z&&[b-e]A-Z]" (or "[A-Za-z&&[b-e]]") should match:
- A-Z
- OR a-z INTERSECTING b-e

Therefore for example 'A' should be allowed. However, because the intersection appears to apply to the enclosing character class as a whole, none of the characters defined by `A-Z` are allowed (because the intersection does not cover them):
```
for (char c = 'A'; c <= 'z'; c++) {
    System.out.println(Character.toString(c) + ": " + Character.toString(c).matches("[A-Za-z&&[b-e]]"));
}
```
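
The loop above can also be wrapped in a self-contained program that collects the matching characters instead of printing one line per character. This is only a minimal sketch: the class name `IntersectionRepro` and the helper `matchedChars` are illustrative names and do not come from the original report.

```java
import java.util.regex.Pattern;

public class IntersectionRepro {
    // Hypothetical helper: returns every char in the range 'A'..'z' that the regex matches.
    public static String matchedChars(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pattern taken from the report.
        System.out.println(matchedChars("[A-Za-z&&[b-e]]"));
    }
}
```

Consistent with the observations recorded in this report, the output contains only `b` through `e`; none of the characters in `A-Z` appear.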

If that is actually the intended behavior, the documentation should make clear that the "operand" is the entire enclosing character class, because the current documentation (and its examples) makes it look as if the intersection applies only to the immediately preceding characters.
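
One way to express the reading the submitter expected is to put the intersection in its own nested class, so that `A-Z` sits outside its scope. The sketch below assumes the nested-class union syntax that the `Pattern` documentation shows (for example `[a-d[m-p]]`). The pattern `[A-Z[a-z&&[b-e]]]` and the class name `ScopedIntersection` are illustrative and not part of the original report.

```java
import java.util.regex.Pattern;

public class ScopedIntersection {
    // Assumption: nesting the intersection in its own [...] limits its operands to that nested class.
    public static final Pattern SCOPED = Pattern.compile("[A-Z[a-z&&[b-e]]]");
    // Pattern from the report, where the intersection covers the whole enclosing class.
    public static final Pattern FLAT = Pattern.compile("[A-Za-z&&[b-e]]");

    public static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'c', 'x'}) {
            System.out.println(c + ": scoped=" + matches(SCOPED, c) + ", flat=" + matches(FLAT, c));
        }
    }
}
```

With the nested form, `A` and `c` match while `x` does not. With the flat form from the report, only `c` matches.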

Possibly related to JDK-8037397



Comments
Closed as duplicate.
11-05-2021

This bug reflects one reading of a vague part of the documentation and is superseded by JDK-8264671. It is a bug because the behavior is simply not well-defined. Other behaviors of these operators and character classes are not described by the spec at all. JDK-8264671 will address the shortcomings related to this bug. Closing.
08-04-2021

After further discussion we're reasonably convinced this is unrelated to JDK-6609854.
01-04-2021

Possibly related to JDK-6609854. Reassigning to [~igraves] for evaluation.
30-03-2021

Observations on Windows 10:
- JDK 8: Failed, only matches `b-e`
- JDK 11: Failed
- JDK 15: Failed
- JDK 17: Failed
24-02-2021

Additional information from the submitter: The `for` loop starts at `A` and ends at `z`. The output shows that it only matches `b-e` (if I recall correctly), but I would have expected it to also match any char in `A-Z`, because that range is not immediately in front of the intersection and should therefore not be affected by it.
24-02-2021

Requested the reproducer including the test cases and expected results from the submitter.
22-02-2021