JDK-8264671 : Update Pattern spec to provide details of character class syntax and behavior
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 9
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2021-04-03
  • Updated: 2024-03-12
Version table: Other, tbd (Unresolved)
Description
In JDK 9, JDK-6609854 made a significant change to the behavior of negation and nesting of character classes. This change was not documented anywhere, because some of the more obscure behaviors of character classes are not documented at all. These need to be specified.

This message from Xueming Shen (who implemented the earlier change) describes the change, gives some rationale, and, most importantly for this bug, offers some hints about what should be in the specification of these features of character classes:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html

The various operators whose behaviors need to be specified in combination are:

(1) Negation      ^    (only at the beginning of the [...])
(2) Intersection  &&
(3) Range         -
(4) Nested class  []
(5) Union              (empty string, that is, two elements placed adjacent)
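
A small harness along these lines could be used to probe how the operators behave, alone and in combination, on a given JDK. This is only an illustrative sketch: the class name CharClassProbe, the matches helper, and the chosen patterns are assumptions made for this note, not part of JDK-6609854 or its tests. The first five probes use forms already covered by the Pattern javadoc. The last two combine negation with a nested class, the kind of combination touched by the JDK-6609854 change, so they are printed without any claim about the result.

```java
import java.util.regex.Pattern;

public class CharClassProbe {

    // Hypothetical helper: true if the whole input matches the given character class.
    public static boolean matches(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        String[][] probes = {
            {"[^abc]",        "d"},  // (1) negation
            {"[a-z&&[def]]",  "e"},  // (2) intersection
            {"[a-d]",         "c"},  // (3) range
            {"[a-d[m-p]]",    "n"},  // (4) nested class, combined by (5) union
            {"[abc]",         "b"},  // (5) union of adjacent elements
            {"[^a-b[c-d]]",   "c"},  // negation plus nested class: result not asserted here
            {"[^a-b[c-d]]",   "e"},  // negation plus nested class: result not asserted here
        };
        for (String[] p : probes) {
            System.out.printf("%-14s vs \"%s\" -> %b%n", p[0], p[1], matches(p[0], p[1]));
        }
    }
}
```

Running the same probes on JDK 8 and on JDK 9 or later is one way to see which combinations changed and therefore most need explicit wording in the specification.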

Xueming's email has a statement about the precedence of these operators, but I don't think it's correct. Of course, the eventual specification should be correct.

This is mostly about specifying the existing behaviors of regex character classes, after the JDK-6609854 change. I don't expect there to be any code or behavior changes as a result of this specification update. However, some bugs might be flushed out by closer analysis, and some additional tests might be warranted. That work could be handled by separate bugs.

The JDK-6609854 change had a retroactive CSR request filed for it: JDK-8275184. This has a bunch of details that will probably be useful in writing the specification updates.