Bug ID: JDK-4706545 Provide (or document) regex character classes for Java character classes

Type: Enhancement
Component: core-libs
Sub-Component: java.util.regex
Affected Version: 5.0

Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2002-06-21
Updated: 2017-05-16
Resolved: 2003-07-07

Versions (Unresolved/Resolved/Fixed)

Other
5.0 tiger: Fixed

The regular expression API has a number of convenient predefined character classes, e.g. \p{Lower} for lowercase ASCII, \p{InGreek} for Greek letters, etc.  However, for some classes the Unicode/regex notion of the class differs from the Java notion.  For example, the JLS notion of white space is *not* the same as the \p{Space} set, since the JLS does not include vertical tab (\v a.k.a. \x0B).  Additionally, the Character class has many methods to help identify certain classes of characters, including three methods with different definitions of whitespace.  It would be useful if there were a documented regex character class for each of the is* methods in Character.  Beyond documenting the corresponding regular expressions, new character classes could be defined for the sets defined in Character.

Having regular expressions for the character sets in Character would make it easier to write regular expressions that precisely recognize Java constructs.
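The mismatch described above can be reproduced with a short sketch. The class name WhitespaceClassDemo, its helper methods, and the hand-written JLS_WHITESPACE pattern are illustrative assumptions, not part of the original report; the pattern simply spells out the JLS white space set (space, tab, form feed, and the line terminators CR and LF).

```java
import java.util.regex.Pattern;

// Illustrative sketch only: the names below are hypothetical and not from the report.
class WhitespaceClassDemo {
    // Hand-written class for JLS white space: space, tab, form feed, CR, LF.
    static final Pattern JLS_WHITESPACE = Pattern.compile("[ \\t\\f\\r\\n]");

    // The predefined POSIX-style class, which also includes vertical tab (\x0B).
    static final Pattern REGEX_SPACE = Pattern.compile("\\p{Space}");

    static boolean isJlsWhitespace(char c) {
        return JLS_WHITESPACE.matcher(String.valueOf(c)).matches();
    }

    static boolean isRegexSpace(char c) {
        return REGEX_SPACE.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        char vt = 0x0B; // vertical tab

        System.out.println(isRegexSpace(vt));            // true
        System.out.println(isJlsWhitespace(vt));         // false
        // Two of the Character methods already disagree on this one character:
        System.out.println(Character.isWhitespace(vt));  // true
        System.out.println(Character.isSpaceChar(vt));   // false
    }
}
```

Without a documented class per Character method, each such set has to be written out by hand as in JLS_WHITESPACE above.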

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: tiger
FIXED IN: tiger
INTEGRATED IN: tiger tiger-b10

14-06-2004

EVALUATION
Sounds like a reasonable suggestion.
###@###.### 2002-06-24

Character classes have been added to match the isXXX methods in Character that are not deprecated.
###@###.### 2003-06-05

24-06-2002
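
The evaluation does not name the classes that were added. As a hedged illustration, the sketch below assumes the \p{javaXXX} forms listed in the java.util.regex.Pattern documentation for 5.0 and later (for example \p{javaLowerCase} and \p{javaJavaIdentifierStart}). The class name JavaCharClassDemo and its helpers are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: class and method names are hypothetical.
class JavaCharClassDemo {
    // Approximates the shape of a Java identifier using the Character-based
    // classes. Note: this also accepts keywords such as "class", which the
    // JLS excludes from identifiers.
    static final Pattern JAVA_IDENTIFIER =
        Pattern.compile("\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*");

    static boolean looksLikeIdentifier(String s) {
        return JAVA_IDENTIFIER.matcher(s).matches();
    }

    // Mirrors Character.isLowerCase, so it is not limited to ASCII.
    static boolean isJavaLowerCase(String s) {
        return Pattern.matches("\\p{javaLowerCase}", s);
    }

    // The POSIX-style class, ASCII-only by default.
    static boolean isPosixLower(String s) {
        return Pattern.matches("\\p{Lower}", s);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("_count1"));    // true
        System.out.println(looksLikeIdentifier("1count"));     // false
        System.out.println(looksLikeIdentifier("na\u00EFve")); // true

        String eAcute = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(isJavaLowerCase(eAcute)); // true
        System.out.println(isPosixLower(eAcute));    // false
    }
}
```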