JDK-4706545 : Provide (or document) regex character classes for Java character classes
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2002-06-21
  • Updated: 2017-05-16
  • Resolved: 2003-07-07
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other
5.0 tiger Fixed
Description
The regular expression API has a number of convenient predefined character classes, e.g. \p{Lower} for lowercase ASCII letters and \p{InGreek} for characters in the Greek block. However, for some classes the Unicode/regex notion of the class differs from the Java notion of the class. For example, the JLS notion of white space is *not* the same as the \p{Space} set, since the JLS does not include the vertical tab (\x0B, written \v in C). Additionally, the Character class has many methods to help identify certain classes of characters, including three methods with different definitions of whitespace. It would be useful if there were documented regex character classes for each of the is* methods in Character. Beyond documenting the corresponding regular expressions, new character classes could be defined for the sets defined in Character.
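The white-space mismatch described above can be reproduced with a short program. This is a sketch, not part of the fix: the class name WhitespaceClasses and the hand-written JLS_WHITESPACE pattern (space, HT, FF, and the line terminators LF and CR, following the JLS definition of white space) are hypothetical illustrations. It also prints the three Character methods with differing whitespace definitions, including the deprecated isSpace.

```java
import java.util.regex.Pattern;

// Sketch: compares \p{Space} with a hand-written JLS white-space class.
// The class name and JLS_WHITESPACE are hypothetical, for illustration only.
public class WhitespaceClasses {
    // In java.util.regex, \p{Space} is the POSIX class [ \t\n\x0B\f\r].
    static final Pattern POSIX_SPACE = Pattern.compile("\\p{Space}");

    // JLS white space: SP, HT, FF, plus the line terminators LF and CR.
    // Vertical tab (\x0B) is deliberately absent.
    static final Pattern JLS_WHITESPACE = Pattern.compile("[ \\t\\f\\n\\r]");

    static boolean isPosixSpace(char c) {
        return POSIX_SPACE.matcher(String.valueOf(c)).matches();
    }

    static boolean isJlsWhitespace(char c) {
        return JLS_WHITESPACE.matcher(String.valueOf(c)).matches();
    }

    @SuppressWarnings("deprecation") // Character.isSpace is deprecated
    public static void main(String[] args) {
        char vt = '\u000B';   // vertical tab
        char nbsp = '\u00A0'; // no-break space
        System.out.println("\\p{Space} matches VT: " + isPosixSpace(vt));
        System.out.println("JLS class matches VT: " + isJlsWhitespace(vt));
        // Three Character methods, three definitions of "space":
        System.out.println("isSpace(VT): " + Character.isSpace(vt));
        System.out.println("isWhitespace(VT): " + Character.isWhitespace(vt));
        System.out.println("isSpaceChar(VT): " + Character.isSpaceChar(vt));
        System.out.println("isWhitespace(NBSP): " + Character.isWhitespace(nbsp));
        System.out.println("isSpaceChar(NBSP): " + Character.isSpaceChar(nbsp));
    }
}
```

\p{Space} accepts the vertical tab while the JLS-style class rejects it. Among the Character methods, isWhitespace accepts the vertical tab but isSpace and isSpaceChar do not, and for the no-break space isSpaceChar returns true while isWhitespace returns false.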

Having regular expressions for the character sets in Character would make it easier to write regular expressions that precisely recognize Java constructs.
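As an illustration of this motivation, the \p{javaJavaIdentifierStart} and \p{javaJavaIdentifierPart} classes accepted by java.util.regex.Pattern on current JDKs (the \p{java<methodname>} form documented for the non-deprecated Character is* methods) can recognize identifier-shaped tokens. This is a minimal sketch: the class and method names are hypothetical, and the pattern does not reject keywords or the literals true, false and null. A hand-written loop over Character methods is included for comparison.

```java
import java.util.regex.Pattern;

// Sketch: recognizes identifier-shaped strings with java.util.regex.
// Class and method names are hypothetical, for illustration only.
public class JavaIdentifierRegex {
    // \p{javaJavaIdentifierStart} / \p{javaJavaIdentifierPart} correspond to
    // Character.isJavaIdentifierStart(int) / isJavaIdentifierPart(int).
    static final Pattern IDENTIFIER =
        Pattern.compile("\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*");

    // Keywords and literals (class, true, null, ...) are NOT excluded.
    static boolean looksLikeIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    // Equivalent check written directly against Character, code point by code point.
    static boolean looksLikeIdentifierManual(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "foo", "_bar1", "$x", "\u03bb\u03b1", "1abc", "a-b", "" };
        for (String s : samples) {
            System.out.printf("%-8s regex=%b manual=%b%n",
                    "\"" + s + "\"", looksLikeIdentifier(s), looksLikeIdentifierManual(s));
        }
    }
}
```

Both checks should agree on every sample: "foo", "_bar1", "$x" and the Greek "\u03bb\u03b1" are accepted, while "1abc", "a-b" and the empty string are rejected. The regex form is convenient when identifiers are one piece of a larger pattern.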

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: tiger
FIXED IN: tiger
INTEGRATED IN: tiger tiger-b10
2004-06-14

EVALUATION
Sounds like a reasonable suggestion.
###@###.### 2002-06-24

Character classes have been added to match the isXXX methods in Character that are not deprecated.
###@###.### 2003-06-05
2002-06-24
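The evaluation's statement that the new classes match the non-deprecated isXXX methods can be spot-checked by comparing each \p{javaXxx} class against the corresponding Character method for every BMP code point, skipping the surrogate range. This is a sketch assuming Java 8 or later (for IntPredicate and method references); the class name and the choice of six properties are illustrative, not an exhaustive test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

// Sketch: checks that \p{javaXxx} agrees with Character.isXxx(int) on the BMP.
// The class name and the selection of properties are illustrative assumptions.
public class JavaCharClassCheck {
    static Map<String, IntPredicate> properties() {
        Map<String, IntPredicate> m = new LinkedHashMap<>();
        m.put("javaLowerCase", Character::isLowerCase);
        m.put("javaUpperCase", Character::isUpperCase);
        m.put("javaDigit", Character::isDigit);
        m.put("javaWhitespace", Character::isWhitespace);
        m.put("javaSpaceChar", Character::isSpaceChar);
        m.put("javaMirrored", Character::isMirrored);
        return m;
    }

    // Counts BMP code points (surrogates skipped) where the regex class
    // and the Character predicate disagree.
    static int mismatches(String property, IntPredicate predicate) {
        Pattern p = Pattern.compile("\\p{" + property + "}");
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue; // lone surrogates
            String s = String.valueOf((char) cp);
            boolean byRegex = p.matcher(s).matches();
            if (byRegex != predicate.test(cp)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, IntPredicate> e : properties().entrySet()) {
            System.out.println("\\p{" + e.getKey() + "} mismatches: "
                    + mismatches(e.getKey(), e.getValue()));
        }
    }
}
```

If the regex classes delegate to the Character methods as the Pattern documentation describes, every count should be zero. Adding further entries (for example javaLetterOrDigit with Character::isLetterOrDigit) extends the check to other properties.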