JDK-7071819 : To support Extended Grapheme Clusters in Regex
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 7
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-07-27
  • Updated: 2019-04-25
  • Resolved: 2016-02-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 9
9 b106Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
2.2 Extended Grapheme Clusters

One or more Unicode characters may make up what the user thinks of as a character. To avoid ambiguity with the computer use of the term character, this is called a grapheme cluster. For example, "G" + acute-accent is a grapheme cluster: it is thought of as a single character by users, yet is actually represented by two Unicode characters. The Unicode Standard defines extended grapheme clusters that keep Hangul syllables together and do not break between base characters and combining marks. The precise definition is in UTR #29: Text Boundaries [UAX29]. These extended grapheme clusters are not the same as tailored grapheme clusters, which are covered in Level 3, Tailored Grapheme Clusters.

Comments
http://hg.openjdk.java.net/jdk9/dev/jdk/rev/cf8aea3bb87a
13-02-2016