Bug ID: JDK-7071819 To support Extended Grapheme Clusters in Regex

Type: Enhancement
Component: core-libs
Sub-Component: java.util.regex
Affected Version: 7

Priority: P4
Status: Closed
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2011-07-27
Updated: 2019-04-25
Resolved: 2016-02-13

Versions (Unresolved/Resolved/Fixed)

JDK 9: 9 b106 (Fixed)

2.2 Extended Grapheme Clusters

One or more Unicode characters may make up what the user thinks of as a character. To avoid ambiguity with the computing sense of the term "character", this is called a grapheme cluster. For example, "G" + acute-accent is a grapheme cluster: users think of it as a single character, yet it is represented by two Unicode characters. The Unicode Standard defines extended grapheme clusters, which keep Hangul syllables together and do not break between base characters and combining marks. The precise definition is in UAX #29: Unicode Text Segmentation [UAX29]. These extended grapheme clusters are not the same as tailored grapheme clusters, which are covered in Level 3, Tailored Grapheme Clusters.
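
To illustrate the behavior this enhancement targets, here is a minimal sketch, assuming a JDK 9 or later runtime in which java.util.regex.Pattern accepts \X as a match for one extended grapheme cluster. The class name GraphemeDemo and the helper clusters are hypothetical names chosen for this example; they are not taken from the fix or from GraphemeTest.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; assumes JDK 9 or later.
final class GraphemeDemo {
    // \X matches a single extended grapheme cluster.
    private static final Pattern CLUSTER = Pattern.compile("\\X");

    // Splits the text into its extended grapheme clusters, in order.
    static List<String> clusters(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = CLUSTER.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "G" followed by COMBINING ACUTE ACCENT (U+0301), then "o".
        String s = "G\u0301o";
        System.out.println(s.length());         // 3 UTF-16 code units
        System.out.println(clusters(s).size()); // 2 grapheme clusters
    }
}
```

The base letter and its combining mark come back as one match, so code that iterates over \X matches sees user-perceived characters rather than individual code points.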

Relates: JDK-8046101 - JEP 111: Additional Unicode Constructs for Regular Expressions
Relates: JDK-8149787 - test/java/util/regex/GraphemeTest.java source file has non-ascii character u+00f7
Relates: JDK-8222978 - Upgrade the extended grapheme cluster support to the latest Unicode level.