JDK-8292992 : Release Note: Grapheme support in BreakIterator
  • Type: Sub-task
  • Component: core-libs
  • Sub-Component: java.text
  • Priority: P4
  • Status: Resolved
  • Resolution: Delivered
  • OS: generic
  • CPU: generic
  • Submitted: 2022-08-26
  • Updated: 2022-09-09
  • Resolved: 2022-09-09
JDK 20: 20 (Resolved)
Description
Character boundary analysis in `java.text.BreakIterator` now conforms to the Extended Grapheme Cluster boundaries defined in <a href="https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries">Unicode Standard Annex #29</a>. This is an intentional behavioral change: the old implementation simply broke at code point boundaries for the vast majority of characters. For example, the following String contains the US flag and a grapheme for a four-member family:
```
"🇺🇸👨‍👩‍👧‍👦"
```
This String will be broken into two graphemes with the new implementation:
```
"🇺🇸", "👨‍👩‍👧‍👦"
```
whereas the old implementation simply breaks at the code point boundaries:
```
"🇺", "🇸", "👨", "(zwj)", "👩", "(zwj)", "👧", "(zwj)", "👦"
```
where (zwj) denotes ZERO WIDTH JOINER (U+200D).
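The difference can be observed with a small program that walks the boundaries reported by `BreakIterator.getCharacterInstance`. This is a minimal sketch, not part of the original note: the class name `GraphemeDemo`, the helper `graphemes`, and the use of `Locale.ROOT` are illustrative choices. The sample string is written with Unicode escapes so that it does not depend on the source file encoding.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class GraphemeDemo {
    // US flag (two regional indicator symbols) followed by the family emoji
    // (man, ZWJ, woman, ZWJ, girl, ZWJ, boy).
    static final String SAMPLE =
        "\uD83C\uDDFA\uD83C\uDDF8"
        + "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66";

    // Hypothetical helper: splits text at the character boundaries
    // reported by BreakIterator and returns the pieces in order.
    static List<String> graphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = graphemes(SAMPLE);
        // The segment count depends on the JDK release the program runs on.
        System.out.println(parts.size() + " segment(s): " + parts);
    }
}
```

Per the description above, running this on JDK 20 or later should report two segments, while earlier releases report more because they split the flag and the family sequence into smaller pieces.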
Comments
Thanks! I hope this won't end up as another `I ? Unicode` 🙂
26-08-2022

Looks good. Let's see if the release note generator handles the emoji properly. :-)
26-08-2022