JDK-8279542 : Upgrade Unicode Data Files to 14.0.0
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Finalized
  • Resolution: Unresolved
  • Fix Versions: 19
  • Submitted: 2022-01-05
  • Updated: 2022-01-07
Description
Summary
-------

Support the Unicode Standard version 14.0.0 in the JDK.

Problem
-------

The JDK needs to keep up with the latest version of the Unicode Standard. Otherwise, interoperability with other platforms becomes problematic.

Solution
--------

Incorporate Unicode 14.0, which adds 838 characters, 12 new blocks, and 5 new scripts relative to Unicode 13.0. The detailed changes are described on the Unicode Consortium's Unicode 14.0 website.
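
As a rough illustration of what the upgrade means at runtime, the sketch below probes whether the running JDK already carries Unicode 14.0 character data. The class and method names are hypothetical, and the probe code point U+1FAE0 (MELTING FACE, first assigned in Unicode 14.0) is chosen here for illustration; it does not come from this CSR.

```java
final class Unicode14Probe {
    // Assumption for this sketch: U+1FAE0 MELTING FACE, a character
    // first assigned in Unicode 14.0, serves as the probe code point.
    static final int MELTING_FACE = 0x1FAE0;

    // True when the runtime's character database knows the code point.
    static boolean hasUnicode14Data() {
        return Character.isDefined(MELTING_FACE);
    }

    public static void main(String[] args) {
        System.out.println("Java feature release: " + Runtime.version().feature());
        System.out.println("U+1FAE0 defined: " + hasUnicode14Data());
        // Character.getName returns null for an unassigned code point.
        System.out.println("U+1FAE0 name: " + Character.getName(MELTING_FACE));
    }
}
```

On a JDK that predates this change, the probe is expected to report the code point as undefined.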

The `java.text.Bidi` and `java.text.Normalizer` classes will be upgraded to the 14.0 level of Unicode Standard Annexes #9 and #15, respectively.
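
For readers less familiar with the two classes, here is a minimal sketch of the APIs whose underlying data is being upgraded. It deliberately uses long-established characters, so its results should not depend on the Unicode version. The class and helper names are hypothetical.

```java
import java.text.Bidi;
import java.text.Normalizer;

final class BidiNormalizerDemo {
    // Compose a base letter and a combining mark into one code point (NFC).
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Report whether the bidirectional algorithm resolves the whole
    // text as right-to-left.
    static boolean isAllRightToLeft(String s) {
        return new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isRightToLeft();
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        System.out.println("NFC is U+00E9: " + toNfc(decomposed).equals("\u00E9"));
        System.out.println("Hebrew text is RTL: " + isAllRightToLeft("\u05D0\u05D1"));
        System.out.println("Latin text is RTL: " + isAllRightToLeft("abc"));
    }
}
```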

Support for Unicode extended grapheme clusters in `java.util.regex.Pattern` will be upgraded to the 14.0 level of Unicode Standard Annex #29, "Unicode Text Segmentation."
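
The grapheme cluster support mentioned above is exposed through the `\X` matcher. The following is a minimal sketch of counting clusters with it, using only long-established characters. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GraphemeCount {
    // \X matches one Unicode extended grapheme cluster.
    private static final Pattern CLUSTER = Pattern.compile("\\X");

    // Count the extended grapheme clusters in the given text.
    static int countClusters(String s) {
        Matcher m = CLUSTER.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'e' + COMBINING ACUTE ACCENT is two code points but one cluster.
        System.out.println(countClusters("e\u0301"));
        // CR LF is treated as a single cluster between 'a' and 'b'.
        System.out.println(countClusters("a\r\nb"));
    }
}
```

Cluster boundaries for characters added in Unicode 14.0 depend on the upgraded data, so counts for such text may differ between Java releases.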

For more specific delta charts, refer to [Unicode.org's delta page][1].

Specification
-------------

Change the class description of the `java.lang.Character` class as follows:

    @@ -61,11 +61,11 @@
      * This file specifies properties including name and category for every
      * assigned Unicode code point or character range. The file is available
      * from the Unicode Consortium at
      * <a href="http://www.unicode.org">http://www.unicode.org</a>.
      * <p>
    - * Character information is based on the Unicode Standard, version 13.0.
    + * Character information is based on the Unicode Standard, version 14.0.
      * <p>
      * The Java platform has supported different versions of the Unicode
      * Standard over time. Upgrades to newer versions of the Unicode Standard
      * occurred in the following Java releases, each indicating the new version:
      * <table class="striped">
    @@ -73,10 +73,12 @@
      * <thead>
      * <tr><th scope="col">Java release</th>
      *     <th scope="col">Unicode version</th></tr>
      * </thead>
      * <tbody>
    + * <tr><th scope="row" style="text-align:left">Java SE 19</th>
    + *     <td>Unicode 14.0</td></tr>
      * <tr><th scope="row" style="text-align:left">Java SE 15</th>
      *     <td>Unicode 13.0</td></tr>
      * <tr><th scope="row" style="text-align:left">Java SE 13</th>
      *     <td>Unicode 12.1</td></tr>
      * <tr><th scope="row" style="text-align:left">Java SE 12</th>

In the `java.lang.Character.UnicodeBlock` class, add the following new fields:

     /**
      * Constant for the "Arabic Extended-B" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock ARABIC_EXTENDED_B

     /**
      * Constant for the "Vithkuqi" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock VITHKUQI
 
     /**
      * Constant for the "Latin Extended-F" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock LATIN_EXTENDED_F
 
     /**
      * Constant for the "Old Uyghur" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock OLD_UYGHUR
 
     /**
      * Constant for the "Unified Canadian Aboriginal Syllabics Extended-A" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED_A
 
     /**
      * Constant for the "Cypro-Minoan" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock CYPRO_MINOAN
 
     /**
      * Constant for the "Tangsa" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock TANGSA
 
     /**
      * Constant for the "Kana Extended-B" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock KANA_EXTENDED_B
 
     /**
      * Constant for the "Znamenny Musical Notation" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock ZNAMENNY_MUSICAL_NOTATION

     /**
      * Constant for the "Latin Extended-G" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock LATIN_EXTENDED_G

     /**
      * Constant for the "Toto" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock TOTO

     /**
      * Constant for the "Ethiopic Extended-B" Unicode
      * character block.
      * @since 19
      */
     public static final UnicodeBlock ETHIOPIC_EXTENDED_B
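
The new block constants can be looked up by name without a compile-time dependency on Java 19, as in the sketch below. `Character.UnicodeBlock.forName` and `Character.UnicodeBlock.of` are existing APIs. The class name, the helper names, and the sample code point U+10570 (assumed here to lie in the Vithkuqi block) are illustrative and not part of this CSR.

```java
final class NewBlocksDemo {
    // Block names as they appear in the javadoc comments above.
    static final String[] NEW_BLOCK_NAMES = {
        "Arabic Extended-B", "Vithkuqi", "Latin Extended-F", "Old Uyghur",
        "Unified Canadian Aboriginal Syllabics Extended-A", "Cypro-Minoan",
        "Tangsa", "Kana Extended-B", "Znamenny Musical Notation",
        "Latin Extended-G", "Toto", "Ethiopic Extended-B"
    };

    // Returns the block, or null when the running JDK does not know the name.
    static Character.UnicodeBlock blockOrNull(String name) {
        try {
            return Character.UnicodeBlock.forName(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Number of the listed blocks that the running JDK recognizes.
    static int supportedCount() {
        int count = 0;
        for (String name : NEW_BLOCK_NAMES) {
            if (blockOrNull(name) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (String name : NEW_BLOCK_NAMES) {
            System.out.println(name + " -> " + blockOrNull(name));
        }
        // Assumption: U+10570 falls inside the Vithkuqi block.
        System.out.println("Block of U+10570: " + Character.UnicodeBlock.of(0x10570));
    }
}
```

Looking blocks up by name keeps the code compilable on earlier releases, where the lookups simply come back as `null`.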

In the `java.lang.Character.UnicodeScript` enum, add the following new constants:

     /**
      * Unicode script "Vithkuqi".
      * @since 19
      */
     VITHKUQI,
 
     /**
      * Unicode script "Old Uyghur".
      * @since 19
      */
     OLD_UYGHUR,
 
     /**
      * Unicode script "Cypro Minoan".
      * @since 19
      */
     CYPRO_MINOAN,
 
     /**
      * Unicode script "Tangsa".
      * @since 19
      */
     TANGSA,
 
     /**
      * Unicode script "Toto".
      * @since 19
      */
     TOTO
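
The script constants can be probed in a similar, version-tolerant way. The sketch below uses the existing `Character.UnicodeScript.forName` and `Character.UnicodeScript.of` methods. The class name, the helper names, and the sample code point U+10570 (assumed here to be a Vithkuqi letter) are illustrative only.

```java
final class NewScriptsDemo {
    // Enum constant names as listed above.
    static final String[] NEW_SCRIPT_NAMES = {
        "VITHKUQI", "OLD_UYGHUR", "CYPRO_MINOAN", "TANGSA", "TOTO"
    };

    // Returns the script, or null when the running JDK does not define it.
    static Character.UnicodeScript scriptOrNull(String name) {
        try {
            return Character.UnicodeScript.forName(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Number of the listed scripts that the running JDK recognizes.
    static int supportedCount() {
        int count = 0;
        for (String name : NEW_SCRIPT_NAMES) {
            if (scriptOrNull(name) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (String name : NEW_SCRIPT_NAMES) {
            System.out.println(name + " -> " + scriptOrNull(name));
        }
        // Assumption: U+10570 is a Vithkuqi letter. A runtime without
        // Unicode 14.0 data is expected to report UNKNOWN instead.
        System.out.println("Script of U+10570: " + Character.UnicodeScript.of(0x10570));
    }
}
```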


  [1]: https://www.unicode.org/charts/PDF/Unicode-14.0/