JDK-8222771 : Support for Unicode 12.1
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 13
  • Submitted: 2019-04-19
  • Updated: 2019-06-13
  • Resolved: 2019-05-22
Description
Summary
-------

Support Unicode version 12.1 in the JDK.

Problem
-------

Characters which have been assigned since Unicode 11.0 cannot be used in the JDK.

Solution
--------

Incorporate Unicode 12.1, which assigns 555 characters and 4 new scripts that were not in Unicode 11.0. Detailed changes are described on the Unicode Consortium's website for [12.0][1] and [12.1][2].
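
As a rough illustration of what the upgrade makes visible, the sketch below probes two code points that are assigned after Unicode 11.0: U+32FF (SQUARE ERA NAME REIWA, the single addition in 12.1) and U+1FA70 (BALLET SHOES, added in 12.0). The class name `Unicode121Check` and the helper `isAssigned` are hypothetical, and the results assume a runtime whose character data is at Unicode 12.1 or later (JDK 13+).

```java
public class Unicode121Check {
    // U+32FF SQUARE ERA NAME REIWA: the one character added by Unicode 12.1.
    static final int REIWA = 0x32FF;
    // U+1FA70 BALLET SHOES: one of the characters added by Unicode 12.0.
    static final int BALLET_SHOES = 0x1FA70;

    // Hypothetical helper: true when the runtime's Unicode data assigns the code point.
    static boolean isAssigned(int codePoint) {
        return Character.isDefined(codePoint);
    }

    public static void main(String[] args) {
        for (int cp : new int[] {REIWA, BALLET_SHOES}) {
            // Character.getName returns null for unassigned code points.
            System.out.println(String.format("U+%04X", cp)
                    + " assigned=" + isAssigned(cp)
                    + " name=" + Character.getName(cp));
        }
    }
}
```

On a runtime with older character data, `isAssigned(BALLET_SHOES)` would be expected to return `false`.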

The `java.text.Bidi` and `java.text.Normalizer` classes will be upgraded to the 12.0 level of Unicode Standard Annexes #9 and #15, respectively.

Support for Unicode extended grapheme clusters in `java.util.regex.Pattern` will also be upgraded. It relies on Unicode Standard Annex #29, "Unicode Text Segmentation", which will be upgraded from version 8.0 to 12.0.
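
For context, extended grapheme clusters are exposed in `java.util.regex.Pattern` through the `\X` matcher. The following is a minimal sketch of splitting text into clusters; the class name `GraphemeClusters` and the method `split` are hypothetical, and the example input uses a base letter plus a combining mark, which forms a single cluster under both the old and new versions of Annex #29.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GraphemeClusters {
    // \X matches one Unicode extended grapheme cluster.
    private static final Pattern CLUSTER = Pattern.compile("\\X");

    // Hypothetical helper: returns the clusters of the input in order.
    static List<String> split(String text) {
        List<String> clusters = new ArrayList<>();
        Matcher m = CLUSTER.matcher(text);
        while (m.find()) {
            clusters.add(m.group());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'x':
        // three chars, but two user-perceived characters.
        String text = "e\u0301x";
        System.out.println(text.length() + " chars, " + split(text).size() + " clusters");
    }
}
```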

Specification
-------------

Change the following paragraph in the `java.lang.Character` class description from:

    * <p>
    * The Java SE 13 Platform uses character information from version 11.0
    * of the Unicode Standard, plus the Japanese Era code point,
    * {@code U+32FF}, from the first version of the Unicode Standard
    * after 11.0 that assigns the code point.

to:

    * <p>
    * Character information is based on the Unicode Standard, version 12.1.

Add the following new `java.lang.Character.UnicodeScript` enum constants:

    /**
     * Unicode script "Elymaic".
     * @since 13  
     */
    ELYMAIC,
    
    /**
     * Unicode script "Nandinagari".
     * @since 13  
     */
    NANDINAGARI,
    
    /**
     * Unicode script "Nyiakeng Puachue Hmong".
     * @since 13  
     */
    NYIAKENG_PUACHUE_HMONG,
    
    /**
     * Unicode script "Wancho".
     * @since 13  
     */
    WANCHO,
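
A small sketch of how the new script constants could be observed at run time is shown below. It looks up the script of the first letter in each new script's block via `Character.UnicodeScript.of(int)`. The class name `NewScripts` and the method `scriptName` are hypothetical; the code compares by name so that it compiles on older JDKs, but the names shown in the comments assume a JDK 13 or later runtime.

```java
public class NewScripts {
    // First assigned letter of each script added in Unicode 12.0.
    static final int ELYMAIC_ALEPH = 0x10FE0;          // ELYMAIC LETTER ALEPH
    static final int NANDINAGARI_A = 0x119A0;          // NANDINAGARI LETTER A
    static final int NYIAKENG_PUACHUE_HMONG_MA = 0x1E100; // NYIAKENG PUACHUE HMONG LETTER MA
    static final int WANCHO_AA = 0x1E2C0;              // WANCHO LETTER AA

    // Hypothetical helper: enum constant name of the script the code point belongs to.
    static String scriptName(int codePoint) {
        return Character.UnicodeScript.of(codePoint).name();
    }

    public static void main(String[] args) {
        int[] samples = {ELYMAIC_ALEPH, NANDINAGARI_A, NYIAKENG_PUACHUE_HMONG_MA, WANCHO_AA};
        for (int cp : samples) {
            // With pre-12.0 character data these code points report UNKNOWN.
            System.out.println(String.format("U+%04X", cp) + " -> " + scriptName(cp));
        }
    }
}
```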

Add the following new `java.lang.Character.UnicodeBlock` fields:

    /**
     * Constant for the "Elymaic" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock ELYMAIC
    
    /**
     * Constant for the "Nandinagari" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock NANDINAGARI 
    
    /**
     * Constant for the "Tamil Supplement" Unicode
     * character block.               
     * @since 13
     */
    public static final UnicodeBlock TAMIL_SUPPLEMENT 
    
    /**
     * Constant for the "Egyptian Hieroglyph Format Controls" Unicode
     * character block.               
     * @since 13
     */
    public static final UnicodeBlock EGYPTIAN_HIEROGLYPH_FORMAT_CONTROLS 
    
    /**
     * Constant for the "Small Kana Extension" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock SMALL_KANA_EXTENSION 
    
    /**
     * Constant for the "Nyiakeng Puachue Hmong" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock NYIAKENG_PUACHUE_HMONG 
    
    /**
     * Constant for the "Wancho" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock WANCHO 
    
    /**
     * Constant for the "Ottoman Siyaq Numbers" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock OTTOMAN_SIYAQ_NUMBERS 
    
    /**
     * Constant for the "Symbols and Pictographs Extended-A" Unicode
     * character block.
     * @since 13
     */
    public static final UnicodeBlock SYMBOLS_AND_PICTOGRAPHS_EXTENDED_A
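
The new block fields can be exercised in a similar way with `Character.UnicodeBlock.of(int)`. The sketch below is illustrative only: the class name `NewBlocks` and the method `blockOf` are hypothetical, the sample code points are the first code point of each block as published in the Unicode 12.0 block list, and the code needs JDK 13 or later to compile because it references the new fields directly.

```java
public class NewBlocks {
    // Hypothetical helper: the block containing the code point, or null if none.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    public static void main(String[] args) {
        // Start of each new block paired with the field expected to describe it.
        Object[][] samples = {
            {0x10FE0, Character.UnicodeBlock.ELYMAIC},
            {0x119A0, Character.UnicodeBlock.NANDINAGARI},
            {0x11FC0, Character.UnicodeBlock.TAMIL_SUPPLEMENT},
            {0x13430, Character.UnicodeBlock.EGYPTIAN_HIEROGLYPH_FORMAT_CONTROLS},
            {0x1B130, Character.UnicodeBlock.SMALL_KANA_EXTENSION},
            {0x1E100, Character.UnicodeBlock.NYIAKENG_PUACHUE_HMONG},
            {0x1E2C0, Character.UnicodeBlock.WANCHO},
            {0x1ED00, Character.UnicodeBlock.OTTOMAN_SIYAQ_NUMBERS},
            {0x1FA70, Character.UnicodeBlock.SYMBOLS_AND_PICTOGRAPHS_EXTENDED_A},
        };
        for (Object[] sample : samples) {
            int cp = (Integer) sample[0];
            // UnicodeBlock instances are singletons, so identity comparison is sufficient.
            boolean matches = blockOf(cp) == sample[1];
            System.out.println(String.format("U+%04X", cp) + " -> " + blockOf(cp) + " matches=" + matches);
        }
    }
}
```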

Change the following paragraph in the `java.util.regex.Pattern` class description from:

     * <p> This class is in conformance with Level 1 of <a
     * href="http://www.unicode.org/reports/tr18/"><i>Unicode Technical
     * Standard #18: Unicode Regular Expression</i></a>, plus RL2.1
     * Canonical Equivalents.

to:

     * <p> This class is in conformance with Level 1 of <a
     * href="http://www.unicode.org/reports/tr18/"><i>Unicode Technical
     * Standard #18: Unicode Regular Expression</i></a>, plus RL2.1
     * Canonical Equivalents and RL2.2 Extended Grapheme Clusters.

  [1]: http://www.unicode.org/versions/Unicode12.0.0/
  [2]: http://www.unicode.org/versions/Unicode12.1.0/
Comments
Moving to Approved.
22-05-2019

Modified as suggested. The inclusion of the JDK version was suggested by [~abuckley] when we were working on backports for specific JDK versions. I agree that for this mainline javadoc, it would be better not to include the JDK version here.
23-04-2019