JDK-8239504 : Support for Unicode 13.0
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 15
  • Submitted: 2020-02-19
  • Updated: 2020-04-24
  • Resolved: 2020-04-24
Description
Summary
-------

Support the Unicode Standard version 13.0.0 in the JDK.

Problem
-------

Keeping up with the latest Unicode Standard is imperative. Otherwise, text handling in the JDK diverges from other platforms that already implement the newer version, which hurts interoperability.

Solution
--------

Incorporate Unicode 13.0, which adds 5,930 characters, 8 new blocks, and 4 new scripts relative to Unicode 12.1. Detailed changes are described on the Unicode Consortium's [13.0 website][1].

The java.text.Bidi and java.text.Normalizer classes will be upgraded to the 13.0 level of Unicode Standard Annex #9 and #15, respectively.
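
For orientation, here is a minimal sketch of the two public entry points affected by that upgrade. The class name `TextAnnexDemo` and its helper methods are hypothetical and not part of this CSR. The inputs are chosen so that the results should not depend on the Unicode data version.

```java
import java.text.Bidi;
import java.text.Normalizer;

// Hypothetical demo class; only java.text.Normalizer and java.text.Bidi are JDK APIs.
class TextAnnexDemo {
    // Composes a string to Normalization Form C (UAX #15).
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Reports whether the bidi algorithm (UAX #9) treats the whole text as right-to-left.
    static boolean isRightToLeft(String s) {
        return new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isRightToLeft();
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
        System.out.println(toNfc("e\u0301").equals("\u00e9"));
        // Two Hebrew letters (U+05D0, U+05D1) form a purely right-to-left run.
        System.out.println(isRightToLeft("\u05d0\u05d1"));
        System.out.println(isRightToLeft("abc"));
    }
}
```

Characters newly assigned in 13.0 are where behavior can differ between JDK releases, since older releases carry no normalization or bidi data for them.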

Support for Unicode extended grapheme clusters in java.util.regex.Pattern will be upgraded to the 13.0 level of Unicode Standard Annex #29, "Unicode Text Segmentation."
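
As a rough illustration of the feature being upgraded, the `\X` construct in `java.util.regex.Pattern` matches one extended grapheme cluster. The sketch below counts clusters in a string. The class `GraphemeDemo` and the method `countGraphemes` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; \X is the documented Pattern construct for
// a Unicode extended grapheme cluster.
class GraphemeDemo {
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    // Counts extended grapheme clusters by repeatedly matching \X.
    static int countGraphemes(String s) {
        Matcher m = GRAPHEME.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' + U+0301 COMBINING ACUTE ACCENT is two chars but one cluster.
        System.out.println(countGraphemes("e\u0301")); // prints 1
        System.out.println(countGraphemes("abc"));     // prints 3
    }
}
```

Cluster boundaries involving characters added in 13.0 depend on the segmentation data shipped with the running JDK, so results for such text may differ on older releases.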

Specification
-------------

Change the following statement in the ```java.lang.Character``` class description from:

    Character information is based on the Unicode Standard, version 12.1.

to:

    Character information is based on the Unicode Standard, version 13.0.

In the ```java.lang.Character.UnicodeBlock``` class, add the following new fields:

    /**
     * Constant for the "Yezidi" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock YEZIDI;

    /**
     * Constant for the "Chorasmian" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock CHORASMIAN;

    /**
     * Constant for the "Dives Akuru" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock DIVES_AKURU;

    /**
     * Constant for the "Lisu Supplement" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock LISU_SUPPLEMENT;

    /**
     * Constant for the "Khitan Small Script" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock KHITAN_SMALL_SCRIPT;

    /**
     * Constant for the "Tangut Supplement" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock TANGUT_SUPPLEMENT;

    /**
     * Constant for the "Symbols for Legacy Computing" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock SYMBOLS_FOR_LEGACY_COMPUTING;

    /**
     * Constant for the "CJK Unified Ideographs Extension G" Unicode
     * character block.
     * @since 15
     */
    public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_G;
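
One way to observe these constants at run time is through `Character.UnicodeBlock.of(int)`, which returns `null` when a code point lies outside every block known to the running JDK. The sketch below is illustrative only: the class `UnicodeBlockDemo` and the method `describeBlock` are hypothetical, and the sample code points are assumed to be the first code point of each new block per the Unicode 13.0 block ranges, not values stated in this CSR.

```java
// Hypothetical demo class; Character.UnicodeBlock.of(int) is the JDK API.
class UnicodeBlockDemo {
    // Returns the block's identifier, or a placeholder when the running
    // JDK has no block covering the code point.
    static String describeBlock(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "(no block)" : block.toString();
    }

    public static void main(String[] args) {
        // Assumed first code point of each block added in Unicode 13.0.
        int[] samples = {
            0x10E80, // Yezidi
            0x10FB0, // Chorasmian
            0x11900, // Dives Akuru
            0x11FB0, // Lisu Supplement
            0x18B00, // Khitan Small Script
            0x18D00, // Tangut Supplement
            0x1FB00, // Symbols for Legacy Computing
            0x30000  // CJK Unified Ideographs Extension G
        };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, describeBlock(cp));
        }
    }
}
```

On a JDK that includes this change, each line is expected to name the corresponding new block. On earlier releases the same code points should fall through to the `(no block)` placeholder, which avoids a compile-time dependency on the new fields.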

In the ```java.lang.Character.UnicodeScript``` enum, add the following new constants:

    /**
     * Unicode script "Yezidi".
     * @since 15
     */
    YEZIDI,

    /**
     * Unicode script "Chorasmian".
     * @since 15
     */
    CHORASMIAN,

    /**
     * Unicode script "Dives Akuru".
     * @since 15
     */
    DIVES_AKURU,

    /**
     * Unicode script "Khitan Small Script".
     * @since 15
     */
    KHITAN_SMALL_SCRIPT,
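
The new script constants can be probed in a similar way with `Character.UnicodeScript.of(int)`, which returns `UNKNOWN` for code points that are unassigned in the running JDK's Unicode data. This is a minimal sketch under stated assumptions: `UnicodeScriptDemo` and `scriptOf` are hypothetical names, and the sample code points are assumed to be assigned letters of each new script in Unicode 13.0.

```java
// Hypothetical demo class; Character.UnicodeScript.of(int) is the JDK API.
class UnicodeScriptDemo {
    // Returns the enum constant name of the script the code point belongs to.
    static String scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint).name();
    }

    public static void main(String[] args) {
        // Assumed sample code points, one per script added in Unicode 13.0.
        int[] samples = {
            0x10E80, // Yezidi
            0x10FB0, // Chorasmian
            0x11900, // Dives Akuru
            0x18B00  // Khitan Small Script
        };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, scriptOf(cp));
        }
    }
}
```

Looking the script up by code point rather than referencing `UnicodeScript.YEZIDI` directly keeps the sketch compilable on releases that predate these constants, where the lookup should report `UNKNOWN` instead.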


  [1]: http://www.unicode.org/versions/Unicode13.0.0/
Comments
Moving to Approved.
24-04-2020