JDK-8191411 : Unicode 10.0.0 support
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11
  • Submitted: 2017-11-16
  • Updated: 2018-03-15
  • Resolved: 2018-02-07
Description
Summary
-------

To support Unicode 10.0.0, the newly added blocks and scripts need to be reflected in the Character class. Accordingly, 18 new blocks and 10 new scripts will be added to the Character.UnicodeBlock and Character.UnicodeScript classes, respectively.

Problem
-------

The Character class API doc currently refers to Unicode 8.0.0. To support Unicode 10.0.0, the new blocks and new scripts need to be added to the Character.UnicodeBlock and Character.UnicodeScript classes, respectively.


Solution
--------

18 new blocks and 10 new scripts will be added to the Character.UnicodeBlock and Character.UnicodeScript classes, both of which are part of the API doc of the Character class. The following changes need to be made:

 -  API doc change: replace "8.0.0" with "10.0.0" in the java.lang.Character class API doc.
 -  Add 18 fields to java.lang.Character.UnicodeBlock.
 -  Add 10 enum elements to java.lang.Character.UnicodeScript.
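
As a rough illustration of what the added constants make possible, the sketch below classifies a code point by block and by script. It assumes a runtime of JDK 11 or later; the class name `NewUnicodeConstantsDemo` and the `describe` helper are hypothetical and not part of this change.

```java
// Hypothetical demo class. Only Character.UnicodeBlock and
// Character.UnicodeScript (and their of() methods) come from the JDK.
class NewUnicodeConstantsDemo {

    // Returns "<block>/<script>" for the given code point.
    // UnicodeBlock.of() returns null for code points outside any known block.
    static String describe(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return block + "/" + script;
    }

    public static void main(String[] args) {
        // U+104B0 OSAGE CAPITAL LETTER A
        System.out.println(describe(0x104B0));
        // U+1E900 ADLAM CAPITAL LETTER ALIF
        System.out.println(describe(0x1E900));
    }
}
```

On a runtime older than 11 the same code points would not map to these constants, since the constants themselves do not exist there.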

Specification
-------------

The following changes will be made to the API doc of the java.lang.Character class.

    1.  A change in the class description of the java.lang.Character class:

    -   Character information is based on the Unicode Standard, version 8.0.0.
    +   Character information is based on the Unicode Standard, version 10.0.0.

    2.  18 fields are added to java.lang.Character.UnicodeBlock:

            
    +        /**
    +         * Constant for the "Syriac Supplement" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock SYRIAC_SUPPLEMENT
   
    +        /**
    +         * Constant for the "Cyrillic Extended-C" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock CYRILLIC_EXTENDED_C
    
    +        /**
    +         * Constant for the "Osage" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock OSAGE 
   
             
    +        /**
    +         * Constant for the "Newa" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock NEWA 
   
                
    +        /**
    +         * Constant for the "Mongolian Supplement" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock MONGOLIAN_SUPPLEMENT 
    
               
    +        /**
    +         * Constant for the "Marchen" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock MARCHEN 
   
               
    +        /**
    +         * Constant for the "Ideographic Symbols and Punctuation" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock IDEOGRAPHIC_SYMBOLS_AND_PUNCTUATION 
    
             
    +        /**
    +         * Constant for the "Tangut" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock TANGUT 
    
               
    +        /**
    +         * Constant for the "Tangut Components" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock TANGUT_COMPONENTS 
  
            
    +        /**
    +         * Constant for the "Kana Extended-A" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock KANA_EXTENDED_A 
   
    +        /**
    +         * Constant for the "Glagolitic Supplement" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock GLAGOLITIC_SUPPLEMENT 
   
    +        /**
    +         * Constant for the "Adlam" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock ADLAM 
   
           
    +        /**
    +         * Constant for the "Masaram Gondi" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock MASARAM_GONDI 
   
            
    +        /**
    +         * Constant for the "Zanabazar Square" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock ZANABAZAR_SQUARE 
  
           
    +        /**
    +         * Constant for the "Nushu" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock NUSHU 
  
           
    +        /**
    +         * Constant for the "Soyombo" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock SOYOMBO 
   
            
    +        /**
    +         * Constant for the "Bhaiksuki" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock BHAIKSUKI 
  
            
    +        /**
    +         * Constant for the "CJK Unified Ideographs Extension F" Unicode
    +         * character block.
    +         * @since 11
    +         */
    +        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_F 
   
     
    3.  10 new scripts are added to java.lang.Character.UnicodeScript:
    +        
    +        /**
    +          * Unicode script "Adlam".
    +          * @since 11
    +          */
    +        ADLAM,
    +        
    +        /**
    +          * Unicode script "Bhaiksuki".
    +          * @since 11
    +          */
    +        BHAIKSUKI,
    +        
    +        /**
    +          * Unicode script "Marchen".
    +          * @since 11
    +          */
    +        MARCHEN,
    +        
    +        /**
    +          * Unicode script "Newa".
    +          * @since 11
    +          */
    +        NEWA,
    +        
    +        /**
    +          * Unicode script "Osage".
    +          * @since 11
    +          */
    +        OSAGE,
    +        
    +        /**
    +          * Unicode script "Tangut".
    +          * @since 11
    +          */
    +        TANGUT,
    +        
    +        /**
    +          * Unicode script "Masaram Gondi".
    +          * @since 11
    +          */
    +        MASARAM_GONDI,
    +        
    +        /**
    +          * Unicode script "Nushu".
    +          * @since 11
    +          */
    +        NUSHU,
    +        
    +        /**
    +          * Unicode script "Soyombo".
    +          * @since 11
    +          */
    +        SOYOMBO,
    +        
    +        /**
    +          * Unicode script "Zanabazar Square".
    +          * @since 11
    +          */
    +        ZANABAZAR_SQUARE,
    +        
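
The new blocks and scripts can also be reached by name rather than by constant. The following is a minimal sketch, assuming JDK 11 or later; `NameLookupDemo` and its three helper methods are hypothetical names chosen for illustration, while `UnicodeBlock.forName`, `UnicodeScript.forName`, and the `\p{Is...}` script syntax of `java.util.regex.Pattern` are existing JDK APIs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for name-based lookup of blocks and scripts.
class NameLookupDemo {

    // Resolves a block by its Unicode name, e.g. "Syriac Supplement".
    static Character.UnicodeBlock blockByName(String blockName) {
        return Character.UnicodeBlock.forName(blockName);
    }

    // Counts the code points in text that belong to the named script.
    static long countInScript(String text, String scriptName) {
        Character.UnicodeScript script = Character.UnicodeScript.forName(scriptName);
        return text.codePoints()
                   .filter(cp -> Character.UnicodeScript.of(cp) == script)
                   .count();
    }

    // Reports whether text contains at least one character of the named
    // script, using the regex script property \p{Is<Name>}.
    static boolean containsScript(String text, String scriptName) {
        return Pattern.compile("\\p{Is" + scriptName + "}").matcher(text).find();
    }

    public static void main(String[] args) {
        // Two Adlam letters (U+1E900, U+1E901) followed by an ASCII letter.
        String sample = new StringBuilder()
                .appendCodePoint(0x1E900)
                .appendCodePoint(0x1E901)
                .append('a')
                .toString();
        System.out.println(blockByName("Syriac Supplement"));
        System.out.println(countInScript(sample, "Adlam"));
        System.out.println(containsScript(sample, "Adlam"));
    }
}
```

Both `forName` methods throw `IllegalArgumentException` for an unknown name, so code that must also run on older runtimes would need to catch that case.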


Comments
[~bnallakaluva] Yes, the API spec change is limited to the UnicodeBlock and UnicodeScript classes. For other classes such as NumericShaper and Bidi, only the implementation has to be updated to support Unicode 10.0.0.
08-03-2018

As per this CSR, my understanding is that the Character class is the only class whose specification changes (viz. UnicodeBlock and UnicodeScript). However, the JEP JDK-8191410 mentions the following additional APIs besides the Character class:
* NumericShaper in the java.awt.font package, and
* Bidi, BreakIterator, and Normalizer in the java.text package.
Could you clarify whether the aforementioned APIs have no specification changes and only their implementation is being changed to support Unicode 10.0?
07-03-2018

From a quick scan, there didn't seem to be any other existing documentation in the Character* classes that needed an explicit update to accommodate the version bump. Moving to Approved.
07-02-2018