JDK-8216594 : Support new Japanese era in java.lang.Character
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11-pool
  • Submitted: 2019-01-13
  • Updated: 2024-01-08
  • Resolved: 2019-01-29
Description
Summary
-------

Allow Java SE 11 to use the code point for the new Japanese era.

Problem
-------

A new era in the Japanese imperial calendar will start in May 2019. [Unicode 12.0](https://www.unicode.org/versions/Unicode12.0.0/), due in March 2019, is expected to assign a code point for the new era (U+32FF). Soon afterwards, [Unicode 12.1](http://blog.unicode.org/2018/09/new-japanese-era.html) is expected to define the name and properties of the character at that code point. Java SE 11, however, uses character information from [Unicode 10.0](http://openjdk.java.net/jeps/327), in which U+32FF is unassigned, so Java programs are very limited in how they can use the new era's code point.

Solution
--------

Modify the specification of `java.lang.Character` to allow (though not require) implementations of the Java SE 11 Platform to support the new era code point. In effect, the Java SE 11 Platform supports Unicode 10.0 plus an extension.

Consequently, the behavior of fields and methods of `java.lang.Character` may vary across implementations of the Java SE 11 Platform when processing U+32FF, except for the following methods that define Java identifiers:

- `isJavaIdentifierStart(int)`
- `isJavaIdentifierStart(char)`
- `isJavaIdentifierPart(int)`
- `isJavaIdentifierPart(char)`

Code points in Java identifiers must continue to be drawn from Unicode 10.0, for source compatibility reasons.
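
To see which behavior a given runtime exhibits, a program can query `Character` for U+32FF directly. The following is a minimal sketch, not part of the CSR: the class name `EraCodePointProbe` is hypothetical, and the `OTHER_SYMBOL` branch assumes the general category that Unicode 12.1 gives the era character.

```java
final class EraCodePointProbe {
    /** The Japanese era code point discussed in this CSR. */
    static final int ERA = 0x32FF;

    /** Describes how this runtime's Character class classifies U+32FF. */
    static String describe() {
        int type = Character.getType(ERA);
        String category;
        if (type == Character.UNASSIGNED) {
            // Plain Unicode 10.0 data: the code point is not assigned.
            category = "UNASSIGNED";
        } else if (type == Character.OTHER_SYMBOL) {
            // Assumed category once the runtime knows the era character.
            category = "OTHER_SYMBOL";
        } else {
            category = "unexpected type " + type;
        }
        return "U+32FF: defined=" + Character.isDefined(ERA)
                + ", type=" + category
                + ", identifierStart=" + Character.isJavaIdentifierStart(ERA)
                + ", identifierPart=" + Character.isJavaIdentifierPart(ERA);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The `defined` and `type` values are the ones that may differ between implementations; the two identifier results should be `false` either way, since neither an unassigned code point nor a symbol satisfies the identifier conditions.
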
 
These changes necessitate a Maintenance Review of the Java SE 11 Platform. See the [announcement](http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2018-December/000308.html) to the OpenJDK community.


Specification
-------------

The initial portion of the specification of the `java.lang.Character` class is changed from:

    /**
    * The {@code Character} class wraps a value of the primitive
    * type {@code char} in an object. An object of type
    * {@code Character} contains a single field whose type is
    * {@code char}.
    * <p>
    * In addition, this class provides several methods for determining
    * a character's category (lowercase letter, digit, etc.) and for converting
    * characters from uppercase to lowercase and vice versa.
    * <p>
    * Character information is based on the Unicode Standard, version 10.0.0.
    * <p>
    * The methods and data of class {@code Character} are defined by
    * the information in the <i>UnicodeData</i> file that is part of the
    * Unicode Character Database maintained by the Unicode
    * Consortium. This file specifies various properties including name
    * and general category for every defined Unicode code point or
    * character range.
    * <p>
    * The file and its description are available from the Unicode Consortium at:
    * <ul>
    * <li><a href="http://www.unicode.org">http://www.unicode.org</a>
    * </ul>
    *
    * <h3><a id="unicode">Unicode Character Representations</a></h3>

to:

    /**
    * The {@code Character} class wraps a value of the primitive
    * type {@code char} in an object. An object of class
    * {@code Character} contains a single field whose type is
    * {@code char}.
    * <p>
    * In addition, this class provides a large number of static methods for
    * determining a character's category (lowercase letter, digit, etc.)
    * and for converting characters from uppercase to lowercase and vice
    * versa.
    *
    * <h3><a id="conformance">Unicode Conformance</a></h3>
    * <p>
    * The fields and methods of class {@code Character} are defined in terms
    * of character information from the Unicode Standard, specifically the
    * <i>UnicodeData</i> file that is part of the Unicode Character Database.
    * This file specifies properties including name and category for every
    * assigned Unicode code point or character range. The file is available
    * from the Unicode Consortium at
    * <a href="http://www.unicode.org">http://www.unicode.org</a>.
    * <p> 
    * The Java SE 11 Platform uses character information from version 10.0
    * of the Unicode Standard, with an extension. The Java SE 11 Platform allows
    * an implementation of class {@code Character} to use the Japanese Era
    * code point, {@code U+32FF}, from the first version of the Unicode Standard
    * after 10.0 that assigns the code point. Consequently, the behavior of
    * fields and methods of class {@code Character} may vary across
    * implementations of the Java SE 11 Platform when processing the
    * aforementioned code point (outside of version 10.0), except for
    * the following methods that define Java identifiers: 
    * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
    * {@link #isJavaIdentifierPart(int)}, and {@link #isJavaIdentifierPart(char)}.
    * Code points in Java identifiers must be drawn from version 10.0 of
    * the Unicode Standard.
    *
    * <h3><a id="unicode">Unicode Character Representations</a></h3>

The initial portion of the specification of the `isJavaIdentifierStart(char)` method is changed from:

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>

to the following, which adds a statement about Unicode 10.0:

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.
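
The four listed conditions can be re-derived from other `Character` queries and compared with the method's result. This is an illustrative sketch only: the class and method names are hypothetical, and mapping "currency symbol" and "connecting punctuation" to `CURRENCY_SYMBOL` and `CONNECTOR_PUNCTUATION` is an assumption about how the prose corresponds to general categories.

```java
final class IdentifierStartConditions {
    /** Re-derives the four documented conditions from Character queries. */
    static boolean startByDocumentedConditions(char ch) {
        int type = Character.getType(ch);
        return Character.isLetter(ch)
                || type == Character.LETTER_NUMBER
                || type == Character.CURRENCY_SYMBOL        // assumed meaning of "currency symbol"
                || type == Character.CONNECTOR_PUNCTUATION; // assumed meaning of "connecting punctuation"
    }

    public static void main(String[] args) {
        // Letter, currency symbol, connector, digit, Roman numeral eight, era code point.
        char[] samples = {'A', '$', '_', '1', '\u2167', '\u32FF'};
        for (char ch : samples) {
            System.out.printf("U+%04X documented=%b actual=%b%n",
                    (int) ch,
                    startByDocumentedConditions(ch),
                    Character.isJavaIdentifierStart(ch));
        }
    }
}
```

On an implementation that uses the extension, `isLetter` and `getType` may consult newer data for U+32FF while `isJavaIdentifierStart` stays on Unicode 10.0; the two columns should still agree for that code point because it qualifies under neither data set.
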
    
The initial portion of the specification of the `isJavaIdentifierStart(int)` method is changed from:

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}
     * <li> the referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     *      (such as {@code '_'}).
     * </ul>

to the following, which adds a statement about Unicode 10.0:

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}
     * <li> the referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     *      (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.
    
The initial portion of the specification of the `isJavaIdentifierPart(char)` method is changed from:

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character
     * </ul>

to the following, which adds a statement about Unicode 10.0 and changes "are true" to "conditions are true":

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.

The initial portion of the specification of the `isJavaIdentifierPart(int)` method is changed from:

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     * isIdentifierIgnorable(codePoint)} returns {@code true} for
     * the character
     * </ul>

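to the following, which adds a statement about Unicode 10.0, changes "are true" to "conditions are true", and changes "the character" to "the code point" in the last list item: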

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     * isIdentifierIgnorable(codePoint)} returns {@code true} for
     * the code point
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.
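
The `int` overloads are the ones to use when a string may contain supplementary characters. As a hedged illustration (the class `IdentifierCheck` and its method are hypothetical names, not JDK API), a string can be checked position by position like this:

```java
final class IdentifierCheck {
    /**
     * Returns true if the first code point may start a Java identifier and
     * every later code point may be part of one. Reserved words are not
     * rejected, so this checks shape only.
     */
    static boolean isIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by charCount so surrogate pairs are read as one code point.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"foo_1", "1foo", "$x", "a\u32FFb"};
        for (String s : samples) {
            System.out.println(s + " -> " + isIdentifierShape(s));
        }
    }
}
```

Because the identifier methods stay on Unicode 10.0 data, a string containing U+32FF should be rejected here whether or not the implementation adopts the extension.
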

The initial portion of the specification of the deprecated `isJavaLetter(char)` method is changed from:

    /**
     * Determines if the specified character is permissible as the first
     * character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>

to the following, which adds a statement about Unicode 10.0 and changes "one of the following is true" to "one of the following conditions is true":

    /**
     * Determines if the specified character is permissible as the first
     * character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.

The initial portion of the specification of the deprecated `isJavaLetterOrDigit(char)` method is changed from:

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if and only if any
     * of the following are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character.
     * </ul>

to the following, which adds a statement about Unicode 10.0 and changes "any of the following are true" to "one of the following conditions is true":

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if and only if one
     * of the following conditions is true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character.
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 10.0 of the Unicode Standard.
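
The deprecated pair receives the same Unicode 10.0 sentence as its replacements. A small sketch, assuming only that each deprecated method answers the same as its replacement for `char` arguments (the class name `DeprecatedIdentifierMethods` is hypothetical), shows how that can be spot-checked:

```java
final class DeprecatedIdentifierMethods {
    /** True if the deprecated methods and their replacements agree on ch. */
    @SuppressWarnings("deprecation")
    static boolean agreeOn(char ch) {
        return Character.isJavaLetter(ch) == Character.isJavaIdentifierStart(ch)
                && Character.isJavaLetterOrDigit(ch) == Character.isJavaIdentifierPart(ch);
    }

    public static void main(String[] args) {
        char[] samples = {'A', '1', '$', ' ', '\u32FF'};
        for (char ch : samples) {
            System.out.printf("U+%04X agree=%b%n", (int) ch, agreeOn(ch));
        }
    }
}
```
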
Comments
Moving to Approved.
29-01-2019