JDK-8267069 : Update Hebrew/Indonesian/Yiddish ISO 639 language codes to current
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.util:i18n
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 17
  • Submitted: 2021-05-12
  • Updated: 2021-06-28
  • Resolved: 2021-05-26
Description
Summary
-------

Change the `Locale` class so that the obsolete ISO 639 language codes are mapped to their current codes, rather than the reverse.

Problem
-------

Historically, the constructors in the `java.util.Locale` class have mapped three ISO 639 language codes, namely "he", "yi", and "id", to their obsolete forms "iw", "ji", and "in" for backward compatibility. Although this accepts both the obsolete and the current ISO 639 codes, the constructed `Locale` object represents the obsolete language code (i.e. `Locale.getLanguage()` and `Locale.toString()` return the obsolete code), which makes it look as if the current language codes were not supported.
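
The behavior described above can be observed with a small program. This is a minimal sketch, not part of the CSR: the class name `LegacyCodeProblemDemo` and the helper `of` are illustrative, and the values printed for `getLanguage()` and `toString()` depend on the JDK release and on the system property proposed in the Solution section.

```java
import java.util.Locale;

class LegacyCodeProblemDemo {
    // Obsolete/current ISO 639 pairs named in this CSR.
    static final String[][] PAIRS = { {"iw", "he"}, {"ji", "yi"}, {"in", "id"} };

    // Hypothetical helper. Locale(String) is deprecated in newer JDKs,
    // but it is the constructor this CSR discusses.
    @SuppressWarnings("deprecation")
    static Locale of(String language) {
        return new Locale(language);
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            Locale fromOld = of(pair[0]);
            Locale fromNew = of(pair[1]);
            // getLanguage() and toString() show whichever code the running JDK maps to;
            // toLanguageTag() is documented to reflect the new code.
            System.out.printf("%s/%s -> getLanguage=%s toString=%s toLanguageTag=%s equal=%b%n",
                    pair[0], pair[1], fromNew.getLanguage(), fromNew,
                    fromNew.toLanguageTag(), fromOld.equals(fromNew));
        }
    }
}
```

Because both spellings are mapped to the same internal code, the two `Locale` objects in each pair compare equal; only the code reported by `getLanguage()` and `toString()` differs between releases.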

Solution
--------

Flip the mapping from current->obsolete to obsolete->current. For example, the mapping for Hebrew changes from "he"->"iw" to "iw"->"he". To preserve the backward-compatible behavior, a new system property, `java.locale.useOldISOCodes`, is introduced. If its value is `true`, the `Locale` class behaves as it did before this change. `java.util.ResourceBundle.Control#newBundle()` is also modified to load the bundle under both the obsolete and the current bundle name if needed, giving priority to the requested name.
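
The effect of the property can be checked with a sketch like the following. The class name `UseOldIsoCodesDemo` and the helper `hebrewCode` are illustrative. The sketch assumes the property is supplied with `-D` on the command line; whether setting it later at runtime has any effect is an implementation detail this CSR does not specify.

```java
import java.util.Locale;

class UseOldIsoCodesDemo {
    static final String PROPERTY = "java.locale.useOldISOCodes";

    // Hypothetical helper: the language code the running JDK reports for Hebrew.
    @SuppressWarnings("deprecation")
    static String hebrewCode() {
        return new Locale("he").getLanguage();
    }

    public static void main(String[] args) {
        // Prints "null" when the property was not supplied at launch.
        System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
        System.out.println("new Locale(\"he\").getLanguage() = " + hebrewCode());
    }
}
```

Running it once as `java UseOldIsoCodesDemo.java` and once as `java -Djava.locale.useOldISOCodes=true UseOldIsoCodesDemo.java` should show the two mappings on a release that includes this change.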

Specification
-------------

Change the part of the class description of `java.util.Locale` as follows:

       *
       * <p>During deserialization, readResolve adds extensions as described
       * in <a href="#special_cases_constructor">Special Cases</a>, only
       * for the two cases th_TH_TH and ja_JP_JP.
       *
    -  * <h4>Legacy language codes</h4>
    +  * <h4><a id="legacy_language_codes">Legacy language codes</a></h4>
       *
       * <p>Locale's constructor has always converted three language codes to
       * their earlier, obsoleted forms: {@code he} maps to {@code iw},
       * {@code yi} maps to {@code ji}, and {@code id} maps to
    -  * {@code in}.  This continues to be the case, in order to not break
    -  * backwards compatibility.
    +  * {@code in}. Since Java SE 17, this is no longer the case. Each
    +  * language maps to its new form; {@code iw} maps to {@code he}, {@code ji}
    +  * maps to {@code yi}, and {@code in} maps to {@code id}.
    +  *
    +  * <p>For the backward compatible behavior, the system property
    +  * {@systemProperty java.locale.useOldISOCodes} reverts the behavior
    +  * back to prior to Java SE 17 one. If the system property is set
    +  * to {@code true}, those three current language codes are mapped to their
    +  * backward compatible forms.
       *
       * <p>The APIs added in 1.7 map between the old and new language codes,
    -  * maintaining the old codes internal to Locale (so that
    -  * {@code getLanguage} and {@code toString} reflect the old
    -  * code), but using the new codes in the BCP 47 language tag APIs (so
    +  * maintaining the mapped codes internal to Locale (so that
    +  * {@code getLanguage} and {@code toString} reflect the mapped
    +  * code, which depends on the {@code java.locale.useOldISOCodes} system
    +  * property), but using the new codes in the BCP 47 language tag APIs (so
       * that {@code toLanguageTag} reflects the new one). This


Change the method description of each constructor in the `Locale` class as follows:

          /**
           * Construct a locale from language and country.
           * This constructor normalizes the language value to lowercase and
           * the country value to uppercase.
    -      * <p>
    -      * <b>Note:</b>
    +      * @implNote
           * <ul>
    -      * <li>ISO 639 is not a stable standard; some of the language codes it defines
    -      * (specifically "iw", "ji", and "in") have changed.  This constructor accepts both the
    -      * old codes ("iw", "ji", and "in") and the new codes ("he", "yi", and "id"), but all other
    -      * API on Locale will return only the OLD codes.
    +      * <li>Obsolete ISO 639 codes ("iw", "ji", and "in") are mapped to
    +      * their current forms. See <a href="#legacy_language_codes">Legacy language
    +      * codes</a> for more information.
           * <li>For backward compatibility reasons, this constructor does not make
           * any syntactic checks on the input.
           * </ul>

Change the method description of `Locale#getLanguage()` as follows:

      
         /**
          * Returns the language code of this Locale.
          *
    -     * <p><b>Note:</b> ISO 639 is not a stable standard&mdash; some languages' codes have changed.
    -     * Locale's constructor recognizes both the new and the old codes for the languages
    -     * whose codes have changed, but this function always returns the old code.  If you
    -     * want to check for a specific language whose code has changed, don't do
    -     * <pre>
    -     * if (locale.getLanguage().equals("he")) // BAD!
    -     *    ...
    -     * </pre>
    -     * Instead, do
    -     * <pre>
    -     * if (locale.getLanguage().equals(new Locale("he").getLanguage()))
    -     *    ...
    -     * </pre>
    +     * @implNote This method returns the new forms for the obsolete ISO 639
    +     * codes ("iw", "ji", and "in"). See <a href="#legacy_language_codes">
    +     * Legacy language codes</a> for more information.
    +     *
          * @return The language code, or the empty string if none is defined.
          * @see #getDisplayLanguage
          */
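
Code that must run both before and after this change can still use the comparison idiom from the removed note, because it routes the expected code through the same mapping as the locale under test. A minimal sketch, with the illustrative names `LanguageCheckDemo` and `hasLanguage`:

```java
import java.util.Locale;

class LanguageCheckDemo {
    // Hypothetical helper: true if the locale's language matches the given code,
    // whichever of the old or new spellings the running JDK uses internally.
    @SuppressWarnings("deprecation")
    static boolean hasLanguage(Locale locale, String code) {
        return locale.getLanguage().equals(new Locale(code).getLanguage());
    }

    public static void main(String[] args) {
        Locale hebrewIsrael = Locale.forLanguageTag("he-IL");
        System.out.println(hasLanguage(hebrewIsrael, "he"));   // current code
        System.out.println(hasLanguage(hebrewIsrael, "iw"));   // obsolete code
        System.out.println(hasLanguage(Locale.ENGLISH, "he"));
    }
}
```

A direct comparison such as `locale.getLanguage().equals("he")` only holds on releases that include this change and when `java.locale.useOldISOCodes` is not set to `true`.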

Change the method description of `Locale#forLanguageTag()` as follows:

           *
           * <p>The following <b>conversions</b> are performed:<ul>
           *
           * <li>The language code "und" is mapped to language "".
           *
    -      * <li>The language codes "he", "yi", and "id" are mapped to "iw",
    -      * "ji", and "in" respectively. (This is the same canonicalization
    -      * that's done in Locale's constructors.)
    +      * <li>The language codes "iw", "ji", and "in" are mapped to "he",
    +      * "yi", and "id" respectively. (This is the same canonicalization
    +      * that's done in Locale's constructors.) See
    +      * <a href="#legacy_language_codes">Legacy language codes</a>
    +      * for more information.
           *
           * <li>The portion of a private use subtag prefixed by "lvariant",
           * if any, is removed and appended to the variant field in the
           * result locale (without case normalization).  If it is then
           * empty, the private use subtag is discarded:
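
The BCP 47 side of the conversion can be illustrated with a round trip through `forLanguageTag` and `toLanguageTag`. This is a sketch with the illustrative names `LanguageTagRoundTripDemo` and `roundTrip`; it relies only on the documented behavior that `toLanguageTag` reflects the new codes.

```java
import java.util.Locale;

class LanguageTagRoundTripDemo {
    // Hypothetical helper: parse a language tag and format it back.
    static String roundTrip(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"iw-IL", "ji", "in-ID", "und"}) {
            // The obsolete codes come back in their current form; "und" maps to
            // the empty language and is formatted back as "und".
            System.out.println(tag + " -> " + roundTrip(tag));
        }
    }
}
```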

Add the following list item to the method description of `java.util.ResourceBundle.Control#newBundle()`:

               *
               * <li>If {@code format} is neither {@code "java.class"}
               * nor {@code "java.properties"}, an
               * {@code IllegalArgumentException} is thrown.</li>
               *
    +          * <li>If the {@code locale}'s language is one of the
    +          * <a href="./Locale.html#legacy_language_codes">Legacy language
    +          * codes</a>, either old or new, then repeat the loading process
    +          * if needed, with the bundle name with the other language.
    +          * For example, "iw" for "he" and vice versa.
               * </ul>
               *
               * @param baseName
               *        the base bundle name of the resource bundle, a fully
               *        qualified class name
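
The lookup order described in the new list item can be pictured as follows. This is an illustrative sketch only, not the JDK implementation: the class `BundleNameCandidatesDemo`, the map `OTHER_CODE`, and the method `candidateBundleNames` are hypothetical, and the sketch handles only a base name plus a language code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BundleNameCandidatesDemo {
    // Each legacy code paired with its counterpart, in both directions.
    static final Map<String, String> OTHER_CODE = Map.of(
            "iw", "he", "he", "iw",
            "ji", "yi", "yi", "ji",
            "in", "id", "id", "in");

    // Returns the bundle names to try, with the requested name first.
    static List<String> candidateBundleNames(String baseName, String language) {
        List<String> names = new ArrayList<>();
        names.add(baseName + "_" + language);
        String other = OTHER_CODE.get(language);
        if (other != null) {
            // Fall back to the name built from the other spelling.
            names.add(baseName + "_" + other);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateBundleNames("Messages", "he"));
        System.out.println(candidateBundleNames("Messages", "fr"));
    }
}
```

Listing the requested name first mirrors the "honoring the requested name as a priority" wording in the Solution section, so an existing `Messages_iw.properties` would still be found when the locale's language is reported as "he".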
Comments
I see a release note is already planned; moving to Approved.
26-05-2021