JDK-8307565 : Support variant collations
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.text
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 21
  • Submitted: 2023-05-05
  • Updated: 2023-05-16
  • Resolved: 2023-05-16
Description
Summary
-------

Support multiple collations for a single locale in `java.text.Collator`.

Problem
-------

`java.text.Collator` instances are created based on the locale, and there is no way to select among multiple collations for the same locale. A recent issue, [JDK-8306927](https://bugs.openjdk.org/browse/JDK-8306927), changed the collation for the Swedish language to a modern replacement. However, it would be desirable to give users a way to keep using the traditional collation.

Solution
--------

Utilize the [Unicode collation identifier](http://www.unicode.org/reports/tr35/#UnicodeCollationIdentifier) to specify the type of collation for the locale. For example, a user who wants a `Collator` with the old Swedish collation can create it with the locale `sv-u-co-trad`.

As for the implementation, Swedish is the only locale that uses this mechanism as of this enhancement. Its default sorting stays with the modern collation introduced by the JDK-8306927 fix. This aligns with CLDR, which also recently switched its default (aka "standard", [CLDR-15603](https://unicode-org.atlassian.net/browse/CLDR-15603)).
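As a minimal sketch of how a caller might select the two Swedish collations, the following compares 'v' and 'w' at primary strength with each `Collator`. The class name `VariantCollationDemo` and the helper `collatorFor` are hypothetical names used only for illustration. No output is shown because the result depends on the runtime: according to the proposed apiNote, the `sv-u-co-trad` collator may give 'v' and 'w' the same sorting order, while a runtime that does not recognize the `co` type would presumably return the same collation for both tags.

```java
import java.text.Collator;
import java.util.Locale;

public class VariantCollationDemo {

    // Hypothetical helper: builds a Collator from a BCP 47 language tag.
    static Collator collatorFor(String languageTag) {
        return Collator.getInstance(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // Default (modern) Swedish collation.
        Collator standard = collatorFor("sv");
        // Traditional Swedish collation, requested via the "co" identifier.
        Collator traditional = collatorFor("sv-u-co-trad");

        // PRIMARY strength ignores case and accent differences, so only
        // base-letter differences affect the comparison result.
        standard.setStrength(Collator.PRIMARY);
        traditional.setStrength(Collator.PRIMARY);

        // A result of 0 means the collator treats the two letters as equal
        // at this strength; a negative value means "v" sorts before "w".
        System.out.println("standard:    " + standard.compare("v", "w"));
        System.out.println("traditional: " + traditional.compare("v", "w"));
    }
}
```

The collation type carried by a locale can be inspected with `Locale.getUnicodeLocaleType("co")`, which returns `"trad"` for the tag above.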

Specification
-------------

Add the following `@apiNote` and `@spec` tags to the method description of `java.text.Collator.getInstance(Locale)`:

         /**
          * Gets the Collator for the desired locale.
    +     * @apiNote Implementations of {@code Collator} class may produce
    +     * different instances based on the "{@code co}"
    +     * <a href="https://www.unicode.org/reports/tr35/#UnicodeCollationIdentifier">
    +     * Unicode collation identifier</a> in the {@code desiredLocale}.
    +     * For example:
    +     * {@snippet lang = java:
    +     * Collator.getInstance(Locale.forLanguageTag("sv-u-co-trad"));
    +     * }
    +     * may return a {@code Collator} instance with the traditional sorting, which
    +     * gives 'v' and 'w' the same sorting order, while the {@code Collator} instance
    +     * for the Swedish locale without "co" identifier distinguishes 'v' and 'w'.
    +     * @spec https://www.unicode.org/reports/tr35/ Unicode Locale Data Markup Language
    +     *     (LDML)
          * @param desiredLocale the desired locale.
          * @return the Collator for the desired locale.

Additionally, modify the class description (removing `<blockquote>`, changing "Note: " to `@apiNote`):

    @@ -71,7 +70,6 @@
      * <p>
      * The following example shows how to compare two strings using
      * the {@code Collator} for the default locale.
    - * <blockquote>
      * {@snippet lang=java :
      * // Compare two strings in the default locale
      * Collator myCollator = Collator.getInstance();
    @@ -81,7 +79,6 @@ import sun.util.locale.provider.LocaleServiceProviderPool;
      *     System.out.println("abc is greater than or equal to ABC");
      * }
      * }
    - * </blockquote>
      *
      * <p>
      * You can set a {@code Collator}'s <em>strength</em> property
    @@ -94,7 +91,6 @@
      * "e" and "E" are tertiary differences and "e" and "e" are identical.
      * The following shows how both case and accents could be ignored for
      * US English.
    - * <blockquote>
      * {@snippet lang=java :
      * // Get the Collator for US English and set its strength to PRIMARY
      * Collator usCollator = Collator.getInstance(Locale.US);
    @@ -103,7 +99,6 @@
      *     System.out.println("Strings are equivalent");
      * }
      * }
    - * </blockquote>
      * <p>
      * For comparing {@code String}s exactly once, the {@code compare}
      * method provides the best performance. When sorting a list of
    @@ -114,7 +109,7 @@
      * against other {@code CollationKey}s. A {@code CollationKey} is
      * created by a {@code Collator} object for a given {@code String}.
      * <br>
    - * <strong>Note:</strong> {@code CollationKey}s from different
    + * @apiNote {@code CollationKey}s from different
      * {@code Collator}s can not be compared. See the class description
      * for {@link CollationKey}
      * for an example using {@code CollationKey}s.

Also modify the no-arg `getInstance` method description (making `Locale.getDefault()` a link) as follows:

         /**
          * Gets the Collator for the current default locale.
    -     * The default locale is determined by java.util.Locale.getDefault.
    +     * The default locale is determined by {@link Locale#getDefault()}.
          * @return the Collator for the default locale.(for example, en_US)
          * @see java.util.Locale#getDefault
          */
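The class description's note that `CollationKey`s from different `Collator`s cannot be compared becomes more relevant once one locale can have several collations, since keys from an `sv` collator and an `sv-u-co-trad` collator should not be mixed. A minimal sketch, assuming a hypothetical helper class `CollationKeySortDemo`, that sorts strings with keys all produced by one `Collator`:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollationKeySortDemo {

    // Hypothetical helper: sorts words using CollationKeys that all come
    // from the same Collator, so the keys are mutually comparable.
    static List<String> sortWithKeys(Collator collator, List<String> words) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        // CollationKey implements Comparable<CollationKey>, so passing null
        // sorts by the natural ordering of the keys.
        keys.sort(null);

        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator traditional =
                Collator.getInstance(Locale.forLanguageTag("sv-u-co-trad"));
        System.out.println(
                sortWithKeys(traditional, List.of("wagon", "vas", "apa")));
    }
}
```

Keeping the `Collator` as an explicit parameter makes it harder to mix keys from two collations of the same language by accident.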


Comments
Moving amended request back to Approved.
16-05-2023

Added the apiNote update. Sorry, I wasn't aware of your comment wrt approval. Made it back to `finalized`.
16-05-2023

Add a code review comment to augment the apiNote somewhat. Moving this CSR to Approved; approval stands with or without the apiNote update.
15-05-2023

The RN for JDK-8306927 already announces that the Swedish collation rules have been modified. That RN could potentially be updated with an example that uses `Collator.getInstance(Locale.forLanguageTag("sv-u-co-trad"))`. JDK-8308018 has already been created to update the supported locales doc.
14-05-2023