JDK-8253059 : Case insensitive collators for supplementary characters
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.text
  • Priority: P4
  • Status: Resolved
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2020-09-11
  • Updated: 2020-09-14
  • Resolved: 2020-09-14
Related Reports
Relates :  
Description
Raised on the jdk-dev mailing list:
https://mail.openjdk.java.net/pipermail/jdk-dev/2020-September/004727.html

---
For the scripts Deseret, Osage, Old Hungarian, Warang Citi,
Medefaidrin, and Adlam, given two strings that contain the
upper- and lowercase variants of the same letter, the
following code fails:

Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
assertThat(collator.compare(lower, upper)).isEqualTo(0);
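
A self-contained way to try the comparison might look like the sketch below. It is not taken from the mailing list post: the class and method names are hypothetical, the Deseret pair U+10400 / U+10428 is chosen here as one example of the affected scripts, and Locale.ROOT is assumed in place of the default locale so that the run does not depend on the machine's settings.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, not part of the original report.
final class PrimaryStrengthCheck {

    // Compares two strings at PRIMARY strength using the collator for the given locale.
    static int comparePrimary(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Supplementary code points have to be built via Character.toChars.
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I

        // Latin case pair, for reference.
        System.out.println("Latin a/A: " + comparePrimary(Locale.ROOT, "a", "A"));
        // Deseret case pair, the situation described in this report.
        System.out.println("Deseret:   " + comparePrimary(Locale.ROOT, lower, upper));
    }
}
```

The Latin pair is included only as a point of comparison; the report concerns the second line, where a result of 0 was expected by the submitter.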
Comments
According to the Collator class spec, "The exact assignment of strengths to language features is locale dependent". So specifying PRIMARY does not necessarily mean that case differences are ignored.

Looking at the implementation in the sun.util.locale.provider.CollationRules class, only the Latin alphabet (with combining marks) is given TERTIARY differences for case. For example, Russian "A" (U+0410) and Russian "a" (U+0430) are not treated as a TERTIARY difference by the default rules. (A Collator instance for the "ru" locale does treat them as differing only at the TERTIARY level, though.) Thus the case difference of supplementary characters is not a TERTIARY difference in the default collation rules, i.e., this is working as expected.

I am not sure whether this default behavior is intended, but I would not replace it with a different one, because:

- It would cause incompatibility.
- Anyone who needs it can extend the java.text.spi.CollatorProvider class.

So unless there is a dire need to change the default behavior, I would not fix this as suggested.
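
As a rough illustration of the java.text.spi.CollatorProvider route mentioned above, the sketch below wraps an existing Collator and lowercases both inputs before comparing. Everything named here is hypothetical and not from this report: the classes CaseFoldingExample, CaseFoldingCollator, and Provider, the private-use locale tag "und-x-casefold", and the choice of String.toLowerCase(Locale.ROOT) as the folding step. Registration under META-INF/services and any locale provider configuration are left out, so the demo calls the provider directly.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

// Hypothetical example, not an official workaround.
final class CaseFoldingExample {

    // Lowercases both inputs, then defers to another Collator.
    static final class CaseFoldingCollator extends Collator {
        private final Collator delegate;

        CaseFoldingCollator(Collator delegate) {
            this.delegate = delegate;
        }

        // String.toLowerCase works on code points, so supplementary letters are folded too.
        private static String fold(String s) {
            return s.toLowerCase(Locale.ROOT);
        }

        @Override public int compare(String source, String target) {
            return delegate.compare(fold(source), fold(target));
        }

        @Override public CollationKey getCollationKey(String source) {
            // The key's source string is the folded text, not the original.
            return source == null ? null : delegate.getCollationKey(fold(source));
        }

        // Strength and decomposition are kept in the delegate.
        @Override public synchronized void setStrength(int strength) { delegate.setStrength(strength); }
        @Override public synchronized int getStrength() { return delegate.getStrength(); }
        @Override public synchronized void setDecomposition(int mode) { delegate.setDecomposition(mode); }
        @Override public synchronized int getDecomposition() { return delegate.getDecomposition(); }

        // Copy the delegate so that clones do not share mutable state.
        @Override public Object clone() {
            return new CaseFoldingCollator((Collator) delegate.clone());
        }

        @Override public boolean equals(Object other) {
            return other instanceof CaseFoldingCollator
                    && delegate.equals(((CaseFoldingCollator) other).delegate);
        }

        @Override public int hashCode() {
            return delegate.hashCode();
        }
    }

    // Hypothetical provider. A provider loaded through ServiceLoader would need to be
    // a public class with a public no-argument constructor; this one is called directly.
    static final class Provider extends CollatorProvider {
        // Assumed private-use locale, so that the Locale.ROOT lookup below is not
        // expected to be routed back to this provider.
        static final Locale CASE_FOLDING = Locale.forLanguageTag("und-x-casefold");

        @Override public Collator getInstance(Locale locale) {
            if (!CASE_FOLDING.equals(locale)) {
                throw new IllegalArgumentException("Unsupported locale: " + locale);
            }
            return new CaseFoldingCollator(Collator.getInstance(Locale.ROOT));
        }

        @Override public Locale[] getAvailableLocales() {
            return new Locale[] { CASE_FOLDING };
        }
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I

        Collator collator = new Provider().getInstance(Provider.CASE_FOLDING);
        collator.setStrength(Collator.PRIMARY);
        // Both strings fold to the same text, so the delegate sees identical inputs.
        System.out.println(collator.compare(lower, upper)); // prints 0
    }
}
```

Folding before delegating makes the wrapper case-insensitive at every strength, which is a different trade-off from assigning case a TERTIARY weight in the collation rules; whether that is acceptable depends on the application.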
14-09-2020