The change JDK-8020037, "String.toLowerCase incorrectly increases length, if string contains \u0130 char", appears to be wrong according to my reading of the Unicode standard. Its title assumes that the increase in length is a problem, but it is not. The documentation specifically says: "Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String."

Looking at http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt, I see:

    # Preserve canonical equivalence for I with dot. Turkic is handled below.
    0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

My understanding is that in all locales *except* the ones handled specially ('az', 'lt', and 'tr'), we should convert in both directions between "\u0130" and "\u0069\u0307". That is, lowercasing "\u0130" should result in "\u0069\u0307", and converting "\u0069\u0307" to uppercase or titlecase should yield "\u0130". Note that this allows round-trip conversions, which is why it is specified this way.

Java 7 correctly does the former conversion, but not the latter. Java 8 does neither.
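A small probe along the following lines can show what a given JDK actually does with these two strings. It is only a sketch: the class name `DottedCapitalI` and its helper methods are hypothetical names chosen for illustration, and the output for the root locale is deliberately not stated here because it depends on the JDK version in use.

```java
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper class (not part of the JDK) for inspecting case mappings of U+0130.
final class DottedCapitalI {

    // Formats each code point of s as U+XXXX, separated by spaces.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format(Locale.ROOT, "U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Lowercases s in the locale named by languageTag ("" is the root locale).
    static String lowerHex(String s, String languageTag) {
        return hex(s.toLowerCase(Locale.forLanguageTag(languageTag)));
    }

    // Uppercases s in the locale named by languageTag ("" is the root locale).
    static String upperHex(String s, String languageTag) {
        return hex(s.toUpperCase(Locale.forLanguageTag(languageTag)));
    }

    public static void main(String[] args) {
        String capital = "\u0130";          // LATIN CAPITAL LETTER I WITH DOT ABOVE
        String decomposed = "\u0069\u0307"; // i followed by COMBINING DOT ABOVE

        // Root locale: the case this report is about; results vary by JDK version.
        System.out.println("root lower: " + lowerHex(capital, ""));
        System.out.println("root upper: " + upperHex(decomposed, ""));

        // Turkish: one of the specially handled locales, shown for comparison.
        System.out.println("tr   lower: " + lowerHex(capital, "tr"));
    }
}
```

Comparing the "root lower" and "root upper" lines across JDK releases shows whether the round trip described above holds on each of them.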