Bug ID: JDK-8041791 String.toLowerCase regression

Versions (Unresolved/Resolved/Fixed)

The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.


JDK 7	JDK 8	JDK 9
7u76 (Fixed)	8u20 (Fixed)	9 b14 (Fixed)

The change JDK-8020037 "String.toLowerCase incorrectly increases length, if string contains \u0130 char" seems to be wrong, according to my reading of the Unicode standard.

The text "String.toLowerCase incorrectly increases length" makes the assumption that this is a problem, but of course it isn't: The documentation specifically says "Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String."

I look at http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt and see:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
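
To make the fields of that entry explicit, here is a minimal sketch that decodes one SpecialCasing.txt data line, assuming the layout documented in the file header (code; lower; title; upper; optional condition list; then a # comment). The class and method names are hypothetical and exist only for this illustration.

```java
public class SpecialCasingLine {

    /** Converts a space-separated list of hex code points, e.g. "0069 0307", into a String. */
    static String decode(String field) {
        StringBuilder sb = new StringBuilder();
        for (String hex : field.trim().split("\\s+")) {
            if (!hex.isEmpty()) {
                sb.appendCodePoint(Integer.parseInt(hex, 16));
            }
        }
        return sb.toString();
    }

    /** Returns {code, lower, title, upper} decoded from one unconditional data line. */
    static String[] parse(String line) {
        int hash = line.indexOf('#');
        String data = hash >= 0 ? line.substring(0, hash) : line; // drop the trailing comment
        String[] f = data.split(";");
        return new String[] { decode(f[0]), decode(f[1]), decode(f[2]), decode(f[3]) };
    }

    public static void main(String[] args) {
        String[] m = parse("0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE");
        // The lowercase field holds two code points, so the mapping is not 1:1.
        System.out.println("source chars: " + m[0].length() + ", lowercase chars: " + m[1].length());
    }
}
```

The second field is the only one with two code points, which is the reason the lowercased string is longer than the original.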

My understanding of this is that in all locales *except* the ones handled specially (which are 'az', 'lt', and 'tr') we should bi-directionally convert "\u0130" <-> "\u0069\u0307".
I.e. lowercasing "\u0130" should result in "\u0069\u0307";
converting "\u0069\u0307" to uppercase or titlecase should yield "\u0130".

Note this allows round-trip conversions, which is why it is specified this way.

Java 7 correctly does the former conversion, but not the latter.
Java 8 does neither.
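
The lowercase half of the mapping can be checked with a small program such as the sketch below. It assumes a release that contains the fix for this issue; affected Java 8 builds would report a different result for the root locale. The class name and the helper methods are made up for this example, and only the lowercase direction is exercised.

```java
import java.util.Locale;

public class DottedILowercase {

    /** Lowercases s using the case-mapping rules of the given locale. */
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Renders a string as a list of code points, e.g. "U+0069 U+0307". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        Locale turkish = Locale.forLanguageTag("tr");

        // Non-Turkic locale: SpecialCasing.txt maps U+0130 to U+0069 U+0307,
        // so the result is one char longer than the input.
        System.out.println("root: " + codePoints(lower(dottedI, Locale.ROOT)));

        // Turkish locale: handled separately, the result is a plain U+0069.
        System.out.println("tr:   " + codePoints(lower(dottedI, turkish)));
    }
}
```

Passing an explicit Locale keeps the result independent of the default locale of the machine running the check, which matters here because 'az', 'lt', and 'tr' are handled specially.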

UR SQE tested the fix in 8u20. No objections to taking the fix into PSU15_01.
17-11-2014
Critical request:
- Justification: this problem highly affects user experience in Turkish locale.
- Risk Analysis: Low, the fix is pretty simple.
- Webrev: http://cr.openjdk.java.net/~naoto/8041791/jdk9/webrev.00/
- Testing (done/to-be-done): Automatic regression test is included.
- Back ports (done/to-be-done): Done
- FX Impact: No
17-11-2014
URL: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/6d8b6c20a32b
User: lana
Date: 2014-05-21 18:41:42 +0000
21-05-2014
URL: http://hg.openjdk.java.net/jdk9/dev/jdk/rev/6d8b6c20a32b
User: naoto
Date: 2014-05-14 17:53:19 +0000
14-05-2014
The Description is correct. Refer to the Unicode Standard 6.2 Core Specification, Section 5.18 "Case Mappings", pp. 173-174.
09-05-2014

Blocks :	JDK-8030201 - Nashorn: String.prototype.toLowerCase() requires SpecialCasing support
Duplicate :	JDK-8041387 - Applets not working when the preffered language is Turkish
Relates :	JDK-8049038 - In turkish locale, String.equalsIgnoreCase() returns "true" for character \u0130 and \u0131.
Relates :	JDK-8020037 - String.toLowerCase incorrectly increases length, if string contains \u0130 char
Relates :	JDK-6404304 - RFE: Unicode 5.1 support
Relates :	JDK-8043186 - javac test langtools/tools/javac/util/StringUtilsTest.java fails