JDK-8020037 : String.toLowerCase incorrectly increases length, if string contains \u0130 char
  • Type: Bug
  • Status: Closed
  • Resolution: Fixed
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Affected Version: 7
  • OS: generic
  • CPU: generic
  • Submit Date: 2013-07-06
  • Updated Date: 2014-04-25
  • Resolved Date: 2013-10-21
  • JDK 7: 7u60 (Fixed)
  • JDK 8: 8 b115 (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The problem does not happen when the test is run in a Turkish locale.
To reproduce the problem, the default locale should be set to English (or probably any non-Turkish locale).

In an English locale, if a string containing the dotted capital I (Turkish I, \u0130) is converted to lower case using the toLowerCase method, an extra (and invalid) character is added to the resulting string just after the lowercased Turkish-I character.
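To see which character is actually appended, the following is a minimal sketch that prints the code points of the lowercased string for a given locale. The class and method names (DottedIDemo, codePoints, lowerCodePoints) are hypothetical and not part of the report. The output for the English locale depends on the JDK release; on the affected release described here, two code points would be expected, and under Unicode's special-casing rules the second one is presumably U+0307 (COMBINING DOT ABOVE), although the report itself does not name it.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the original report.
final class DottedIDemo {

    /** Formats each code point of s as U+XXXX, separated by single spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Lowercases s with an explicit locale and returns the resulting code points. */
    static String lowerCodePoints(String s, Locale locale) {
        return codePoints(s.toLowerCase(locale));
    }

    public static void main(String[] args) {
        String dottedI = "\u0130";
        // The English result varies by JDK release, so no expected output is shown.
        System.out.println("en_US: " + lowerCodePoints(dottedI, Locale.US));
        System.out.println("tr_TR: " + lowerCodePoints(dottedI, new Locale("tr", "TR")));
    }
}
```

Passing the locale explicitly avoids Locale.setDefault, so the comparison does not change global state.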


REGRESSION.  Last worked in version 6u45

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        String stringWithDottedI = "\u0130";
        Locale.setDefault(new Locale("en", "US"));

        String lowerCasedString = stringWithDottedI.toLowerCase();

        assertEquals(stringWithDottedI.length(), lowerCasedString.length());

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
lowerCasedString.length() == 1
ACTUAL -
lowerCasedString.length() == 2

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
package test;

import org.junit.Before;
import org.junit.Test;

import java.util.Locale;

import static org.junit.Assert.assertEquals;

public class StringTest {
    private final String stringWithDottedI = "\u0130";

    @Before
    public void setup() {
    }

    @Test
    public void testWhenLocaleIsTurkish_lowerCasedStringShouldHaveSameLength() {
        Locale.setDefault(new Locale("tr", "TR"));

        String lowerCasedString = stringWithDottedI.toLowerCase();

        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }

    @Test
    public void testWhenLocaleIsEnglish_lowerCasedStringShouldHaveSameLength() {
        Locale.setDefault(new Locale("en", "US"));

        String lowerCasedString = stringWithDottedI.toLowerCase();

        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Set the default locale to Turkish (new Locale("tr", "TR")),
or call the toLowerCase overload that accepts a Locale parameter and pass it a Turkish locale.
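The second workaround can be sketched as follows. The class and method names (ExplicitLocaleLowerCase, lowerTurkish) are hypothetical. One caveat worth keeping in mind: Turkish casing rules also map the plain capital I to the dotless ı (\u0131), so this workaround is probably only appropriate when the text is actually Turkish.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the original report.
final class ExplicitLocaleLowerCase {

    private static final Locale TURKISH = new Locale("tr", "TR");

    /** Lowercases s using Turkish casing rules, independent of the default locale. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Dotted capital I lowercases to a plain "i" under Turkish rules.
        System.out.println(lowerTurkish("\u0130").length()); // prints 1
        // Side effect of Turkish rules: plain capital I becomes dotless i (U+0131).
        System.out.println(lowerTurkish("I").equals("\u0131")); // prints true
    }
}
```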
Comments
I believe this change is incorrect, and is based on the invalid assumption that "increasing length" is a problem. I created JDK-8041791. (Note Java 7 isn't correct either, but it isn't quite so wrong as Java 8 in this respect.)
2014-04-25

A fix version field can never contain more than 1 value. I'm removing 7-pool and will create a backport record.
2013-10-29

The issue should have been introduced with JDK-6404304 so it's not an immediate regression starting from 7u25. It's a regression from 6.
2013-07-12

Naoto, Yuka - is this issue in 7? It seems that it's not a regression from 7u25.
2013-07-12

The lowercasing for that character was modified for the fix to 6404304 (RFE: Unicode 5.1 support).
2013-07-10

Naoto, would you please help take a look? Close it if you agree it's not a bug.
2013-07-09