JDK-8020037 : String.toLowerCase incorrectly increases length, if string contains \u0130 char

Details
Type: Bug
Submit Date: 2013-07-06
Status: Closed
Updated Date: 2014-04-25
Project Name: JDK
Resolved Date: 2013-10-21
Component: core-libs
OS: generic
Sub-Component: java.lang
CPU: generic
Priority: P3
Resolution: Fixed
Affected Versions: 7
Fixed Versions:

Related Reports
Backport:
Relates:
Relates:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The problem does not occur when the test is run in a Turkish locale.
To reproduce it, set the default locale to English (or probably any other non-Turkish locale).

In an English locale, if a string containing the dotted capital I (Turkish I, \u0130) is converted to lower case with the toLowerCase method, an extra (and invalid) character is added to the resulting string just after the Turkish I character.
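To see what the extra character actually is, a small sketch can lowercase U+0130 with an explicit locale and print each resulting code point. It passes Locale.forLanguageTag(...) instead of touching the default locale, and sticks to Java 7 APIs so it should also run on 7u25. The class name DottedIInspector and the helper describe are illustrative only. For reference, Unicode's SpecialCasing.txt maps U+0130 unconditionally to U+0069 U+0307 (i followed by COMBINING DOT ABOVE), while its Turkish/Azeri rule maps it to plain U+0069; what a given JDK release returns for en-US is not assumed here, so no output is shown.

```java
import java.util.Locale;

public class DottedIInspector {
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (the Turkish dotted capital I)
    static final String DOTTED_CAPITAL_I = "\u0130";

    /** Returns the code points of s as space-separated "U+XXXX" tokens. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        // Step by code point rather than by char so a surrogate pair is reported once.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tags = {"en-US", "tr-TR"};
        for (String tag : tags) {
            // Explicit locale, so the result does not depend on Locale.getDefault().
            String lower = DOTTED_CAPITAL_I.toLowerCase(Locale.forLanguageTag(tag));
            System.out.println(tag + ": length=" + lower.length() + " -> " + describe(lower));
            // Character.getName (Java 7+) names each code point, e.g. to spot a combining mark.
            for (int i = 0; i < lower.length(); i += Character.charCount(lower.codePointAt(i))) {
                System.out.println("    " + Character.getName(lower.codePointAt(i)));
            }
        }
    }
}
```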


REGRESSION.  Last worked in version 6u45

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
    String stringWithDottedI = "\u0130";
    Locale.setDefault(new Locale("en", "US"));

    String lowerCasedString = stringWithDottedI.toLowerCase();

    assertEquals(stringWithDottedI.length(), lowerCasedString.length());

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
lowerCasedString.length() == 1
ACTUAL -
lowerCasedString.length() == 2
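The length difference comes down to full versus simple case mapping. The Javadoc of Character.toLowerCase notes that String case mapping methods can perform locale-sensitive, context-sensitive and 1:M mappings, whereas the Character methods cannot; Character.toLowerCase(int) returns exactly one code point. The sketch below contrasts the two. simpleLowerCase is a hypothetical helper, not a JDK method, and it trades away the special casing rules (including the Turkish ones) for a guaranteed code-point count.

```java
import java.util.Locale;

public class CaseMappingComparison {
    /**
     * Lowercases s one code point at a time with Character.toLowerCase(int).
     * Each input code point maps to exactly one output code point, so the
     * result has the same number of code points as the input.
     */
    static String simpleLowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "\u0130";
        String full = input.toLowerCase(Locale.ROOT); // String API: may apply 1:M mappings
        String simple = simpleLowerCase(input);       // one code point per code point
        System.out.println("full:   length " + full.length());
        System.out.println("simple: length " + simple.length());
    }
}
```

Which behavior is "right" for a given caller is exactly what the comments below debate; the simple mapping is only appropriate where a fixed length matters more than linguistic correctness.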

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
package test;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.util.Locale;

import static org.junit.Assert.assertEquals;

public class StringTest {
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (Turkish dotted capital I)
    private final String stringWithDottedI = "\u0130";

    // Locale.setDefault is JVM-wide, so remember the original and restore it after each test.
    private Locale originalDefault;

    @Before
    public void setup() {
        originalDefault = Locale.getDefault();
    }

    @After
    public void tearDown() {
        Locale.setDefault(originalDefault);
    }

    @Test
    public void testWhenLocaleIsTurkish_lowerCasedStringShouldHaveSameLength() {
        Locale.setDefault(new Locale("tr", "TR"));

        // The no-argument toLowerCase() uses the default locale.
        String lowerCasedString = stringWithDottedI.toLowerCase();

        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }

    @Test
    public void testWhenLocaleIsEnglish_lowerCasedStringShouldHaveSameLength() {
        Locale.setDefault(new Locale("en", "US"));

        String lowerCasedString = stringWithDottedI.toLowerCase();

        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Set the default locale to Turkish (new Locale("tr", "TR")),
or call the toLowerCase(Locale) overload and pass a Turkish locale.
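The second workaround keeps the locale choice local to the call. A minimal sketch, assuming the input really is Turkish text; the class and method names are illustrative. It also shows why the first workaround (changing the default locale) has a cost: under Turkish rules ASCII capital I lowercases to dotless ı (U+0131), so every other toLowerCase() call in the JVM that relies on the default locale changes behavior too.

```java
import java.util.Locale;

public class TurkishWorkaround {
    // Locale.forLanguageTag("tr-TR") denotes the same locale as new Locale("tr", "TR").
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    /** Lowercases with an explicit Turkish locale instead of the JVM default. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Under the Turkish rules U+0130 lowercases to a single 'i'.
        System.out.println(lowerTurkish("\u0130").length());
        // Side effect of Turkish rules: ASCII 'I' becomes dotless U+0131,
        // which is why changing the JVM-wide default locale is risky.
        System.out.println(lowerTurkish("TITLE").equals("t\u0131tle"));
    }
}
```

Running main prints 1 and then true.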
                                    

Comments
I believe this change is incorrect, and is based on the invalid assumption that "increasing length" is a problem.
I created JDK-8041791.  (Note Java 7 isn't correct either, but it isn't quite so wrong as Java 8 in this respect.)
                                     
2014-04-25
URL:   http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/e8683d5b2b0a
User:  lana
Date:  2013-11-05 21:09:17 +0000

                                     
2013-11-05
A fix version field can never contain more than 1 value. I'm removing 7-pool and will create a backport record.
                                     
2013-10-29
URL:   http://hg.openjdk.java.net/jdk8/tl/jdk/rev/e8683d5b2b0a
User:  peytoia
Date:  2013-10-21 21:15:55 +0000

                                     
2013-10-21
Naoto, Yuka - is this issue in 7?  It seems that it's not a regression from 7u25.
                                     
2013-07-12
The issue should have been introduced with JDK-6404304 so it's not an immediate regression starting from 7u25. It's a regression from 6.
                                     
2013-07-12
The lowercasing for that character was modified for the fix to 6404304 (RFE: Unicode 5.1 support).
                                     
2013-07-10
Naoto, would you please help take a look? Close it if you agree it's not a bug.
                                     
2013-07-09


