JDK-6208680 : Doc: Clarify issues with toLowerCase/toUpperCase and Turkish

Details
Type:
Bug
Submit Date:
2004-12-15
Status:
Resolved
Updated Date:
2016-12-21
Project Name:
JDK
Resolved Date:
2005-09-19
Component:
core-libs
OS:
generic
Sub-Component:
java.lang
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
5.0
Fixed Versions:

Description
Software that uses String.toLowerCase or String.toUpperCase sometimes fails to work when run in a Turkish or Azeri environment, if the case conversion operates on strings that aren't actually in these languages (e.g., they're HTML tags, encoding names, programming language keywords, or similar). The reason is that Turkish and Azeri have dotted and dotless "i"s, and conversion of these characters leads to results that aren't adequate for strings in other languages. A common solution for this issue is to specify an English locale for the methods.
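The failure mode described above can be reproduced without changing the JVM's default locale. The following is a minimal sketch, not taken from the JDK itself: the class name TurkishCaseDemo and its helper methods are hypothetical, and the Turkish locale is passed explicitly to mimic what the parameterless methods do when the default locale is Turkish.

```java
import java.util.Locale;

class TurkishCaseDemo {
    // Hypothetical constant for illustration: language tag "tr" selects Turkish.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Mimics "s.toLowerCase()" as it behaves when the default locale is Turkish.
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    // The workaround named in the description: specify an English locale.
    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String tag = "TITLE"; // an HTML tag name, not Turkish text
        // Under Turkish rules 'I' lowercases to dotless i (U+0131),
        // so the result no longer matches the ASCII keyword.
        System.out.println(lowerTurkish(tag).equals("title")); // false
        System.out.println(lowerEnglish(tag).equals("title")); // true
    }
}
```

Any comparison, map lookup, or switch on the lowercased value behaves the same way as the equals check in this sketch, which is why code that works on most systems can fail only for users in a Turkish or Azeri environment.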

The issue should be clearly documented in the javadoc for these methods, and the JRE checked for correct usage.
###@###.### 2004-12-15 01:43:39 GMT
A good synopsis of the issue can be found at:

http://java.sys-con.com/node/46241

                                    

Comments
PUBLIC COMMENTS

I would recommend that String.toLowerCase() and String.toUpperCase() be marked @Deprecated, based on the number of times I have seen otherwise stable software break very badly when run by Turkish users in their home locale. (For example, old versions of Ant would fail when running tasks with an 'i' in the name.) Most Java developers (even some Turkish-speaking ones!) are completely unaware of this pitfall.

In my experience, most usages of these methods should really be converted to pass Locale.ENGLISH, since they are usually used on codewords of some kind or another. For the occasional cases where you really intended a locale-sensitive conversion of a natural-language string, passing Locale.getDefault() (or indeed some other Locale object) makes this intent explicit and warns readers of the code that the result could vary depending on the environment.
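As a rough illustration of the distinction drawn above between codewords and natural-language text, here is a sketch of a name registry loosely modelled on the Ant example. TaskRegistry, its methods, and the task names are hypothetical and are not Ant's actual implementation; the point is only that keys are normalized with Locale.ENGLISH while display text takes a locale chosen explicitly by the caller.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class TaskRegistry {
    private final Map<String, String> tasks = new HashMap<>();

    // Codeword normalization: always English rules, whatever the default locale is.
    static String key(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }

    void register(String name, String implementation) {
        tasks.put(key(name), implementation);
    }

    // Returns the registered implementation name, or null if none matches.
    String lookup(String name) {
        return tasks.get(key(name));
    }

    // Natural-language conversion: the caller states which locale is intended,
    // for example Locale.getDefault() for text shown to the current user.
    static String displayUpper(String text, Locale userLocale) {
        return text.toUpperCase(userLocale);
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        registry.register("Zip", "ZipTask");
        System.out.println(registry.lookup("ZIP")); // ZipTask
    }
}
```

Had key() used the parameterless toLowerCase() in a Turkish environment, "ZIP" would normalize to a string containing a dotless i and the lookup would depend on the locale in effect at registration time.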
                                     
2009-04-17
EVALUATION

The "easier part" will be fixed in Mustang as described.
The "much harder part" will be treated as a separate defect. We have started investigating actual problems in the JRE and will fix them one by one as they are found.

Therefore, this bug is treated as a doc bug from now on.
                                     
2005-08-27
EVALUATION

The easy part is the documentation changes. In String.toLowerCase, add:

     * <p>
     * <b>Note:</b> This method is locale sensitive, and will produce wrong results
     * if used for strings that are intended to be interpreted locale independently.
     * Examples are programming language identifiers, protocol keys, and HTML tags.
     * For instance, <code>"TITLE".toLowerCase()</code> in a Turkish locale returns
     * <code>"t\u0131tle"</code>, where "\u0131" is the DOTLESS I character.
     * To obtain correct results for locale insensitive strings, use
     * <code>toLowerCase(Locale.ENGLISH)</code>.

In String.toUpperCase, add:

     * <p>
     * <b>Note:</b> This method is locale sensitive, and will produce wrong results
     * if used for strings that are intended to be interpreted locale independently.
     * Examples are programming language identifiers, protocol keys, and HTML tags.
     * For instance, <code>"title".toUpperCase()</code> in a Turkish locale returns
     * <code>"T\u0130TLE"</code>, where "\u0130" is the I WITH DOT ABOVE character.
     * To obtain correct results for locale insensitive strings, use
     * <code>toUpperCase(Locale.ENGLISH)</code>.

Much harder is the work of updating the JDK to use toLowerCase/toUpperCase correctly. A quick search found over 700 uses of toLowerCase and over 500 uses of toUpperCase. Some of them are calls to the Character methods, which are locale insensitive, and others already specify a locale, but many still need to be checked for correct use. I'd recommend that calls to the parameterless methods be eliminated entirely; that is, every call should pass Locale.getDefault() or Locale.ENGLISH explicitly.
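A search of this kind can be narrowed to the calls that actually need review. The sketch below is an assumption about how such a scan might look, not the tool used for the JDK: CaseCallAudit and flaggedLines are hypothetical names, and the regular expression is a plain textual match that does no parsing, so it will also flag matches inside comments and string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CaseCallAudit {
    // Matches ".toLowerCase()" or ".toUpperCase()" with an empty argument list.
    // Character.toLowerCase(c) and toLowerCase(Locale.ENGLISH) take an argument,
    // so they are not matched.
    private static final Pattern PARAMETERLESS =
            Pattern.compile("\\.to(?:Lower|Upper)Case\\s*\\(\\s*\\)");

    // Returns the 1-based numbers of the lines that contain a parameterless call.
    static List<Integer> flaggedLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (PARAMETERLESS.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "String a = name.toLowerCase();\n"
                + "char c = Character.toUpperCase(ch);\n"
                + "String b = name.toUpperCase(Locale.ENGLISH);\n";
        System.out.println(flaggedLines(source)); // [1]
    }
}
```

Each flagged line still needs a human decision between Locale.ENGLISH and Locale.getDefault(), since the scan cannot tell a codeword from natural-language text.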
                                     
2005-07-27


