JDK-4217441 : String.toLowerCase() doesn't handle Greek sigma
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 1.2.0,1.2.2,1.4.0,5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,windows_nt
  • CPU: generic,x86
  • Submitted: 1999-03-04
  • Updated: 2004-03-26
  • Resolved: 2003-08-19
Fixed in: 5.0 (tiger)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description

Name: sg39081			Date: 03/04/99


From examining the code in String.java (in the current JDK 1.2.2
build), it appears that String.toLowerCase() does not correctly
handle the Greek capital letter sigma.  The lower-case sigma has
two presentation forms: initial/medial and final, which are
represented by different Unicode code-point values.  Translation
to lower case is thus context-sensitive: if the character
following the sigma is a letter, use the initial/medial form;
otherwise, use the final form.  The current logic relies on the
Unicode character database, which will always return the
initial/medial form.

To reproduce the problem, use the following code:

public class SigmaTest {
    public static void main(String[] args) {
        String input = "\u0399\u0395\u03a3\u03a5\u03a3 \u03a7\u03a1\u0399\u03a3\u03a4\u039f\u03a3";
                // "IESUS XRISTOS"

        String output = input.toLowerCase();

        if (output.equals("\u03b9\u03b5\u03c3\u03c5\u03c2 \u03c7\u03c1\u03b9\u03c3\u03c4\u03bf\u03c2"))
            System.out.println("PASS");
        else {
            for (int i = 0; i < output.length(); i++)
                System.out.print(" " + Integer.toHexString((int)(output.charAt(i))));
            System.out.println();
        }
    }
}

This program produces the following output:

3b9 3b5 3c3 3c5 3c3 20 3c7 3c1 3b9 3c3 3c4 3bf 3c3

The 3c3 at the end of the string and the one before the space
should both be 3c2.  (The other two 3c3's should still be 3c3.)
(Review ID: 53991)
======================================================================

Name: skT88420			Date: 12/16/99


java version "1.2.2"
Classic VM (build JDK-1.2.2-W, native threads, symcjit)


Small and capital Greek letters are displayed correctly,
but converting small Greek letters (Unicode) to upper case
results in the display of the Latin version of the characters,
i.e. small pi converts to capital P, small sigma to capital S,
etc.
(Review ID: 99105)
======================================================================

Additional test case from duplicate 4519837:

public class A {
    public static void main(String[] argv) {
        String checkedString = "\u03A30";
        String ExpectedLowerString = "\u03C20";
        if(!checkedString.toLowerCase().equals(ExpectedLowerString))
            System.out.println("Incorrect lowercase");
        else 
            System.out.println("ok");
    }
}

======================================================================


Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: tiger tiger-beta
FIXED IN: tiger
INTEGRATED IN: tiger tiger-b16
14-06-2004

WORK AROUND

Name: sg39081 Date: 03/04/99

The programmer must code this logic himself.
======================================================================

Name: skT88420 Date: 12/16/99

use 1.1.8
(Review ID: 99105)
======================================================================
11-06-2004
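
The "code this logic himself" workaround could look roughly like the sketch below. It applies only the rule stated in the description (a capital sigma followed by a letter becomes U+03C3, otherwise U+03C2). The class and method names are hypothetical and are not part of the JDK. The sketch works char by char, so it ignores the Turkish special cases and supplementary characters, and it does not implement the full Unicode Final_Sigma condition.

```java
public class SigmaWorkaround {
    private static final char CAPITAL_SIGMA = '\u03a3';
    private static final char SMALL_SIGMA = '\u03c3'; // initial/medial form
    private static final char FINAL_SIGMA = '\u03c2'; // final form

    // Hypothetical helper: lower-cases s, choosing the sigma form from the
    // character that follows, as described in this report.
    static String toLowerCaseWithFinalSigma(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == CAPITAL_SIGMA) {
                boolean letterFollows =
                    i + 1 < s.length() && Character.isLetter(s.charAt(i + 1));
                out.append(letterFollows ? SMALL_SIGMA : FINAL_SIGMA);
            } else {
                // Everything else uses the context-free mapping.
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\u0399\u0395\u03a3\u03a5\u03a3 \u03a7\u03a1\u0399\u03a3\u03a4\u039f\u03a3";
        String output = toLowerCaseWithFinalSigma(input);
        // Print the code units in hex, in the same style as SigmaTest.
        for (int i = 0; i < output.length(); i++)
            System.out.print(" " + Integer.toHexString(output.charAt(i)));
        System.out.println();
    }
}
```

With this rule, the sigma before the space and the sigma at the end of the SigmaTest input map to 3c2, and the other two map to 3c3.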

EVALUATION

The spec for String.toLowerCase says it will use Character.toLowerCase except for the Turkish exceptions. So it behaves according to spec.
michael.mccloskey@eng 2000-03-06

I agree. Will change to an RFE.
john.oconner@Eng 2000-03-13

Spec has changed.
john.oconner@Eng 2000-11-20

The changed specification no longer refers to Character.toLowerCase, but instead refers to "the Unicode specification's character data". However, the implementation still doesn't take the special case of sigma into account. This is therefore now a bug.
###@###.### 2001-10-26

Conditional special case mappings (including Greek Sigma lowercasing), defined in the Unicode Standard 4.0, are now implemented.
###@###.### 2003-08-06
26-10-2001
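
To confirm the behavior on a JDK that contains the fix, a check along the following lines may help. It is only a sketch: the class and method names are hypothetical, and Locale.ROOT (available since Java 6) is used here as an assumption so that the default locale cannot trigger the Turkish special cases.

```java
import java.util.Locale;

public class FixedJdkSigmaCheck {
    // Hypothetical wrapper around the library call under test.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Same input and expected value as SigmaTest in the description.
        String input = "\u0399\u0395\u03a3\u03a5\u03a3 \u03a7\u03a1\u0399\u03a3\u03a4\u039f\u03a3";
        String expected = "\u03b9\u03b5\u03c3\u03c5\u03c2 \u03c7\u03c1\u03b9\u03c3\u03c4\u03bf\u03c2";
        System.out.println(lower(input).equals(expected) ? "PASS" : "FAIL");
    }
}
```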