Bug ID: JDK-4425387 Error in result of String.compareToIgnoreCase

Type: Bug
Component: core-libs
Sub-Component: java.lang
Affected Version: 1.4.0

Priority: P4
Status: Closed
Resolution: Fixed
OS: windows_2000
CPU: x86

Submitted: 2001-03-14
Updated: 2002-06-12
Resolved: 2002-04-10

Other
1.4.1 hopper: Fixed

Consider the following program:

public class Test {
    public static void main(String[] args){
        String s1 = "\u00df"; // German sharp s
        String s2 = "SS";
        // Comparison as described by the compareToIgnoreCase javadoc
        System.out.println(s1.toUpperCase().toLowerCase().compareTo(
            s2.toUpperCase().toLowerCase()));
        // Comparison as actually implemented
        System.out.println(s1.compareToIgnoreCase(s2));
    }
}

When running it, the following is reported:

J:\>java Test
0
108

That is, the two results differ, while the API documentation
states that they should agree in sign.
This is due to the change applied in 1.4 to String.toUpperCase,
which translates a German sharp-s into SS, which is correct, but
invalidates the definition of compareToIgnoreCase().
Note that this is not an issue of whether compareToIgnoreCase()
delivers satisfactory results, but of whether it complies with its
specification.
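
The mismatch described above can be reproduced in isolation with a minimal sketch. The class and helper names are hypothetical, and Locale.ROOT is an assumption made here only to keep the string mapping independent of the default locale. It shows that the whole-string mapping expands the sharp s into two characters while the single-character mapping leaves it unchanged, which is where the value 108 comes from.

```java
import java.util.Locale;

final class SharpSMappingSketch {
    // Whole-string mapping; Locale.ROOT is an assumption so the result
    // does not depend on the default locale.
    static String upperString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Single-character mapping, which can only ever return one char.
    static char upperChar(char c) {
        return Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        String sharpS = "\u00df";
        // The string mapping expands the sharp s to "SS".
        System.out.println(upperString(sharpS).length());            // 2
        // The character mapping has no single-char uppercase to return.
        System.out.println(upperChar(sharpS.charAt(0)) == '\u00df'); // true
        // Code point difference between sharp s (223) and 's' (115).
        System.out.println('\u00df' - 's');                          // 108
    }
}
```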

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: hopper
FIXED IN: hopper
INTEGRATED IN: hopper
VERIFIED IN: hopper-beta

14-06-2004

EVALUATION

Not yet clear that this is a valid bug. The current spec clearly states that this does not work for some locales. Since it uses the Character API, I wouldn't expect it to behave correctly for characters with 1:M mappings for uppercase.
john.oconner@Eng 2001-03-20

Either the behavior of this method or its javadoc (specification) should change. As it is, the javadoc clearly describes a particular behavior and the implementation does not conform to that description. The javadoc says:

    This method returns an integer whose sign is that of
    this.toUpperCase().toLowerCase().compareTo(
    str.toUpperCase().toLowerCase()).

But that is not currently true (no matter what the locale). The proper solution is probably to perform case-folding on both strings (as described in UTR#21 and CaseFolding.txt) and then compare them as with compareTo(String). But that would require changing the javadoc. And that seems to be equivalent to the behavior described by the javadoc (at least, at the moment). So changing the code to follow the specification is probably the best solution.
steve.hanna@East 2001-04-16

Name: nl37777 Date: 06/18/2001

This bug isn't really new - the specification of compareToIgnoreCase has been inconsistent with itself and the implementation since its original release in 1.2. The test case in the description doesn't fail with releases before 1.4 beta, but that's only because of bug 4219630, which prevented correct uppercasing of strings starting with "\u00df". If the string s1 in the test case is changed to "s\u00df" and s2 to "SSS", then the test case fails in all releases that have compareToIgnoreCase.

The problems with compareToIgnoreCase are:
- The specification states that the "method does not take locale into account", but it also specifies the behavior by reference to String.toUpperCase and String.toLowerCase, both of which do take the default locale into account.
- The specification refers to String.toUpperCase and String.toLowerCase to describe the behavior, but the implementation neither uses these methods nor mimics their documented behavior with respect to "\u00df" and other special cases.

The latter problem is more severe for J2SE 1.4 than for earlier releases, because J2SE 1.4 bases its character handling on Unicode 3.0. Unicode 3.0 defines 102 1:m case mappings instead of the 1 in prior Unicode versions. String.toUpperCase and String.toLowerCase have already been upgraded for this under bug 4304573.

Any fix for this bug should
- keep compareToIgnoreCase in sync with equalsIgnoreCase and regionMatches(boolean, ...)
- keep these methods locale independent (as mentioned in a prior evaluation, java.text.Collator exists to handle locale sensitive collation).

There are two possible fixes:
- specify compareToIgnoreCase using references to Character.toUpperCase and Character.toLowerCase instead of references to String.toUpperCase and String.toLowerCase, or
- upgrade both specification and implementation of compareToIgnoreCase, equalsIgnoreCase, and regionMatches(boolean, ...) to use correct uppercasing according to Unicode 3.0, similar to what was done for String.toUpperCase and String.toLowerCase, but not use any locale dependent behavior (such as Turkish "i" mapping).

==================================================================

###@###.### 2002-04-04

I currently believe the fix should be to change the javadoc, which is now clearly incorrect and does not match the "intended" behavior of the method.
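
The difference between the two readings of the specification discussed in this evaluation can be illustrated with a rough sketch. Both helpers are hypothetical and written only for this note; they are not the JDK implementation. The first compares character by character using Character.toUpperCase and Character.toLowerCase, and the second follows the wording of the old javadoc using the String mappings, with Locale.ROOT assumed so the result stays locale independent.

```java
import java.util.Locale;

final class IgnoreCaseOptionsSketch {
    // Hypothetical helper: per-character comparison based on the
    // Character case mappings (1:1 mappings only).
    static int compareByCharacter(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = Character.toLowerCase(Character.toUpperCase(a.charAt(i)));
            char cb = Character.toLowerCase(Character.toUpperCase(b.charAt(i)));
            if (ca != cb) {
                return ca - cb;
            }
        }
        return a.length() - b.length();
    }

    // Hypothetical helper: comparison as worded in the old javadoc, using
    // the String mappings (which apply 1:M mappings such as sharp s to SS).
    // Locale.ROOT is an assumption, not part of the original wording.
    static int compareByStringMapping(String a, String b) {
        String fa = a.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        String fb = b.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        return fa.compareTo(fb);
    }

    public static void main(String[] args) {
        // Variant test case from the evaluation above.
        String s1 = "s\u00df";
        String s2 = "SSS";
        System.out.println(compareByCharacter(s1, s2));
        System.out.println(compareByStringMapping(s1, s2));
        // Related method that any fix should stay consistent with.
        System.out.println(s1.equalsIgnoreCase(s2));
    }
}
```

With these helpers, the per-character comparison of "s\u00df" and "SSS" is nonzero while the string-mapping comparison is zero, which is the inconsistency the evaluation describes.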

04-04-2002

WORK AROUND Use the Collator class to get locale-sensitive comparisons. john.oconner@Eng 2001-03-20
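
A minimal sketch of this workaround follows. The helper name, the choice of Locale.GERMAN, and the SECONDARY strength are assumptions made for illustration; how a given Collator orders "\u00df" against "SS" depends on the collation rules of the locale, so no result is claimed for that pair here.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorWorkaroundSketch {
    // Hypothetical helper: a locale-sensitive collator that ignores case.
    // SECONDARY strength is an assumption; it treats case differences as
    // insignificant while still distinguishing accents.
    static Collator caseInsensitiveCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = caseInsensitiveCollator(Locale.GERMAN);
        // Case differences alone do not affect the ordering.
        System.out.println(collator.compare("abc", "ABC"));
        // Result depends on the locale's collation rules.
        System.out.println(collator.compare("\u00df", "SS"));
    }
}
```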

20-03-2001