JDK-4804273 : Letter combination AA not sorted properly in Swedish Locale
  • Type: Bug
  • Component: globalization
  • Sub-Component: translation
  • Affected Version: 1.3.0,1.4.1
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux,windows_98,windows_2000
  • CPU: x86
  • Submitted: 2003-01-16
  • Updated: 2004-03-25
  • Resolved: 2004-03-25
Other
5.0 beta2: Fixed
Related Reports
Duplicate :  
Description

Name: rmT116609			Date: 01/16/2003


FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

and:

java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :
Windows 2000 Version 5.0 (Build 2195: Service Pack 3)


A DESCRIPTION OF THE PROBLEM :
When using the Swedish Locale, the letter combination AA is not sorted correctly. It should be sorted without any special rules, as it is in the Finnish Locale.

(I believe that this bug was inspired by the Danish/Norwegian practice of writing AA as a substitute for A_WITH_RING. Danish and Norwegian didn't introduce A_WITH_RING into their
alphabets until the 20th century. Swedes and Finns, however, have never used AA instead of A_WITH_RING.)


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Compile and run the supplied program.
2. Examine the output.



EXPECTED VERSUS ACTUAL BEHAVIOR :
Notice that "aardvark" will be sorted last, instead of
first. This is certainly not correct for Swedish.
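
A minimal sketch of one mechanism that would produce this symptom, consistent with the rule text quoted in the EVALUATION section of this report: if the collation rules list the pair "aa" as a single unit placed after z, any word starting with "aa" moves to the end. The rule strings below are toy rules made up for this illustration (the class name, helper names and AA_RULE are hypothetical); they are not the rules shipped in LocaleElements_sv.java.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

// Toy illustration only: the rules here are invented for this sketch and are
// NOT the rules shipped with the JDK's Swedish locale data.
class ContractionDemo {

    // Hypothetical extra rule: treat the letter pair "aa" as one unit that
    // sorts after everything listed before it (here: after z).
    static final String AA_RULE = "< aa , aA , Aa , AA";

    // Builds "< a , A < b , B ... < z , Z", a plain Latin alphabet.
    static String plainAlphabet() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" , ")
                 .append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString();
    }

    // Sorts the sample words from this report with a collator built from
    // the given rule string.
    static String[] sortWith(String rules) {
        Collator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
        String[] data = {"aardvark", "antilope", "baboon", "crocodile"};
        Arrays.sort(data, collator);
        return data;
    }

    public static void main(String[] args) {
        String plain = plainAlphabet();
        System.out.println(Arrays.toString(sortWith(plain)));
        System.out.println(Arrays.toString(sortWith(plain + AA_RULE)));
    }
}
```

With the plain alphabet, "aardvark" should come out first; with the extra rule appended it should move to the end, which matches the behavior reported here.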

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;

/**********************************************************
* This program demonstrates that a special sorting rule is
* applied to the letter combination AA, when using the
* Swedish Locale ("sv", "SE").
***********************************************************/
public class CollatorTest {

  /********************************************************
  *********************************************************/
  public static void main (String[] args) {
    Locale loc = new Locale ("sv", "SE");   // Swedish

    Locale.setDefault (loc);
    Collator col = Collator.getInstance ();

    String[] data = {"aardvark",
                     "antilope",
                     "baboon",
                     "crocodile"};
    Arrays.sort (data, col);

    System.out.println ("Using " + loc.getDisplayName());
    for (int i = 0;  i < data.length;  i++) {
      System.out.println (data[i]);
    }//end for
  }//end main

}//end class CollatorTest

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
When sorting in the Swedish Locale, use a Finnish Collator:

public Collator getCollator () {
  if (Locale.getDefault().getLanguage().equals("sv")) {
    return Collator.getInstance(new Locale("fi", "FI"));
  }//end if
  return Collator.getInstance();
}//end getCollator
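
A possible way to use this workaround, shown as a sketch. The class name WorkaroundDemo and the helper collatorFor are hypothetical; the only change from getCollator() above is that the locale is passed in explicitly, an assumption made so the example does not depend on the JVM's default locale.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class WorkaroundDemo {

    // Same idea as getCollator() above: for Swedish, borrow the Finnish
    // collator; for any other locale, use that locale's own collator.
    static Collator collatorFor(Locale locale) {
        if (locale.getLanguage().equals("sv")) {
            return Collator.getInstance(new Locale("fi", "FI"));
        }
        return Collator.getInstance(locale);
    }

    public static void main(String[] args) {
        String[] data = {"crocodile", "baboon", "antilope", "aardvark"};
        Arrays.sort(data, collatorFor(new Locale("sv", "SE")));
        System.out.println(Arrays.toString(data));
    }
}
```

This relies on the claim in this report that the Finnish collator applies no special rule to AA.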



(Review ID: 179153) 
======================================================================

Name: rl43681			Date: 03/05/2003


FULL PRODUCT VERSION :
On the machine with 2.2.19 kernel:
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
On the 2.4.20 kernel machine:
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)


FULL OS VERSION :
Tested on these platforms:
Linux l82 2.4.20 #1 sön dec 8 00:17:15 CET 2002 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux
Linux xxxx.unit.liu.se 2.2.19-6.2.15smp #1 SMP Wed Feb 27 10:44:30 EST 2002 i686 unknown


A DESCRIPTION OF THE PROBLEM :
When asking Java to sort text for the Swedish locale, the tokens 'aa', 'aA', 'Aa' and 'AA' are collated together with 'Å' (A-ring), which is not correct.

Also, 'æ' (ae ligature), the Norwegian/Danish letter representing the same sound as 'ä' (a-umlaut), is not collated correctly.

The problem seems to be CollationElements in LocaleElements_sv.java.
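
One way to look at the rules a running JVM actually uses is to ask the collator for them, as sketched below. The class name ShowSwedishRules and the helper escapeNonAscii are hypothetical. The sketch assumes, as the demo program later in this report does, that the locale-specific part of the rule string starts at the first "& Z"; if that marker is absent it prints the whole rule string.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

class ShowSwedishRules {

    // Renders every non-ASCII character as a backslash-u escape so the
    // rules stay readable on a console with any encoding.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("sv", "SE"));
        if (!(collator instanceof RuleBasedCollator)) {
            System.out.println("Not a RuleBasedCollator; no rules to show.");
            return;
        }
        String rules = ((RuleBasedCollator) collator).getRules();
        // Assumption: the Swedish-specific tail begins at the first "& Z".
        int start = rules.indexOf("& Z");
        String tail = (start >= 0) ? rules.substring(start) : rules;
        System.out.println(escapeNonAscii(tail));
    }
}
```

On an affected release, the printed tail would be expected to list aa, aA, Aa and AA next to a-ring; the exact text depends on the JDK in use.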

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the source code. I didn't bother to use the \u notation except for the rules that I hope will go back into LocaleElements_sv.java; hopefully that will not be a problem.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I expected the collator to get it right with the default locale as well, not just the one I patched.
default
a
A
ae
Ae
b
B
y
Y
ü
Ü
z
Z
æ
Æ
å
Å
aa
Aa
ä
Ä
ö
Ö
ø
Ø

patched
a
A
aa
Aa
ae
Ae
b
B
y
Y
ü
Ü
z
Z
å
Å
ä
Ä
æ
Æ
ö
Ö
ø
Ø


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;

/**
 * A test of Java's opinion on the Swedish alphabet.
 */
public class CollationBugDemo {

    String name;

    /** This should be a Collator that knows how to order AA, Å, Ä, Ö
     * & c. for Swedish. */
    Collator swedishCollator;

    /** This will be a set of sorted strings eventually. */
    SortedSet resultSet;

    public CollationBugDemo(String name, Collator collator) {
        this.name = name;
        resultSet = new TreeSet();
        swedishCollator = collator;
    }

    public void add(String element) {
        resultSet.add(swedishCollator.getCollationKey(element));
    }

    public void printElements() {
        System.out.println(name);
        Iterator elements = resultSet.iterator();
        while (elements.hasNext()) {
            String element =
                (String)((CollationKey)elements.next()).getSourceString();
            System.out.println(element);
        }
        System.out.println();
    }

    public static void exerciseDemo(CollationBugDemo demo) {
        demo.add("A");
        demo.add("Aa");
        demo.add("Ae");
        demo.add("B");
        demo.add("Y");
        demo.add("Ü"); // U-umlaut
        demo.add("Z");
        demo.add("Å"); // A-ring
        demo.add("Ä"); // A-umlaut
        demo.add("Æ"); // AE ligature
        demo.add("Ö"); // O-umlaut
        demo.add("Ø"); // O-stroke
        demo.add("a");
        demo.add("aa");
        demo.add("ae");
        demo.add("b");
        demo.add("y");
        demo.add("ü"); // u-umlaut
        demo.add("z");
        demo.add("å"); // a-ring
        demo.add("ä"); // a-umlaut
        demo.add("æ"); // ae ligature
        demo.add("ö"); // o-umlaut
        demo.add("ø"); // o-stroke

        demo.printElements();
    }

    public static void main(String[] args) throws Exception {
        CollationBugDemo demo;

        Locale swedishLocale = new Locale("sv", "SE");
        Collator defaultCollator = Collator.getInstance(swedishLocale);
        demo = new CollationBugDemo("default", defaultCollator);
        exerciseDemo(demo);

        String defaultRules =
            ((RuleBasedCollator)defaultCollator).getRules();
        int beginningOfSpecificRules =
            defaultRules.indexOf("& Z");
        String genericCollationRules =
            defaultRules.substring(0, beginningOfSpecificRules);

        String patchedRules =
            // (I'm a bit torn between either "tricking" people into
            // using the double-acute variants which are not used in
            // Swedish or making someone nearly as upset about the
            // collating as I got to submit this.)
            genericCollationRules +
            "< a\u030a , A\u030a " +                     // a-ring
            "< a\u0308 , A\u0308 " +                     // a-umlaut
            // Someone writing a-double-acute, o-double-acute or
            // u-double-acute is most likely to expect them collated
            // along with the respective umlauts.
            "; a\u030b , A\u030b " +                     // a-double-acute
            // The same applies to the ae ligature which is the
            // Norwegian and Danish representation of the same sound
            // as a-umlaut in Swedish.
            "; \u00e6 , \u00c6 " +                       // ae ligature
            "< o\u0308 , O\u0308 " +                     // o-umlaut
            "; o\u030b , O\u030b " +                     // o-double-acute
            // And o-stroke is the Norwegian and Danish representation
            // of the same sound as o-umlaut in Swedish.
            "; \u00f8 , \u00d8 " +                       // o-stroke
            "& V ; w , W" +
            "& Y, u\u0308 , U\u0308" +                   // u-umlaut
            "; u\u030b , U\u030b ";                      // u-double-acute

        Collator patchedCollator = new RuleBasedCollator(
                patchedRules);

        demo = new CollationBugDemo("patched", patchedCollator);
        exerciseDemo(demo);
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
See source code.
(Review ID: 182172)
======================================================================

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: tiger-beta2
FIXED IN: tiger-beta2
INTEGRATED IN: tiger-beta2
14-06-2004

EVALUATION

j2se/src/share/classes/sun/text/resources/LocaleElements_sv.java file was provided by Taligent. The last section of this file is CollationElements.
###@###.### 2003-03-06

###@###.### 2003-03-07

I can plug this fix into the files easily enough. I still need to write the test case for the fix though, so that it can be automatically checked by I18n team.
###@###.### 2003-10-17

******** l10n evaluation template - begin ***********

Evaluation : Updating the accented chars to collate differently to AA, ae etc.

sccsdiff info (e.g. sccs diffs -r1.30 1.31 Activator_fr.java):

sccsdiff -r1.20 -r1.18 SCCS/s.LocaleElements_sv.java

------- LocaleElements_sv.java -------
477,480c477,480
< "& Z < a\u030a , A\u030a" + // a-ring, aa ligaure
< "< a\u0308 , A\u0308 < a\u030b, A\u030b " + // a-umlaut, a-double-acute
< "< \u00e6 , \u00c6 " + // ae ligature
< "< o\u0308 , O\u0308 " + // o-umlaut
---
> "& Z < \u00e6 , \u00c6 " + // Z < ae ligature
> "< a\u030a , A\u030a , aa , aA , Aa , AA" + // a-ring, aa ligaure
> "< a\u0308 , A\u0308 < o\u0308 , O\u0308 " + // a-umlaut < o-umlaut
> "; u\u030b , U\u030b " + // u-double-acute
483,484c483
< "& Y, u\u0308 , U\u0308" + // u-double-acute
< "; u\u030b, U\u030b "
---
> "& Y, u\u0308 , U\u0308"

List file(s) to be delivered : src/share/classes/sun/text/resources/LocaleElements_sv.java
Target Build : Tiger-beta
Additional Info :

******** l10n evaluation template - end***********

The following files have been integrated into tiger-beta2 b42 on 3/9
src/share/classes/sun/text/resources/LocaleElements_sv.java
test/java/text/Collator/Bug4804273.java
###@###.### 2004-03-24
24-03-2004
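
A rough sketch of the kind of check a regression test for this fix might perform. This is a hypothetical example written for illustration (the class name SwedishAaCheck and the helper sortSwedish are invented here); it is not the contents of test/java/text/Collator/Bug4804273.java.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical check for illustration; not the actual regression test.
class SwedishAaCheck {

    // Returns a sorted copy of the given words using the Swedish collator.
    static String[] sortSwedish(String... words) {
        Collator swedish = Collator.getInstance(new Locale("sv", "SE"));
        String[] sorted = words.clone();
        Arrays.sort(sorted, swedish);
        return sorted;
    }

    public static void main(String[] args) {
        String[] sorted =
            sortSwedish("crocodile", "baboon", "antilope", "aardvark");
        System.out.println(Arrays.toString(sorted));
        // On a release containing the fix, AA gets no special treatment,
        // so "aardvark" is expected to sort first.
        if (!sorted[0].equals("aardvark")) {
            throw new AssertionError("aa is still collated with a-ring");
        }
    }
}
```

On the affected releases listed above this check would be expected to fail, since "aardvark" was reported to sort last there.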