Bug ID: JDK-4804273 Letter combination AA not sorted properly in Swedish Locale

Details
Type: Bug
Submit Date: 2003-01-16
Status: Resolved
Updated Date: 2004-03-25
Project Name: JDK
Resolved Date: 2004-03-25
Component: globalization
OS: linux,windows_98,windows_2000
Sub-Component: translation
CPU: x86
Priority: P4
Resolution: Fixed
Affected Versions: 1.3.0,1.4.1
Fixed Versions: 5.0 (beta2)

Description

Name: rmT116609			Date: 01/16/2003


FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

and:

java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :
Windows 2000 Version 5.0 (Build 2195: Service Pack 3)


A DESCRIPTION OF THE PROBLEM :
When using the Swedish Locale, the letter combination AA is not sorted correctly. It should be sorted without any special rules, as it is in the Finnish Locale.

(I believe that this bug was inspired by the Danish/Norwegian practice of using AA as a substitute for A_WITH_RING. They didn't introduce A_WITH_RING into their
alphabets until the 20th century. Swedes and Finns, however, have never used AA instead of A_WITH_RING.)


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Compile and run the supplied program.
2. Examine the output.



EXPECTED VERSUS ACTUAL BEHAVIOR :
Notice that "aardvark" will be sorted last, instead of
first. This is certainly not correct for Swedish.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;

/**********************************************************
* This program demonstrates that a special sorting rule is
* applied to the letter combination AA, when using the
* Swedish Locale ("sv", "SE").
***********************************************************/
public class CollatorTest {

  /********************************************************
  *********************************************************/
  public static void main (String[] args) {
    Locale loc = new Locale ("sv", "SE");   // Swedish

    Locale.setDefault (loc);
    Collator col = Collator.getInstance ();

    String[] data = {"aardvark",
                     "antilope",
                     "baboon",
                     "crocodile"};
    Arrays.sort (data, col);

    System.out.println ("Using " + loc.getDisplayName());
    for (int i = 0;  i < data.length;  i++) {
      System.out.println (data[i]);
    }//end for
  }//end main

}//end class CollatorTest

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
When sorting in the Swedish locale, use a Finnish Collator:

public Collator getCollator () {
  if (Locale.getDefault().getLanguage().equals("sv")) {
    return Collator.getInstance(new Locale("fi", "FI"));
  }//end if
  return Collator.getInstance();
}//end getCollator
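
A variant of this workaround that takes the locale as a parameter, rather than reading Locale.getDefault(), might look like the sketch below. The class and method names (SwedishCollatorWorkaround, collatorFor, sortedCopy) are hypothetical. It assumes, as stated above, that the Finnish rules have no special case for AA, and it uses Locale.forLanguageTag, which is only available from Java 7 onward.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of the JDK or of the original report.
final class SwedishCollatorWorkaround {

    // Returns a Finnish collator when a Swedish one is requested,
    // otherwise the collator the JDK supplies for the given locale.
    static Collator collatorFor(Locale locale) {
        if ("sv".equals(locale.getLanguage())) {
            return Collator.getInstance(Locale.forLanguageTag("fi-FI"));
        }
        return Collator.getInstance(locale);
    }

    // Sorts a copy of the input so the caller's array is left untouched.
    static String[] sortedCopy(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, collatorFor(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] data = {"crocodile", "baboon", "antilope", "aardvark"};
        for (String s : sortedCopy(data, Locale.forLanguageTag("sv-SE"))) {
            System.out.println(s);
        }
    }
}
```

Passing the locale explicitly keeps the substitution local to the call site and avoids changing the JVM-wide default locale.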



(Review ID: 179153) 
======================================================================

Name: rl43681			Date: 03/05/2003


FULL PRODUCT VERSION :
On the machine with 2.2.19 kernel:
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
On the 2.4.20 kernel machine:
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)


FULL OS VERSION :
Tested on these platforms:
Linux l82 2.4.20 #1 sön dec 8 00:17:15 CET 2002 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux
Linux xxxx.unit.liu.se 2.2.19-6.2.15smp #1 SMP Wed Feb 27 10:44:30 EST 2002 i686 unknown


A DESCRIPTION OF THE PROBLEM :
When asking Java to sort text for the Swedish locale, the tokens 'aa', 'aA', 'Aa' and 'AA' are collated together with 'å' (a-ring), which is not correct.

Also, 'æ' (ae ligature), the Norwegian/Danish letter representing the same sound as 'ä' (a-umlaut), is not collated correctly.

The problem seems to be CollationElements in LocaleElements_sv.java.
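
One rough way to confirm this is to look at the rule string itself. The sketch below is a hypothetical inspection aid (TailoringTokens is not JDK API): it naively splits a rule string on the relation characters and reports whether any operand is the two-letter sequence "aa". It ignores the quoting and escaping that the rule syntax allows, so treat the answer as a hint only. The result for the installed Swedish collator depends on the JDK in use.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for eyeballing a tailoring string; not JDK API.
final class TailoringTokens {

    // Naively splits a rule string on the relation characters & < ; , =
    // and returns the trimmed, non-empty operands. Quoting and escaping
    // in the rule syntax are ignored.
    static List<String> operands(String rules) {
        List<String> out = new ArrayList<>();
        for (String part : rules.split("[&<;,=]")) {
            String token = part.trim();
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // True if some operand is the two-letter sequence "aa" in any casing.
    static boolean mentionsDoubleA(String rules) {
        for (String token : operands(rules)) {
            if (token.equalsIgnoreCase("aa")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: the Swedish collator is a RuleBasedCollator.
        Collator sv = Collator.getInstance(Locale.forLanguageTag("sv-SE"));
        if (sv instanceof RuleBasedCollator) {
            String rules = ((RuleBasedCollator) sv).getRules();
            System.out.println("aa operand present: " + mentionsDoubleA(rules));
        }
    }
}
```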

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the source code. I didn't bother to use the \u notation except in the rules, which I hope will go back into LocaleElements_sv.java. Hopefully that will not be a problem.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I expected the collator to get it right with the default locale as well, not just the one I patched.
default
a
A
ae
Ae
b
B
y
Y
ü
Ü
z
Z
æ
Æ
å
Å
aa
Aa
ä
Ä
ö
Ö
ø
Ø

patched
a
A
aa
Aa
ae
Ae
b
B
y
Y
ü
Ü
z
Z
å
Å
ä
Ä
æ
Æ
ö
Ö
ø
Ø


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;

/**
 * A test of Java's opinion on the Swedish alphabet.
 */
public class CollationBugDemo {

    String name;

    /** This should be a Collator that knows how to order AA, Å, Ä, Ö
     * etc. for Swedish. */
    Collator swedishCollator;

    /** This will be a set of sorted strings eventually. */
    SortedSet resultSet;

    public CollationBugDemo(String name, Collator collator) {
        this.name = name;
        resultSet = new TreeSet();
        swedishCollator = collator;
    }

    public void add(String element) {
        resultSet.add(swedishCollator.getCollationKey(element));
    }

    public void printElements() {
        System.out.println(name);
        Iterator elements = resultSet.iterator();
        while (elements.hasNext()) {
            String element =
                (String)((CollationKey)elements.next()).getSourceString();
            System.out.println(element);
        }
        System.out.println();
    }

    public static void exerciseDemo(CollationBugDemo demo) {
        demo.add("A");
        demo.add("Aa");
        demo.add("Ae");
        demo.add("B");
        demo.add("Y");
        demo.add("Ü"); // U-umlaut
        demo.add("Z");
        demo.add("Å"); // A-ring
        demo.add("Ä"); // A-umlaut
        demo.add("Æ"); // AE ligature
        demo.add("Ö"); // O-umlaut
        demo.add("Ø"); // O-stroke
        demo.add("a");
        demo.add("aa");
        demo.add("ae");
        demo.add("b");
        demo.add("y");
        demo.add("ü"); // u-umlaut
        demo.add("z");
        demo.add("å"); // a-ring
        demo.add("ä"); // a-umlaut
        demo.add("æ"); // ae ligature
        demo.add("ö"); // o-umlaut
        demo.add("ø"); // o-stroke

        demo.printElements();
    }

    public static void main(String[] args) throws Exception {
        CollationBugDemo demo;

        Locale swedishLocale = new Locale("sv", "SE");
        Collator defaultCollator = Collator.getInstance(swedishLocale);
        demo = new CollationBugDemo("default", defaultCollator);
        exerciseDemo(demo);

        String defaultRules =
            ((RuleBasedCollator)defaultCollator).getRules();
        int beginningOfSpecificRules =
            defaultRules.indexOf("& Z");
        String genericCollationRules =
            defaultRules.substring(0, beginningOfSpecificRules);

        String patchedRules =
            // (I'm a bit torn between either "tricking" people into
            // using the double-acute variants, which are not used in
            // Swedish, or making someone nearly as upset about the
            // collating as I got to submit this.)
            genericCollationRules +
            "< a\u030a , A\u030a " +                     // a-ring
            "< a\u0308 , A\u0308 " +                     // a-umlaut
            // Someone writing a-double-acute, o-double-acute or
            // u-double-acute is most likely to expect them collated
            // along with the respective umlauts.
            "; a\u030b , A\u030b " +                     // a-double-acute
            // The same applies to the ae ligature, which is the
            // Norwegian and Danish representation of the same sound
            // as a-umlaut in Swedish.
            "; \u00e6 , \u00c6 " +                       // ae ligature
            "< o\u0308 , O\u0308 " +                     // o-umlaut
            "; o\u030b , O\u030b " +                     // o-double-acute
            // And o-stroke is the Norwegian and Danish representation
            // of the same sound as o-umlaut in Swedish.
            "; \u00f8 , \u00d8 " +                       // o-stroke
            "& V ; w , W" +
            "& Y, u\u0308 , U\u0308" +                   // u-umlaut
            "; u\u030b , U\u030b ";                      // u-double-acute

        Collator patchedCollator = new RuleBasedCollator(
                patchedRules);

        demo = new CollationBugDemo("patched", patchedCollator);
        exerciseDemo(demo);
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
See source code.
(Review ID: 182172)
======================================================================

                                    

Comments
EVALUATION

j2se/src/share/classes/sun/text/resources/LocaleElements_sv.java file was provided by Taligent. The last section of this file is CollationElements.
###@###.### 2003-03-06


###@###.### 2003-03-07

I can plug this fix into the files easily enough. I still need to write the test case for the fix though, so that it can be automatically checked by I18n team. 

###@###.### 2003-10-17

******** l10n evaluation template - begin ***********

Evaluation :
Updated the collation rules so that the accented characters collate separately from AA, ae, etc.

sccsdiff info (e.g. sccs diffs -r1.30 1.31 Activator_fr.java):
sccsdiff -r1.20 -r1.18 SCCS/s.LocaleElements_sv.java 

------- LocaleElements_sv.java -------
477,480c477,480
<                 "& Z < a\u030a , A\u030a" +  // a-ring, aa ligaure
<                 "< a\u0308 , A\u0308 < a\u030b, A\u030b " +  // a-umlaut, a-double-acute
< 	        "< \u00e6 , \u00c6 " +                   //  ae ligature
< 	        "< o\u0308 , O\u0308 " +   // o-umlaut
---
>                 "& Z < \u00e6 , \u00c6 " +                   // Z < ae ligature
>                 "< a\u030a , A\u030a , aa , aA , Aa , AA" +  // a-ring, aa ligaure
>                 "< a\u0308 , A\u0308 < o\u0308 , O\u0308 " + // a-umlaut < o-umlaut
>                 "; u\u030b , U\u030b " +                     // u-double-acute
483,484c483
<                 "& Y, u\u0308 , U\u0308" + // u-double-acute
< 	        "; u\u030b, U\u030b "
---
>                 "& Y, u\u0308 , U\u0308"

List file(s) to be delivered :
src/share/classes/sun/text/resources/LocaleElements_sv.java

Target Build : 
Tiger-beta

Additional Info :


******** l10n evaluation template - end***********
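
The effect of the "aa , aA , Aa , AA" entries in the diff above can be reproduced in isolation. The sketch below is only an illustration, not the delivered fix. It appends two alternative tailorings to the rules of the English collator and compares "aardvark" with "antilope" under each. ContractionDemo and its methods are hypothetical names, and the code assumes that Collator.getInstance(Locale.ENGLISH) returns a RuleBasedCollator whose rules define Z.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class; the tailoring strings mirror those in the diff.
final class ContractionDemo {

    // Appends a tailoring to the rules of the English collator, which is
    // assumed here to be a RuleBasedCollator carrying the default rules.
    static RuleBasedCollator tailored(String tailoring) {
        Collator base = Collator.getInstance(Locale.ENGLISH);
        String baseRules = ((RuleBasedCollator) base).getRules();
        try {
            return new RuleBasedCollator(baseRules + tailoring);
        } catch (ParseException e) {
            throw new IllegalStateException("could not parse tailoring", e);
        }
    }

    // "aa" (in every casing) is one collation element placed after Z.
    static RuleBasedCollator withContraction() {
        return tailored("& Z < a\u030a , A\u030a , aa , aA , Aa , AA");
    }

    // Only a-ring is placed after Z; "aa" stays two ordinary letters.
    static RuleBasedCollator withoutContraction() {
        return tailored("& Z < a\u030a , A\u030a");
    }

    public static void main(String[] args) {
        // Positive result expected: "aardvark" sorts after "antilope".
        System.out.println(withContraction().compare("aardvark", "antilope"));
        // Negative result expected: "aardvark" sorts before "antilope".
        System.out.println(withoutContraction().compare("aardvark", "antilope"));
    }
}
```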

The following files have been integrated into tiger-beta2 b42 on 3/9
src/share/classes/sun/text/resources/LocaleElements_sv.java
test/java/text/Collator/Bug4804273.java
###@###.### 2004-03-24
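
The contents of test/java/text/Collator/Bug4804273.java are not reproduced in this report. A check in the same spirit, written here as a hypothetical standalone class (SwedishAaCheck is an invented name, and Locale.forLanguageTag requires Java 7 or later), could sort the words from the first reproducer with the Swedish collator and fail if "aardvark" does not come first:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical check; this is not the content of Bug4804273.java.
final class SwedishAaCheck {

    // Sorts a copy of the given words with the Swedish collator.
    static String[] sortSwedish(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(Locale.forLanguageTag("sv-SE")));
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortSwedish("crocodile", "baboon", "antilope", "aardvark");
        if (!"aardvark".equals(sorted[0])) {
            throw new AssertionError(
                    "aa still sorts as a-ring: " + Arrays.toString(sorted));
        }
        System.out.println(Arrays.toString(sorted));
    }
}
```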
                                     
2004-03-24
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
tiger-beta2

FIXED IN:
tiger-beta2

INTEGRATED IN:
tiger-beta2


                                     
2004-06-14


