Bug ID: JDK-5047314 [Col] Collator.compare() runs indefinitely for a certain set of Thai strings

Details
Type:
Bug
Submit Date:
2004-05-14
Status:
Resolved
Updated Date:
2010-07-09
Project Name:
JDK
Resolved Date:
2010-02-16
Component:
core-libs
OS:
solaris_10,windows_xp,windows
Sub-Component:
java.text
CPU:
x86,sparc
Priority:
P4
Resolution:
Fixed
Affected Versions:
1.4.2,6u16,6-pool
Fixed Versions:

Related Reports
Backport:
Backport:
Backport:

Sub Tasks

Description
Name: rmT116609			Date: 05/13/2004


FULL PRODUCT VERSION :
java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)

java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)

A DESCRIPTION OF THE PROBLEM :
When using a Thai collator returned from Collator.getInstance(new Locale("th")), the Collator.compare(string1, string2) method runs forever when string1 and string2 are identical and the string consists of exactly one of the following Thai characters:

  \u0e40
  \u0e41
  \u0e42
  \u0e43
  \u0e44

Note that the above characters are all special Thai "prefix" vowels.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the test case.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The compare method returns 0, because the two strings are identical.
ACTUAL -
The compare method runs forever and never returns.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
No exception or error occurs.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.Collator;
import java.util.Locale;

public class WordCount {
    public static void main(String[] args) {
        Collator c = Collator.getInstance(new Locale("th"));
        String s = "\u0e40";
        // any one of \u0e40,  \u0e41, \u0e42, \u0e43, or \u0e44 will do
        System.out.println(c.compare(s, s));  // runs forever
        System.out.println("never reach here");
    }
}
---------- END SOURCE ----------
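
Because the test case above hangs on an affected JDK, it can be convenient to bound the call. The following is a minimal sketch, assuming Java 8 or later. The class name BoundedCompare and the helper compareWithTimeout are hypothetical names introduced here and are not part of the report. Locale.forLanguageTag("th") is used in place of new Locale("th"); both denote the Thai locale.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs Collator.compare() on a daemon thread with a timeout.
final class BoundedCompare {

    /** Returns the comparison result, or null if compare() did not finish in time. */
    static Integer compareWithTimeout(Collator c, String a, String b, long millis) {
        // Daemon thread, so a compare() call that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "collator-probe");
            t.setDaemon(true);
            return t;
        });
        Callable<Integer> task = () -> c.compare(a, b);
        Future<Integer> future = pool.submit(task);
        try {
            return future.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            return null; // treated as "hung" by the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Collator thai = Collator.getInstance(Locale.forLanguageTag("th"));
        String[] prefixVowels = { "\u0e40", "\u0e41", "\u0e42", "\u0e43", "\u0e44" };
        for (String s : prefixVowels) {
            Integer result = compareWithTimeout(thai, s, s, 2000);
            String label = String.format("U+%04X", (int) s.charAt(0));
            System.out.println(label + ": " + (result == null ? "timed out" : "returned " + result));
        }
    }
}
```

On an affected JDK the worker thread would presumably keep spinning after the timeout, since an interrupt does not stop a busy loop; the sketch only prevents the caller from blocking.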

CUSTOMER SUBMITTED WORKAROUND :
Wrap the Thai collator with a hard-coded check.
(Incident Review ID: 265148) 
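
The report does not say what the submitter's hard-coded check looked like. One possible shape is sketched below, under stated assumptions: the class GuardedThaiComparator and the method endsWithPrefixVowel are hypothetical names, and the fallback to String.compareTo for risky inputs is a choice made for this sketch, not something the report prescribes. The guard tests the last character because the evaluation of this bug attributes the hang to a prefix vowel at the end of the input.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical wrapper: avoids handing the collator any string that ends in a Thai prefix vowel.
final class GuardedThaiComparator implements Comparator<String> {
    private final Collator delegate;

    GuardedThaiComparator(Collator delegate) {
        this.delegate = delegate;
    }

    /** True if the last character is one of U+0E40..U+0E44 (the characters listed in the report). */
    static boolean endsWithPrefixVowel(String s) {
        if (s.isEmpty()) {
            return false;
        }
        char last = s.charAt(s.length() - 1);
        return last >= '\u0e40' && last <= '\u0e44';
    }

    @Override
    public int compare(String a, String b) {
        if (a.equals(b)) {
            return 0; // identical strings never need the collator
        }
        if (endsWithPrefixVowel(a) || endsWithPrefixVowel(b)) {
            // Assumption of this sketch: fall back to UTF-16 code unit order.
            // This is NOT Thai collation order; it only avoids the hang.
            return a.compareTo(b);
        }
        return delegate.compare(a, b);
    }

    public static void main(String[] args) {
        Comparator<String> cmp =
                new GuardedThaiComparator(Collator.getInstance(Locale.forLanguageTag("th")));
        System.out.println(cmp.compare("\u0e40", "\u0e40")); // short-circuits before the collator
    }
}
```

Mixing two orderings in one comparator can produce results that are inconsistent with the collator's own order, so a wrapper like this is only a stopgap for sorting data where trailing prefix vowels are rare.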
======================================================================
###@###.### 11/2/04 18:37 GMT
The OutOfMemoryError is prevalent. We tested on Linux and Windows, on JDK versions 1.5 and 1.6.0_16.

Here is a simple repro case:

Collator.getInstance(new Locale("th")).getCollationKey("\u0e44");

This test covers the OOM scenarios:

    Locale thaiLoc = new Locale("th");
    Collator thaiColl = Collator.getInstance(thaiLoc);
    String[] oomStrings = { "\u0e44", "\u0e43", "\u0e42", "\u0e41", "\u0e40" };
    for (int i = 0; i < oomStrings.length; i++) {
        String oom = oomStrings[i];
        CollationKey key = thaiColl.getCollationKey(oom);
        assertEquals("string #" + i, oom, key.getSourceString());
    }

Comments
EVALUATION

The logic in the CollationElementIterator.next() method is incorrect when all of
the following hold:
(1) the current locale requires swapping prevowel/consonant pairs (as the Thai
and Lao locales do),
(2) the current character is a prevowel that needs swapping, and
(3) this prevowel is the last character in the input text.

The logic is "call next() to see whether the next character is a base consonant;
if it is not, call previous() to reset the index pointer". This breaks when the
current character is the last character: the first next() call does not advance
the index pointer (it is a no-op), but the following previous() call does
decrease the index by 1, so the same prevowel is read again and the loop never
ends.
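
The index arithmetic described above can be illustrated with a small standalone model. This is a sketch under stated assumptions and not the JDK source: the class PrevowelLoopModel, the method countElements, the guardEnd flag, and the iteration limit are all hypothetical, and the character ranges used for prevowels (U+0E40..U+0E44) and base consonants (U+0E01..U+0E2E) are assumptions of the model. The guardEnd branch shows one way the rewind could be skipped; it is not claimed to be the fix that was actually applied.

```java
// Hypothetical model of the peek-then-rewind logic; not the JDK implementation.
final class PrevowelLoopModel {
    private static final char DONE = '\uffff'; // sentinel for "no more characters"

    static boolean isPrevowel(char ch) {
        return ch >= '\u0e40' && ch <= '\u0e44';
    }

    static boolean isBaseConsonant(char ch) {
        return ch >= '\u0e01' && ch <= '\u0e2e';
    }

    /**
     * Counts emitted elements, stopping at 'limit' so the model itself cannot hang.
     * With guardEnd == false the rewind always happens, as in the broken logic.
     */
    static int countElements(String text, boolean guardEnd, int limit) {
        int index = 0;
        int emitted = 0;
        while (index < text.length() && emitted < limit) {
            char ch = text.charAt(index++);                 // consume the current character
            if (isPrevowel(ch)) {
                boolean advanced = index < text.length();   // next() is a no-op at the end
                char peek = advanced ? text.charAt(index++) : DONE;
                if (isBaseConsonant(peek)) {
                    emitted += 2;                           // prevowel/consonant pair handled together
                    continue;
                }
                if (advanced || !guardEnd) {
                    index--;                                // previous(): always steps back by one
                }
            }
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        int limit = 10;
        // A lone trailing prevowel: the unguarded model only stops because of the limit.
        System.out.println("unguarded: " + countElements("\u0e40", false, limit));
        System.out.println("guarded:   " + countElements("\u0e40", true, limit));
    }
}
```

In the unguarded run the index returns to 0 after every pass, so the count reaches whatever limit is supplied; with the guard the single prevowel is emitted once and the loop ends.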

Need to fix in next release.

###@###.### 2004-05-25
                                     