JDK-5047314 : [Col] Collator.compare() runs indefinitely for a certain set of Thai strings
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.text
  • Affected Version: 1.4.2,6u16,6-pool
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_10,windows,windows_xp
  • CPU: x86,sparc
  • Submitted: 2004-05-14
  • Updated: 2010-07-09
  • Resolved: 2010-02-16
  • JDK 6: 6-pool (Resolved)
  • JDK 7: 7 b84 (Fixed)
Description
Name: rmT116609			Date: 05/13/2004


FULL PRODUCT VERSION :
java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)

java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)

A DESCRIPTION OF THE PROBLEM :
When using a Thai collator returned from Collator.getInstance(new Locale("th")), the Collator.compare(string1, string2) method runs forever when string1 and string2 are identical and the string contains only one of the following Thai characters:

  \u0e40
  \u0e41
  \u0e42
  \u0e43
  \u0e44

Note that the above characters are all special Thai "prefix" vowels.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the test case.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The compare method returns 0, because the two strings are identical.
ACTUAL -
The compare method runs forever and never returns.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
No exception or error occurs.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.Collator;
import java.util.Locale;

public class WordCount {
    public static void main(String[] args) {
        Collator c = Collator.getInstance(new Locale("th"));
        String s = "\u0e40";
        // any one of \u0e40,  \u0e41, \u0e42, \u0e43, or \u0e44 will do
        System.out.println(c.compare(s, s));  // runs forever
        System.out.println("never reach here");
    }
}
---------- END SOURCE ----------
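
Because the test case above never terminates on an affected JDK, it can be convenient to bound the call with a timeout. The following is only a sketch of one way to do that; the class name BoundedCompareRepro and the method compareWithTimeout are hypothetical names chosen for illustration and are not part of the original report. It runs compare() on a daemon thread so that a stuck call cannot keep the JVM alive.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the original report.
final class BoundedCompareRepro {

    /** Returns the compare(s, s) result, or null if it did not finish in time. */
    static Integer compareWithTimeout(final String s, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "collator-repro");
                t.setDaemon(true); // a stuck compare() must not keep the JVM alive
                return t;
            }
        });
        try {
            Future<Integer> result = pool.submit(new Callable<Integer>() {
                public Integer call() {
                    Collator c = Collator.getInstance(new Locale("th"));
                    return Integer.valueOf(c.compare(s, s));
                }
            });
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // compare() is still running: the bug is present
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow(); // stop accepting work; the daemon thread may linger
        }
    }

    public static void main(String[] args) {
        Integer r = compareWithTimeout("\u0e40", 2000L);
        System.out.println(r == null
                ? "compare() did not return (bug present)"
                : "compare() returned " + r);
    }
}
```

The two-second timeout is an arbitrary choice. On a JDK that contains the fix, the call should complete and report a result of 0 for identical strings.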

CUSTOMER SUBMITTED WORKAROUND :
Wrap the Thai collator with a hard-coded check.
(Incident Review ID: 265148) 
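
The report does not show what the hard-coded check looked like. The sketch below is one possible shape, under stated assumptions: GuardedThaiComparator and endsWithPrefixVowel are hypothetical names, and the fallback to String.compareTo for strings ending in a prefix vowel is an illustrative choice, not the submitter's workaround. Mixing two orderings this way can give results that are inconsistent with true Thai collation, so treat it as a stopgap only.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical wrapper, not the submitter's actual workaround.
final class GuardedThaiComparator implements Comparator<String> {

    private final Collator delegate = Collator.getInstance(new Locale("th"));

    /** True if the last character is one of the prefix vowels U+0E40..U+0E44. */
    static boolean endsWithPrefixVowel(String s) {
        if (s.length() == 0) {
            return false;
        }
        char last = s.charAt(s.length() - 1);
        return last >= '\u0e40' && last <= '\u0e44';
    }

    public int compare(String a, String b) {
        // Identical strings never need the collator at all.
        if (a.equals(b)) {
            return 0;
        }
        // Assumption: avoid the collator whenever an input ends in a prefix
        // vowel, and fall back to plain UTF-16 code unit order instead.
        if (endsWithPrefixVowel(a) || endsWithPrefixVowel(b)) {
            return a.compareTo(b);
        }
        return delegate.compare(a, b);
    }
}
```

A caller could pass an instance to Collections.sort(list, new GuardedThaiComparator()) in place of the raw Thai collator.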
======================================================================
###@###.### 11/2/04 18:37 GMT
The OutOfMemoryError is prevalent. We tested on Linux and Windows, on JDK versions 1.5 and 1.6.0_16.

Here is a simple repro case:

    Collator.getInstance(new Locale("th")).getCollationKey("\u0e44");

This test covers the OOM scenarios:

    Locale thaiLoc = new Locale("th");
    Collator thaiColl = Collator.getInstance(thaiLoc);
    String[] oomStrings = { "\u0e44", "\u0e43", "\u0e42", "\u0e41", "\u0e40" };
    for (int i = 0; i < oomStrings.length; i++) {
        String oom = oomStrings[i];
        // getCollationKey() is the call reported to run out of memory
        CollationKey key = thaiColl.getCollationKey(oom);
        assertEquals("string #" + i, oom, key.getSourceString());
    }

Comments
EVALUATION
The logic in the CollationElementIterator.next() method is incorrect when all of the following hold:
  (1) the current locale needs to swap prevowel/consonant pairs (as the Thai and Lao locales do);
  (2) the current character is a prevowel that needs swapping; and
  (3) this prevowel is the last character in the input text.
The logic "call next() to see whether the next character is a base consonant; if not, call previous() to reset the index pointer" breaks when the current character is the last one. The first next() call does not advance the index pointer (it is a no-op), but the following previous() call does decrease the index by 1, which causes the endless loop in this case. Needs to be fixed in the next release.
###@###.### 2004-05-25
25-05-2004
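
The index arithmetic described in the evaluation can be illustrated with a toy model. This is a sketch only: the Cursor class, countElements, and the guardEndOfText flag are hypothetical and do not reproduce the JDK's internal code. It just shows how a no-op next() followed by an unconditional previous() rewinds onto the same prevowel forever, and how skipping previous() at the end of the text avoids that.

```java
// Toy model of the lookahead logic; all names here are hypothetical.
final class PrevowelLookaheadSketch {

    static final int DONE = -1;

    /** Minimal text cursor: next() is a no-op at the end of the text. */
    static final class Cursor {
        final String text;
        int index = 0;

        Cursor(String text) { this.text = text; }

        int next() {
            return index < text.length() ? text.charAt(index++) : DONE;
        }

        void previous() {
            if (index > 0) {
                index--;
            }
        }
    }

    static boolean isPrefixVowel(int ch) { return ch >= 0x0E40 && ch <= 0x0E44; }

    static boolean isThaiConsonant(int ch) { return ch >= 0x0E01 && ch <= 0x0E2E; }

    /**
     * Counts the characters consumed from the text. Returns -1 if the loop
     * has not finished after maxSteps iterations (a stand-in for "forever").
     */
    static int countElements(String text, boolean guardEndOfText, int maxSteps) {
        Cursor c = new Cursor(text);
        int produced = 0;
        for (int step = 0; step < maxSteps; step++) {
            int ch = c.next();
            if (ch == DONE) {
                return produced;
            }
            if (isPrefixVowel(ch)) {
                int lookahead = c.next();
                if (lookahead != DONE && isThaiConsonant(lookahead)) {
                    produced += 2; // prevowel/consonant pair handled together
                    continue;
                }
                // Broken variant: always step back, even when next() did not
                // move. Guarded variant: step back only if next() advanced.
                if (!guardEndOfText || lookahead != DONE) {
                    c.previous();
                }
            }
            produced++;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + countElements("\u0e40", false, 1000));
        System.out.println("guarded:   " + countElements("\u0e40", true, 1000));
    }
}
```

In this model the unguarded variant keeps producing elements for the same trailing prevowel until the step cap is reached, which is consistent with both the endless compare() and the OutOfMemoryError seen in getCollationKey().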