JDK-7104012 : AIOOBE from RuleBasedBreakIterator.lookupState for some suppl. chars

Details
Type: Bug
Submit Date: 2011-10-23
Status: Resolved
Updated Date: 2014-02-05
Project Name: JDK
Resolved Date: 2012-10-03
Component: core-libs
OS: linux
Sub-Component: java.text
CPU: x86
Priority: P4
Resolution: Fixed
Affected Versions: 7
Fixed Versions:
Related Reports

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

(The problem also exists in 1.5, 1.6, etc.)

ADDITIONAL OS VERSION INFORMATION :
Linux beast 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
BreakIterator mishandles some supplementary characters. When iterating over text that contains them, it throws an internal ArrayIndexOutOfBoundsException from RuleBasedBreakIterator.lookupState.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the included test program

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No exception should be thrown. Instead, next() should return the next text boundary or BreakIterator.DONE.
ACTUAL -
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 268
	at java.text.RuleBasedBreakIterator.lookupState(RuleBasedBreakIterator.java:1036)
	at java.text.RuleBasedBreakIterator.handleNext(RuleBasedBreakIterator.java:931)
	at java.text.RuleBasedBreakIterator.next(RuleBasedBreakIterator.java:621)
	at test.main(test.java:8)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.BreakIterator;
import java.util.Locale;

public class test {
  public static void main(String args[]) {
    BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
    bi.setText("\udb40\udc53"); // U+E0053, TAG LATIN CAPITAL LETTER S
    bi.next();
  }
}

---------- END SOURCE ----------
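The expected behavior can also be checked programmatically. The following is only a sketch: the class name SupplementaryBoundaryCheck and the helper firstBoundary are hypothetical names introduced for illustration and are not part of the original report. It builds U+E0053 from its code point, confirms that this matches the surrogate pair used in the test program, and checks that next() yields either the end of the text or BreakIterator.DONE. On a JDK without the fix, the call to next() is expected to throw the exception shown above instead.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SupplementaryBoundaryCheck {
    // Hypothetical helper: returns the first boundary after the start of
    // the text, or BreakIterator.DONE if there is none.
    static int firstBoundary(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        return bi.next();
    }

    public static void main(String[] args) {
        // U+E0053 (TAG LATIN CAPITAL LETTER S) built from its code point.
        String tagS = new String(Character.toChars(0xE0053));

        // Same two UTF-16 code units as the literal in the test program.
        System.out.println(tagS.equals("\udb40\udc53"));

        // Expected contract from the report: a boundary or DONE, no exception.
        int boundary = firstBoundary(tagS);
        System.out.println(boundary == tagS.length() || boundary == BreakIterator.DONE);
    }
}
```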

                                    

Comments
URL:   http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/4744dc70e5d1
User:  lana
Date:  2012-10-12 18:08:25 +0000

                                     
2012-10-12
SupplementaryCharacterData.getValue(int codepoint) returns (int)0xFF (255) for code points in a specific category. That is not the same as (byte)0xFF (-1), which is the value defined as RuleBasedBreakIterator.IGNORE.

Changed getValue() to return IGNORE when the looked-up value is 0xFF.
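The mismatch described in this comment can be illustrated in isolation. This is a minimal sketch and not the JDK source: the class IgnoreValueSketch and the methods rawValue and normalize are hypothetical, and the IGNORE constant is assumed to be (byte)0xFF, as the comment states. It shows that an int holding 0xFF does not compare equal to a byte holding 0xFF, and that mapping 0xFF to IGNORE, in the spirit of the fix, makes the comparison succeed.

```java
public class IgnoreValueSketch {
    // Assumption taken from the comment: IGNORE is (byte)0xFF, which is -1.
    static final byte IGNORE = (byte) 0xFF;

    // Hypothetical stand-in for a table lookup that yields the value
    // as an unsigned quantity in the range 0..255.
    static int rawValue() {
        return 0xFF;
    }

    // Hypothetical normalization in the spirit of the described fix:
    // map 0xFF to IGNORE, leave every other value unchanged.
    static int normalize(int value) {
        return (value == 0xFF) ? IGNORE : value;
    }

    public static void main(String[] args) {
        int raw = rawValue();
        // The byte is widened to int -1 for the comparison, so 255 != -1.
        System.out.println(raw == IGNORE);            // false
        // After normalization the value matches IGNORE.
        System.out.println(normalize(raw) == IGNORE); // true
    }
}
```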


                                     
2012-10-03
URL:   http://hg.openjdk.java.net/jdk8/tl/jdk/rev/4744dc70e5d1
User:  peytoia
Date:  2012-10-03 06:13:55 +0000

                                     
2012-10-03
EVALUATION

AIOOBE should not be thrown.
This bug needs to be fixed in JDK 8.
                                     
2011-12-08


