FULL PRODUCT VERSION :
java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

(This problem also exists in 1.5, 1.6, and other releases.)

ADDITIONAL OS VERSION INFORMATION :
Linux beast 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
BreakIterator fails on some supplementary characters (code points above U+FFFF, which a Java String stores as surrogate pairs). When iterating over text that contains such a character, next() throws an ArrayIndexOutOfBoundsException from inside RuleBasedBreakIterator.lookupState.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test program included below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
next() should not throw an exception. It should return the next text boundary, or BreakIterator.DONE if there are no more boundaries.

ACTUAL -
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 268
    at java.text.RuleBasedBreakIterator.lookupState(RuleBasedBreakIterator.java:1036)
    at java.text.RuleBasedBreakIterator.handleNext(RuleBasedBreakIterator.java:931)
    at java.text.RuleBasedBreakIterator.next(RuleBasedBreakIterator.java:621)
    at test.main(test.java:8)

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.text.BreakIterator;
import java.util.Locale;

public class test {
    public static void main(String[] args) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText("\udb40\udc53"); // U+E0053, TAG LATIN CAPITAL LETTER S
        bi.next();
    }
}
---------- END SOURCE ----------
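For comparison with the EXPECTED behavior, the sketch below (assuming Java 7 or later) runs the conventional first()/next() loop that stops at BreakIterator.DONE, and also checks that the string "\udb40\udc53" really is the single code point U+E0053. The class name SentenceBoundaryCheck and the helpers firstCodePoint and sentenceBoundaries are hypothetical illustrations, not part of the original report. On an affected JRE, the call to next() inside the loop is where the ArrayIndexOutOfBoundsException surfaces; a conforming implementation simply returns the end-of-text offset and then DONE.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, written only to illustrate the report.
public class SentenceBoundaryCheck {

    /** Decodes the first code point of s, combining a surrogate pair if present. */
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    /**
     * Collects every sentence boundary using the usual first()/next() loop.
     * A conforming BreakIterator ends the loop by returning DONE; on an
     * affected JRE, next() throws ArrayIndexOutOfBoundsException instead.
     */
    static List<Integer> sentenceBoundaries(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        List<Integer> boundaries = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            boundaries.add(b);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        String text = "\udb40\udc53"; // two chars, one supplementary code point
        int cp = firstCodePoint(text);
        System.out.printf("U+%X %s%n", cp, Character.getName(cp));
        try {
            System.out.println("boundaries: " + sentenceBoundaries(text));
        } catch (ArrayIndexOutOfBoundsException e) {
            // Only reached on a JRE that still has the reported bug.
            System.out.println("affected by the reported bug: " + e);
        }
    }
}
```

The try/catch in main exists only to report the failure; sentenceBoundaries itself contains no workaround, so it behaves exactly as the BreakIterator implementation underneath it does.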