SYNOPSIS
--------
StringIndexOutOfBoundsException in Matcher.find() when input String contains surrogate UTF-16 characters

OPERATING SYSTEMS
-----------------
All

FULL JDK VERSIONS
-----------------
All (Since JDK 1.5.0)

PROBLEM DESCRIPTION
-------------------
When Matcher.find() is called on an input String that contains surrogate characters, it throws a StringIndexOutOfBoundsException if all of the following hold:

1. The regex pattern results in a call to the GroupCurly.match0() method
2. The surrogate pair appears in the String at an index greater than 4 + the minimum input length the pattern expects
3. The pattern does not match the input string

REPRODUCTION INSTRUCTIONS
-------------------------
Simply compile and run the attached test case.

Observed behaviour (this specific trace is from 7u9):

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:658)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4360)
    at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4354)
    at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4304)
    at java.util.regex.Pattern$SliceI.match(Pattern.java:3895)
    at java.util.regex.Pattern$Start.match(Pattern.java:3408)
    at java.util.regex.Matcher.search(Matcher.java:1199)
    at java.util.regex.Matcher.find(Matcher.java:592)
    at RegexTestCase.main(RegexTestCase.java:11)

Expected behaviour:
No exception should be thrown. The pattern does not match, so Matcher.find() should return false.

TEST CASE
---------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestCase {
    public static void main(String[] args) {
        String ptrnStr = "test(.)+(@[a-zA-Z.]+)";
        Pattern ptrn = Pattern.compile(ptrnStr, Pattern.CASE_INSENSITIVE);
        String inputStr = "test this as \ud83d\ude0d";
        Matcher matcher = ptrn.matcher(inputStr);
        try {
            if (matcher.find()) {
                System.out.println("Found String");
            } else {
                System.out.println("Not found");
            }
        } catch (StringIndexOutOfBoundsException siob) {
            System.out.println("Testcase Failed");
            siob.printStackTrace();
        }
    }
}

WORK AROUND
-----------
Catch the exception and treat it as a "false" return value.

SUGGESTED FIX
-------------
See attachment.
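
The workaround described under WORK AROUND could be sketched roughly as follows. This is only an illustration, not the attached fix: the class name RegexWorkaround and the helper findSafely are hypothetical names introduced here, and the sketch assumes that treating the exception as "no match" is acceptable for the caller. Since the Matcher's internal state after the exception is not documented, it is probably safest not to reuse the same Matcher instance afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK or of the attached fix.
class RegexWorkaround {

    // Calls Matcher.find() and maps the StringIndexOutOfBoundsException
    // described in this report to a "false" (no match) result.
    static boolean findSafely(Matcher matcher) {
        try {
            return matcher.find();
        } catch (StringIndexOutOfBoundsException e) {
            // Assumption: on affected JDKs the exception only occurs when the
            // pattern does not match, so it is treated as "not found".
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern ptrn = Pattern.compile("test(.)+(@[a-zA-Z.]+)", Pattern.CASE_INSENSITIVE);

        // Input from the test case: contains a surrogate pair and no '@'.
        System.out.println(findSafely(ptrn.matcher("test this as \ud83d\ude0d")));

        // A matching input, to show that normal results pass through unchanged.
        System.out.println(findSafely(ptrn.matcher("test user@example.com")));
    }
}
```

With this helper, callers replace matcher.find() with RegexWorkaround.findSafely(matcher) and get the same result on JDKs where the problem does not occur.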