FULL PRODUCT VERSION :
1.6.0_07
A DESCRIPTION OF THE PROBLEM :
The following pattern works as expected:
String FIBONACCI =
    "(?x) .{0,2} | (?: (?=(\\2|^)) (?=(\\2\\3|^.)) (?=(\\1)) \\2)++ . ";
for (int n = 0; n < 1000; n++) {
    String s = new String(new char[n]);
    if (s.matches(FIBONACCI)) {
        System.out.printf("%s ", n);
    }
}
// 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Note that the above uses the possessive quantifier ++. Changing it to the reluctant quantifier +? also works. However, using the plain greedy quantifier + throws a StringIndexOutOfBoundsException with index -1.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the provided snippet; it should work as expected.
Then change ++ to +?; it should still work as expected.
Then change to just +; now a StringIndexOutOfBoundsException is thrown for no apparent reason.
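The three steps above can be condensed into one small harness that tries each quantifier in turn. This is only a sketch: the class name QuantifierRepro and the helper tryMatch are hypothetical names introduced here, the pattern is the shortened one from the source section of this report, and the outcome for + is expected to depend on the JRE version in use.

```java
public class QuantifierRepro {
    // Shortened pattern from this report; %s is replaced by the quantifier under test.
    static final String TEMPLATE = "(?:(?=(\\2|^))(?=(\\2\\3|^.))(?=(\\1))\\2)%s.";

    // Hypothetical helper: reports either the match result or the exception thrown.
    static String tryMatch(String quantifier, int length) {
        String regex = String.format(TEMPLATE, quantifier);
        String input = new String(new char[length]);
        try {
            return "matches=" + input.matches(regex);
        } catch (IndexOutOfBoundsException e) {
            // Also covers StringIndexOutOfBoundsException, the subclass seen in this report.
            return "threw " + e;
        }
    }

    public static void main(String[] args) {
        for (String q : new String[] {"++", "+?", "+"}) {
            System.out.printf("%-2s -> %s%n", q, tryMatch(q, 42));
        }
    }
}
```

On an affected JRE, the line for + would be expected to report the exception while the other two report a normal result.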
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All three variations (++, +?, and +) should work correctly.
ACTUAL -
++ works, +? works, but + throws a StringIndexOutOfBoundsException.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(Unknown Source)
    at java.lang.Character.codePointAt(Unknown Source)
    at java.util.regex.Pattern$CharProperty.match(Unknown Source)
    at java.util.regex.Pattern$GroupCurly.match0(Unknown Source)
    at java.util.regex.Pattern$GroupCurly.match0(Unknown Source)
    at java.util.regex.Pattern$GroupCurly.match(Unknown Source)
    at java.util.regex.Pattern$Branch.match(Unknown Source)
    at java.util.regex.Matcher.match(Unknown Source)
    at java.util.regex.Matcher.matches(Unknown Source)
    at java.util.regex.Pattern.matches(Unknown Source)
    at java.lang.String.matches(Unknown Source)
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
new String(new char[42]).matches("(?:(?=(\\2|^))(?=(\\2\\3|^.))(?=(\\1))\\2)+.");
// throws StringIndexOutOfBoundsException: String index out of range: -1
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
As mentioned, ++ and +? still work correctly in this case, but in general their semantics differ from those of +, so they are not drop-in replacements.
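To illustrate why the workaround is not a general substitute, the following sketch shows simple patterns where the three quantifiers give different results. The class name QuantifierSemantics and the helper firstMatch are hypothetical names used only for this illustration; the patterns are unrelated to the Fibonacci pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierSemantics {
    // Hypothetical helper: returns the first substring found by the regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: a+ first takes all three characters, then gives one back
        // so that the trailing a can match.
        System.out.println("aaa".matches("a+a"));      // true
        // Possessive: a++ takes all three characters and never gives any back,
        // so the trailing a has nothing left to match.
        System.out.println("aaa".matches("a++a"));     // false
        // Greedy takes as much as possible, reluctant as little as possible.
        System.out.println(firstMatch("a+", "aaa"));   // aaa
        System.out.println(firstMatch("a+?", "aaa"));  // a
    }
}
```

A pattern that relies on greedy backtracking, like a+a here, changes meaning when rewritten with ++, which is why the workaround only applies to cases such as the one in this report.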