JDK-5088563 : Matcher.find throws StringIndexOutOfBoundsException if pattern is missing ']'
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 5.0,5.0u12
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-08-18
  • Updated: 2007-08-17
  • Resolved: 2007-06-12
Other
5.0u14 b01 Fixed
Description
Tried on Solaris-9 JDK 1.5.0-beta3-b58

Pattern : "\p{javaMirrored}\P{javaMirrored}+\p{javaMirrored}" 
Input : sdjhjshdka{dhhd}sjdhjs 
Works fine for the above input
  
But for the Input "sdjhjshdka{dhhd}sjdhjssdkjd[sdsd"
Throws the following Exception :
java.lang.StringIndexOutOfBoundsException: String index out of range: 32
	at java.lang.String.charAt(String.java:558)
	at java.util.regex.Pattern.countChars(Pattern.java:2791)
	at java.util.regex.Pattern.access$000(Pattern.java:595)
	at java.util.regex.Pattern$Not.match(Pattern.java:3764)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4222)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
	at java.util.regex.Pattern$JavaTypeClass.match(Pattern.java:3595)
	at java.util.regex.Pattern$Start.match(Pattern.java:3019)
	at java.util.regex.Matcher.search(Matcher.java:1092)
	at java.util.regex.Matcher.find(Matcher.java:528)
	at Test1.check(Test1.java:9)
	at Test1.main(Test1.java:20)

Test Case :
execute java Test1 "\p{javaMirrored}\P{javaMirrored}+\p{javaMirrored}" "sdjhjshdka{dhhd}sjdhjssdkjd[sdsd"

import java.util.regex.*;

public class Test1 {
   
   public void check(String str1, String str2) {
      try {
            Pattern p = Pattern.compile(str1);
            Matcher m = p.matcher(str2);
            while(m.find()) {
                System.out.println(m.group());
            }
      }catch(Exception e) {
         e.printStackTrace();
      }
   }
   
   public static void main(String args[]) {
             
       Test1 ref = new Test1();
       ref.check(args[0], args[1]); 
   }

}
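
For checking whether a given JDK still shows the problem, the test case above can be condensed into a small regression check. This is only a sketch: the class name MirroredFindCheck and the helper findAll are hypothetical names introduced here, and the code uses generics syntax that needs Java 7 or later, so it is meant for verifying later builds rather than for compiling on the original 5.0 build. The pattern and input are the ones from this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MirroredFindCheck {
    // Pattern and input copied from the bug description.
    static final String REGEX = "\\p{javaMirrored}\\P{javaMirrored}+\\p{javaMirrored}";
    static final String INPUT = "sdjhjshdka{dhhd}sjdhjssdkjd[sdsd";

    // Hypothetical helper: collects every match that find() reports.
    // On an affected build, find() throws instead of returning false
    // once the search reaches the end of the input.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        try {
            System.out.println("matches: " + findAll(REGEX, INPUT));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("bug reproduced: " + e.getMessage());
        }
    }
}
```

On a build that contains the fix, the only match should be "{dhhd}": the trailing "[sdsd" has no closing mirrored character, so the final find() call should simply return false.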

Comments
SUGGESTED FIX
Basically the change is

*** src/share/classes/java/util/regex/Pattern.java-     Wed Feb 14 13:55:32 2007
--- src/share/classes/java/util/regex/Pattern.java      Mon Apr 23 12:16:36 2007
*** 3764,3775 ****
--- 3764,3778 ----
          Node atom;
          Not(Node atom) {
              this.atom = atom;
          }
          boolean match(Matcher matcher, int i, CharSequence seq) {
+             if (i < matcher.to)
                  return !atom.match(matcher, i, seq)
                      && next.match(matcher, i+countChars(seq, i, 1), seq);
+             matcher.hitEnd = true;
+             return false;
          }
          boolean study(TreeInfo info) {
              info.minLength++;
              info.maxLength++;
              return next.study(info);
23-04-2007

EVALUATION
This error also arises with other patterns. Test case:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class Bug5088563 {
    private static String REGEX;
    private static String INPUT;
    private static Pattern pattern;
    private static Matcher matcher;
    private static boolean found;

    public static void main(String[] argv) {
        initResources();
        processTest();
    }

    private static void initResources() {
        try {
            REGEX = "\\p{javaWhitespace}\\P{javaWhitespace}+\\p{javaWhitespace}";
            INPUT = "ASDASSAs dsdssad fssdsASAd sdsd sdsd ddd";
        } catch (Exception ioe) {
            ioe.printStackTrace();
        }
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
    }

    private static void processTest() {
        try {
            while (matcher.find()) {
                System.out.println("I found the text \"" + matcher.group()
                        + "\" starting at index " + matcher.start()
                        + " and ending at index " + matcher.end() + ".");
                found = true;
            }
            if (!found)
                System.out.println("No match found.");
            System.out.println("Test case passed");
            System.exit(0);
        } catch (Exception exe) {
            exe.printStackTrace();
            System.out.println("Test case failed");
            System.exit(1);
        }
    }
}

Output is:
Current REGEX is: \p{javaWhitespace}\P{javaWhitespace}+\p{javaWhitespace}
Current INPUT is: ASDASSAs dsdssad fssdsASAd sdsd sdsd ddd
I found the text " dsdssad " starting at index 11 and ending at index 20.
I found the text " sdsd " starting at index 31 and ending at index 37.
java.lang.StringIndexOutOfBoundsException: String index out of range: 46
	at java.lang.String.charAt(String.java:558)
	at java.util.regex.Pattern.countChars(Pattern.java:2791)
	at java.util.regex.Pattern.access$000(Pattern.java:595)
	at java.util.regex.Pattern$Not.match(Pattern.java:3769)
	at java.util.regex.Pattern$Curly.match0(Pattern.java:4228)
	at java.util.regex.Pattern$Curly.match(Pattern.java:4202)
	at java.util.regex.Pattern$JavaTypeClass.match(Pattern.java:3600)
	at java.util.regex.Pattern$Start.match(Pattern.java:3019)
	at java.util.regex.Matcher.search(Matcher.java:1092)
	at java.util.regex.Matcher.find(Matcher.java:528)
	at Bug5088563.processTest(Bug5088563.java:59)
	at Bug5088563.main(Bug5088563.java:38)
Test case failed

Since the initial bug was filed for Tiger and was closed as not reproducible without being fixed there, it needs to be reopened.
16-02-2007

EVALUATION This problem has been "accidentally" fixed in Mustang by other regex rewrite work. The root cause is the incorrect Not class implementation in the 5.0 codebase; see the suggested fix for a possible solution if a 5.0u fix is desired.
20-05-2006

SUGGESTED FIX
          boolean match(Matcher matcher, int i, CharSequence seq) {
              if (i < matcher.to)
                  return !atom.match(matcher, i, seq)
                      && next.match(matcher, i+countChars(seq, i, 1), seq);
              matcher.hitEnd = true;
              return false;
          }
20-05-2006
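
The effect of the bounds check in the suggested fix can be illustrated outside of java.util.regex with a toy model. Everything in the sketch below is hypothetical (the NotNodeSketch class, its State holder, and both methods); it is not the real Pattern.Not code and it ignores the `next` node and surrogate handling. It only shows why reading the character at `i` without first comparing `i` against the end of the region throws, and how the guard turns that into a failed match with hitEnd set.

```java
import java.util.function.IntPredicate;

public class NotNodeSketch {
    // Hypothetical stand-in for the matcher state used by the fix:
    // 'to' is the end of the region, 'hitEnd' records that the end was reached.
    static final class State {
        final int to;
        boolean hitEnd;
        State(int to) { this.to = to; }
    }

    // Models the 5.0 behaviour: the character at i is read unconditionally,
    // so i == seq.length() raises an index-out-of-bounds exception.
    static boolean unguardedNot(IntPredicate atom, State st, int i, CharSequence seq) {
        return !atom.test(seq.charAt(i));
    }

    // Models the suggested fix: check i against the region end first,
    // and report "no match, end of input hit" instead of reading past it.
    static boolean guardedNot(IntPredicate atom, State st, int i, CharSequence seq) {
        if (i < st.to) {
            return !atom.test(seq.charAt(i));
        }
        st.hitEnd = true;
        return false;
    }

    public static void main(String[] args) {
        String input = "[sdsd";
        IntPredicate mirrored = Character::isMirrored;

        State guarded = new State(input.length());
        System.out.println("guarded result: "
                + guardedNot(mirrored, guarded, input.length(), input)
                + ", hitEnd=" + guarded.hitEnd);

        try {
            unguardedNot(mirrored, new State(input.length()), input.length(), input);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded threw: " + e.getClass().getSimpleName());
        }
    }
}
```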

EVALUATION Likely a documentation issue. -- iag@sfbay 2004-09-18
18-09-2004