JDK-6337993 : java.util.regex.Pattern StackOverflowError's on (x|y)*
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2,5.0,6u10
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows_2000,windows_7
  • CPU: x86
  • Submitted: 2005-10-18
  • Updated: 2017-09-15
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]

A DESCRIPTION OF THE PROBLEM :
any pattern like (x|y)* (an alternative wrapped in a star) is implemented recursively by Pattern. Each matched alternative uses about 5 method calls on the stack.

This makes that kind of pattern useless on any reasonably-sized input. Is there any chance you can make the Pattern implementation non-recursive in this case?

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
see example code

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
matched? true
ACTUAL -
java.lang.StackOverflowError

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.StackOverflowError
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
	at java.util.regex.Pattern$Single.match(Pattern.java:3004)
	at java.util.regex.Pattern$Branch.match(Pattern.java:3933)
	at java.util.regex.Pattern$GroupHead.match(Pattern.java:3973)
	at java.util.regex.Pattern$Loop.match(Pattern.java:4100)
	at java.util.regex.Pattern$GroupTail.match(Pattern.java:4032)
Exception in thread "main"


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexpFun {
    public static void main(String[] args) {

        CharSequence largeString = makeLargeString();
        Pattern p = Pattern.compile("(x|y)*");
        Matcher m = p.matcher(largeString);
        System.out.println("matches? " + m.matches()); // throws StackOverflowError
    }

    private static CharSequence makeLargeString() {
        final int size = 1000;
        StringBuffer largeString = new StringBuffer(size);
        for (int i = 0; i < size/2; i++) {
            largeString.append("xy");
        }
        return largeString;
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
no workaround

Comments
SUGGESTED FIX With the help of members of the regular expression team (and of course the UWC development team), we were able to optimize the regular expression used in parsing filters with \ and ". The results are quite encouraging: we are able to get to create reasonably complex filters with \ and " (please see below). The patch with this fix is available at the following location: /net/brandy.red.iplanet.com/wspace/sv102973/JAN-31-2006/patches-uwc-6.2p3-export-full-SPARC-20060131-118540-28.tar Please feel free to use the patch at the above location in order to test in your environment and let me know how is goes... Regards, Srinivas. ===================================================================================== Following are some of the filters that were successfully created, read and edited: $ ldapsearch -b "ou=people,o=india.sun.com,dc=india,dc=sun,dc=com" -h varali.india.sun.com -D "cn=directory manager" -w password -T -x -S mailSieveRuleSource -B "uid=admin" . . . mailSieveRuleSource=#RULE: $Name="quotedFilter" $Order=11 $Type="DEFAULT_TYPE" require "fileinto"; #BEGINFILTER if allof(header :contains ["Subject"," Comments"," Keywords"] "output from \"cron\" or \"mon\" or \"bon\" or \"john\" or \"this\" or \"that\""){ fileinto "Inbox"; stop; } #ENDFILTER mailSieveRuleSource=#RULE: $Name="slashedFilter" $Order=12 $Type="DEFAULT_TYPE" require "notify"; require "variables"; #BEGINFILTER if allof(allof(header :contains ["Subject"," Comments"," Keywords"] "this is a \\slashed\\ string containing \\abc\\\, \\def\\\, \\ghi\\\, \\jkl\\\, \\mno\\\, \\pqr\\\, \\stu\\\, \\vwx\\\, and \\yz\\"),anyof(allof(not header :matches ["Subject"," Comments"," Keywords"] ["POSTMASTER-AUTO-FW:*","postmaster-auto-fw:*"],header :matches ["Subject"," Comments"," Keywords"] "*"),true)){ notify :method "email" :echo :options "###@###.###" "POSTMASTER-AUTO-FW:${1}" "This is an auto-forwarded message"; stop; } #ENDFILTER mailSieveRuleSource=#RULE: $Name="quotedAndSlashed" $Order=13 $Type="DEFAULT_TYPE" require "fileinto"; require "notify"; require "variables"; #BEGINFILTER if allof(allof(header :contains ["Subject"," Comments"," Keywords"] "output from \"cron\" or \"mon\" or \"bon\" or \"john\" or \"this\" or \"that\"",header :contains ["Subject"," Comments"," Keywords"] "this is a \\slashed\\ string containing \\abc\\\, \\def\\\, \\ghi\\\, \\jkl\\\, \\mno\\\, \\pqr\\\, \\stu\\\, \\vwx\\\, and \\yz\\"),anyof(allof(not header :matches ["Subject"," Comments"," Keywords"] ["POSTMASTER-AUTO-FW:*","postmaster-auto-fw:*"],header :matches ["Subject"," Comments"," Keywords"] "*"),true)){ fileinto "Inbox"; notify :method "email" :echo :options "###@###.###" "POSTMASTER-AUTO-FW:${1}" "This is an auto-forwarded message"; stop; } #ENDFILTER
22-02-2006

EVALUATION Yep, the limit of current implementation, need find a solution.
22-10-2005