JDK-8203458 : java.util.regex.Matcher#find hung for ever while matching a regex in big string
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 8,9,10,11
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2018-05-20
  • Updated: 2019-12-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Description
A DESCRIPTION OF THE PROBLEM :
java.util.regex.Matcher hung for ever while matching a regex in huge string, it remains in following stacktrace forever.
"main" #1 prio=5 os_prio=31 tid=0x00007f8e12002000 nid=0x2603 runnable [0x0000700006a49000]
   java.lang.Thread.State: RUNNABLE
	at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
	at java.util.regex.Pattern$Slice.match(Pattern.java:3972)
	at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
	at java.util.regex.Pattern$Start.match(Pattern.java:3461)
	at java.util.regex.Matcher.search(Matcher.java:1248)
	at java.util.regex.Matcher.find(Matcher.java:664)


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
steps to reproduce:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    private static final String ALPHA_NUMERIC_STRING = "ABC DEFGH IJKLM NOPQR STUVWXY Z 01 234 567 89";
    public static void main(String[] args) {

        String dummyVar = "";

        Pattern p = Pattern.compile("HUNG.* .* PHONE|PRICE.*");
        Matcher m = p.matcher(randomAlphaNumeric(403403));
        System.out.println("looking for pattern : "+p);
        boolean match = m.find(0);
        System.out.println("done : match found  : "+match);

    }
    public static String randomAlphaNumeric(int count) {
        StringBuilder builder = new StringBuilder();
        while (count-- != 0) {
            int character = (int)(Math.random()*ALPHA_NUMERIC_STRING.length());
            builder.append(ALPHA_NUMERIC_STRING.charAt(character));
            if(count % 1000 == 0){
                builder.append("HUNG");
            }
        }
        return builder.toString();
    }    
}


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
it should return the boolean result whether string has a match for given pattern 
ACTUAL -
it never comes back and hung forever.

FREQUENCY : always



Comments
Possibly a case of catastrophic backtracking.
12-12-2019

it's not "hanging", it's backtracking, with a input that never matched. as a workaround to use possessive quantifier ".*+ .*+ ", instead of the greedy one, if possible. That said, we had done optimization in this area in previous releases to eliminate most of this kinda of backtracking use cases. Something can/should be done for this particular use scenario.
21-05-2018

To reproduce the issue, run the attached test case. JDK 8u171 - Fail JDK 10.0.1 - Fail JDK 11-ea+10 - Fail There is no output, the program just hangs.
21-05-2018