JDK-6192895 : java.util.regex.Matcher: Performance issue
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2004-11-09
  • Updated: 2019-06-17
  • Resolved: 2005-11-11
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 9
9 b119Resolved
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-b64)
Java HotSpot(TM) Client VM (build 1.5.0-b64, mixed mode)

java version "1.4.2_06-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-ea-b01)
Java HotSpot(TM) Client VM (build 1.4.2_06-ea-b01, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
4DOS for Windows NT 2,52A   Windows NT 5,0

A DESCRIPTION OF THE PROBLEM :
Application hangs when using Unicode characters that are not non-US-ASCII (like ��). But only when the special characters occur at the end of the input string.
"Hello World this ç is a test this is a test" works fine (62 ms)
"Hello World this is a test this is a test ç " hangs  (22063 ms)



REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.regex.*;

class Test159 {
    public static void main(String[] args) {
        String pat = " *([a-zA-Z0-9/\\-\\?:\\(\\)\\.,'\\+\\{\\}]+ *)+";
        String input = "ü Hello World this is a test this is a test";  //  0 ms
        String input1 = "Hello World this ü is a test this is a test";  //   62 ms
        String inputLags = "Hello World this is a test this is a test ç ";  //  21300 ms
        Pattern pattern = Pattern.compile(pat);
        Matcher matcher = pattern.matcher(inputLags);
        long t = System.currentTimeMillis();
        boolean isMatched = matcher.matches();
        System.out.println ("took " + (System.currentTimeMillis() - t) + " ms");
    }
}

---------- END SOURCE ----------
###@###.### 2004-11-09 19:00:42 GMT

Comments
EVALUATION This has nothing to do with "non-US-ASCII" or whatever special characters in a pattern, the "performance issue" we are expenriencing here is the inevitable "exponential" backtracking expense when a pattern with */+ quantifers exhausts all possibility to try to find a match and finally ends of a "no match". Chapter6 of Jeffrey's Regular Expressions has detailed description of the nature of this "inefficiency". Below is a much easier/Simpler but also shows the issue String pat = " *([a-z0-9]+ *)+"; String inputLags = "hello world this is a test this is a test A"; As a possible optimization for the re specified in the test case, using "+" instead of "*" after the space character should speed up the match (to fail) String pat = " *([a-zA-Z0-9/\\-\\?:\\(\\)\\.,'\\+\\{\\}]+ +)+"; Close as "not a defect"/
11-11-2005