JDK-6342544 : Compilation Time of java.util.regex.Pattern takes too long
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 5.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2005-10-27
  • Updated: 2014-09-03
  • Resolved: 2005-12-17
The Version table lists the releases in which this issue/RFE is addressed.
Other: 5.0u81 (Fixed)
JDK 6: 6 b65 (Fixed)
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Linux brave 2.6.12.4 #1 SMP Fri Aug 12 12:58:09 WST 2005 i686 i686 i386 GNU/Linux

EXTRA RELEVANT SYSTEM CONFIGURATION :
University Network

A DESCRIPTION OF THE PROBLEM :
I am working with regular expressions (REs) using the latest java.util.regex.Pattern. Because they contain many alternation groups (e.g. (a|b|c|d)), the regular expressions I construct are usually very large.

The problem is that when I compile a large RE using Pattern.compile(patStr, Pattern.CASE_INSENSITIVE), the compilation takes hours to complete.
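
For readers unfamiliar with the pattern shape involved, the following is a minimal sketch of two small alternation groups compiled with Pattern.CASE_INSENSITIVE. The class name, the pattern string, and the inputs are illustrative assumptions and are not taken from the reporter's code; only Pattern.compile and the flag come from the report.

```java
import java.util.regex.Pattern;

// Hypothetical example class; the pattern and inputs are illustrative only.
final class AlternationExample {
    // Two small alternation groups separated by a single space.
    static final String PAT_STR = "(aaaa|bbbb|cccc) (dddd|eeee|ffff)";

    // Compiles the pattern case-insensitively and tests the whole input against it.
    static boolean matchesIgnoringCase(String input) {
        Pattern pattern = Pattern.compile(PAT_STR, Pattern.CASE_INSENSITIVE);
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("AAAA ffff"));
        System.out.println(matchesIgnoringCase("aaaa gggg"));
    }
}
```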

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Using a for loop with counter X, I construct X RE alternation groups from random characters.
Each alternation group consists of 10 items. For example,
String myGroup1 = "(aaaa|bbbb|cccc|dddd|eeee|ffff|gggg|hhhh|iiii|jjjj)";
String myGroup2 = ...
For the first test, myGroup1 is compiled and the start and end times are recorded.
For the second test, myGroup2 is appended to myGroup1, and myGroup1 is recompiled with the start and end times recorded.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I would expect a linear trend in compilation time as each new alternation group is added.
ACTUAL -
The following compilation times were recorded.

Time to Compile 1 group 3ms.
Time to Compile 2 group 1ms.
Time to Compile 3 group 3ms.
Time to Compile 4 group 7ms.
Time to Compile 5 group 10ms.
Time to Compile 6 group 65ms.
Time to Compile 7 group 721ms.
Time to Compile 8 group 7090ms.
Time to Compile 9 group 68536ms.
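
To make the trend in these figures explicit, the sketch below computes the ratio between consecutive measurements. The class and method names are hypothetical; the nine values are copied from the timings listed above. From 6 groups onward each added group multiplies the compilation time by roughly ten, which indicates exponential rather than linear growth.

```java
// Hypothetical helper for reading the recorded timings; not part of the reporter's code.
final class GrowthRatios {
    // Compilation times in ms for 1..9 groups, copied from the report.
    static final long[] TIMES_MS = {3, 1, 3, 7, 10, 65, 721, 7090, 68536};

    // Returns the ratio of each measurement to the one before it.
    static double[] ratios(long[] times) {
        double[] r = new double[times.length - 1];
        for (int i = 1; i < times.length; i++) {
            r[i - 1] = (double) times[i] / times[i - 1];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = ratios(TIMES_MS);
        for (int i = 0; i < r.length; i++) {
            System.out.printf("%d -> %d groups: x%.2f%n", i + 1, i + 2, r[i]);
        }
    }
}
```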


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
int cnt = 1000;
String patStr = "";
for (int i = 1; i < cnt; i++) {
    // generateWords and buildRePortion are the reporter's helpers; they are not included in this report.
    String[] words = generateWords(10);
    patStr += buildRePortion(words);

    // Time only the compilation of the pattern built so far.
    long startCompile = System.currentTimeMillis();
    Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
    long finishCompile = System.currentTimeMillis();

    System.out.println(patStr);
    System.out.println("Time to Compile " + i + " group " + (finishCompile - startCompile) + "ms.\n");

    // Groups are separated by a single space.
    patStr += " ";
}
---------- END SOURCE ----------
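
The source above depends on two helpers that the report does not include. The following is a self-contained sketch of the same test, assuming that generateWords returns random four-letter lowercase words and that buildRePortion joins them into a single parenthesized alternation group; both bodies, the class name, and timeCompiles are guesses for illustration, not the reporter's implementation. It avoids APIs newer than Java 5 so that it can be run on an affected release.

```java
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical reproduction harness; helper bodies are assumptions (see lead-in).
final class CompileTimeRepro {
    private static final Random RANDOM = new Random(42); // fixed seed for repeatable patterns

    // Assumed helper: returns `count` random four-letter lowercase words.
    static String[] generateWords(int count) {
        String[] words = new String[count];
        for (int w = 0; w < count; w++) {
            StringBuilder sb = new StringBuilder();
            for (int c = 0; c < 4; c++) {
                sb.append((char) ('a' + RANDOM.nextInt(26)));
            }
            words[w] = sb.toString();
        }
        return words;
    }

    // Assumed helper: wraps the words in one alternation group, e.g. (aaaa|bbbb).
    static String buildRePortion(String[] words) {
        StringBuilder sb = new StringBuilder("(");
        for (int w = 0; w < words.length; w++) {
            if (w > 0) {
                sb.append('|');
            }
            sb.append(words[w]);
        }
        return sb.append(')').toString();
    }

    // Compiles patterns of 1..maxGroups groups and returns each compile time in ms.
    static long[] timeCompiles(int maxGroups) {
        long[] millis = new long[maxGroups];
        StringBuilder patStr = new StringBuilder();
        for (int i = 1; i <= maxGroups; i++) {
            patStr.append(buildRePortion(generateWords(10)));
            long start = System.nanoTime();
            Pattern.compile(patStr.toString(), Pattern.CASE_INSENSITIVE);
            millis[i - 1] = (System.nanoTime() - start) / 1000000L;
            patStr.append(' '); // groups are separated by a space, as in the report
        }
        return millis;
    }

    public static void main(String[] args) {
        // Defaults to 9 groups, the largest size timed in the report.
        int maxGroups = args.length > 0 ? Integer.parseInt(args[0]) : 9;
        long[] millis = timeCompiles(maxGroups);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("Time to Compile " + (i + 1) + " group " + millis[i] + "ms.");
        }
    }
}
```

Passing a smaller group count as the first argument keeps the run short; on the affected release the report recorded over a minute for the ninth group alone.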

CUSTOMER SUBMITTED WORKAROUND :
No method in the java.util.regex.Pattern library allows me to reduce the compilation time.

I tried another package, org.apache.oro.text.

With it, compilation is almost instant, compared to the hours it took with java.util.regex.Pattern.

Comments
EVALUATION see 5013651
29-10-2005