FULL PRODUCT VERSION :
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux brave 18.104.22.168 #1 SMP Fri Aug 12 12:58:09 WST 2005 i686 i686 i386 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
A DESCRIPTION OF THE PROBLEM :
I am working with regular expressions (REs) using the latest java.util.regex.Pattern. Because they contain many alternation groups (e.g. (a|b|c|d)), the regular expressions I construct are usually very large, with multiple alternation groups.
The problem is that when I compile such a large RE using Pattern.compile(patStr, Pattern.CASE_INSENSITIVE), compilation takes hours to complete.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Using a FOR loop with counter X, I construct X RE alternation groups of random characters.
Each alternation group consists of 10 items. For example,
String myGroup1 = "(aaaa|bbbb|cccc|dddd|eeee|ffff|gggg|hhhh|iiii|jjjj)";
String myGroup2 = ...
For the first test, myGroup1 is compiled and the start and end times are recorded.
For the second test, myGroup2 is appended to myGroup1, and myGroup1 is recompiled with the start and end times recorded.
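The report does not show how the groups are generated. The sketch below is one possible version of the helpers the source calls, generateWords() and buildRePortion(); both bodies, the 4-letter word length and the fixed random seed are assumptions for illustration, not the submitter's code. It builds two groups, joins them with a space as the source loop does, and checks that a case-insensitive match still works.

```java
import java.util.Locale;
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the submitter's generateWords()/buildRePortion()
// helpers, which the report does not show. Word length and seed are assumptions.
public class GroupBuilder {
    private static final Random RND = new Random(42);
    private static final int WORD_LENGTH = 4;

    // Returns `count` random lowercase words of WORD_LENGTH letters joined by '|'.
    static String generateWords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int w = 0; w < count; w++) {
            if (w > 0) {
                sb.append('|');
            }
            for (int c = 0; c < WORD_LENGTH; c++) {
                sb.append((char) ('a' + RND.nextInt(26)));
            }
        }
        return sb.toString();
    }

    // Wraps the alternatives in a capturing group: "(w1|w2|...|wN)".
    static String buildRePortion(String words) {
        return "(" + words + ")";
    }

    public static void main(String[] args) {
        String group1 = buildRePortion(generateWords(10));
        String group2 = buildRePortion(generateWords(10));
        // Second test: group2 appended to group1, separated by a space as in the source.
        String twoGroups = group1 + " " + group2;
        Pattern p = Pattern.compile(twoGroups, Pattern.CASE_INSENSITIVE);

        // The first word of each group, upper-cased, should still match.
        String input = (group1.substring(1, 1 + WORD_LENGTH) + " "
                + group2.substring(1, 1 + WORD_LENGTH)).toUpperCase(Locale.ROOT);
        System.out.println(twoGroups);
        System.out.println(input + " matches: " + p.matcher(input).matches());
    }
}
```

Each generated group has the same shape as the example myGroup1: ten alternatives inside one capturing group.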
EXPECTED VERSUS ACTUAL BEHAVIOR :
I would expect the compilation time to grow linearly as each new alternation group is added.
The following compilation times were recorded:
Time to Compile 1 group 3ms.
Time to Compile 2 group 1ms.
Time to Compile 3 group 3ms.
Time to Compile 4 group 7ms.
Time to Compile 5 group 10ms.
Time to Compile 6 group 65ms.
Time to Compile 7 group 721ms.
Time to Compile 8 group 7090ms.
Time to Compile 9 group 68536ms.
This bug can be reproduced always.
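The recorded numbers can be checked with a little arithmetic. From 6 groups onward, each additional group multiplies the compile time by roughly 10 (721/65 ≈ 11.1, 7090/721 ≈ 9.8, 68536/7090 ≈ 9.7), which equals the number of alternatives per group. That is consistent with growth on the order of 10^n in the number of groups n, not linear growth. The sketch below only recomputes those ratios from the table above; the timings for 1 to 5 groups are a few milliseconds and are probably dominated by timer granularity and JVM warm-up, so their ratios are not meaningful.

```java
// Recomputes the growth factor between consecutive rows of the reported timings.
public class GrowthRatio {
    // Compile times in ms copied from the report, for 1..9 groups.
    static final long[] TIMES_MS = {3, 1, 3, 7, 10, 65, 721, 7090, 68536};

    // r[i] is the time for (i + 2) groups divided by the time for (i + 1) groups.
    static double[] ratios(long[] t) {
        double[] r = new double[t.length - 1];
        for (int i = 1; i < t.length; i++) {
            r[i - 1] = (double) t[i] / t[i - 1];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = ratios(TIMES_MS);
        for (int i = 0; i < r.length; i++) {
            System.out.printf("%d -> %d groups: x%.1f%n", i + 1, i + 2, r[i]);
        }
    }
}
```

Extrapolating the same factor of about 10 per group, 11 groups would need roughly 68536 × 100 ms ≈ 6850 s, close to two hours, which fits the "hours" observed with larger patterns.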
---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

int cnt = 1000;
String patStr = "";
for (int i = 1; i < cnt; i++) {
    // generateWords() and buildRePortion() are the submitter's own helpers (not shown)
    String words = generateWords(10);
    patStr += buildRePortion(words);
    long startCompile = System.currentTimeMillis();
    Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
    long finishCompile = System.currentTimeMillis();
    System.out.println("Time to Compile " + i + " group " + (finishCompile - startCompile) + "ms.");
    patStr += " "; // separate the next group with a literal space
}
---------- END SOURCE ----------
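A self-contained variant of the loop above may be easier to run, since the submitter's helpers are not included. In this sketch every group is the report's example group (aaaa|...|jjjj) instead of random words; whether identical groups trigger the same slowdown as random ones is an assumption. It uses System.nanoTime() (available since Java 5) for finer resolution than currentTimeMillis(), and defaults to 5 groups so that it finishes quickly even on an affected JVM.

```java
import java.util.regex.Pattern;

// Self-contained reproduction sketch. Assumption: every group is the report's
// example group, so no random-word helpers are needed and runs are repeatable.
public class CompileTiming {
    static final String GROUP = "(aaaa|bbbb|cccc|dddd|eeee|ffff|gggg|hhhh|iiii|jjjj)";

    // Joins n copies of GROUP with single spaces, the same shape of string
    // the original loop compiles on iteration n.
    static String buildPattern(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(GROUP);
        }
        return sb.toString();
    }

    // Wall-clock time, in microseconds, taken by one case-insensitive compile.
    static long compileMicros(String regex) {
        long start = System.nanoTime();
        Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return (System.nanoTime() - start) / 1000;
    }

    public static void main(String[] args) {
        // Keep the default small: on an affected JVM each extra group costs about 10x more.
        int maxGroups = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int n = 1; n <= maxGroups; n++) {
            long us = compileMicros(buildPattern(n));
            System.out.println("Time to compile " + n + " group(s): " + us + " us");
        }
    }
}
```

Passing a larger group count as the first argument (for example 8 or 9) should show whether the JVM under test has the exponential behavior reported above.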
CUSTOMER SUBMITTED WORKAROUND :
No method in the java.util.regex.Pattern library lets me reduce the compilation time.
I tried another package, org.apache.oro.text (Jakarta ORO), instead.
Its compilation is almost instant, compared with the hours taken by java.util.regex.Pattern.