JDK-6271399 : Pattern matching using regex takes a long time
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2005-05-17
  • Updated: 2014-09-03
  • Resolved: 2005-12-17
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other JDK 6
5.0u81Fixed 6 b65Fixed
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.4.2_08"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
In trying to match a regular expression such as, "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(s|ss)(a|e|i|o|u)?" to any given string such as "mare" produces instant results.  However, trying to match a regular expression that is lengthier than the above example takes three minutes or longer.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and execute the standalone code included.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The output should be:
false
RE match completed in 0.016 seconds.
true
RE match completed in 0.016 seconds.
ACTUAL -
false
RE match completed in 0.016 seconds.
true
RE match completed in 160.922 seconds.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.*;
import java.util.*;
import java.util.regex.*;


public class Test
{
   public static void main( String argv[] )
   {
      long startTime = System.currentTimeMillis();

      String s1 = "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(s|ss)(a|e|i|o|u)?";
      Pattern p = Pattern.compile( s1 );
      
      String mare = new String( "mare" );
      Matcher m = p.matcher( mare );
      
      boolean b = m.matches();
      System.out.println( b );

      long finishTime = System.currentTimeMillis();
      System.out.println( "RE match completed in " + (double)(finishTime - startTime)/1000 + " seconds.");
      
      
      startTime = System.currentTimeMillis();
      String s2 = "(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|aa|a|null)?(n|nn)(a|e|i|o|u)?(k|kk)(a|e|i|o|u)?(f|v|ff|vv)(a|e|i|o|u)?(r|rr)(a|e|i|o|u)?(t|tt)(a|e|i|o|u)?";
      p = Pattern.compile( s2 );
      String frankfurt = new String( "frankfurt" );
      m = p.matcher( str );
      b = m.matches();
      System.out.println( b );

      finishTime = System.currentTimeMillis();
      System.out.println( "RE match completed in " + (double)(finishTime - startTime)/1000 + " seconds.");
   }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
I am in the process of writing my own regular expression pattern parser and matcher.  If I didn't have to use Java, Perl and Python are able to accomplish this same task in seconds.
###@###.### 2005-05-17 04:52:07 GMT

Comments
EVALUATION might run into the same performance problem as described in 5013651... ###@###.### 2005-05-17 06:45:21 GMT
17-05-2005