JDK-5050507 : Pattern.matches throws StackOverFlow Error
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2004-05-20
  • Updated: 2004-05-20
  • Resolved: 2004-05-20
Description

Name: rmT116609			Date: 05/20/2004


FULL PRODUCT VERSION :
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
ver 5.0, sp-3

A DESCRIPTION OF THE PROBLEM :
While running the following Java code, I get a StackOverflowError.

The problem occurs for strings longer than 818 characters with the pattern ^([a-fA-F]|\d)+$

If I use the pattern "1*", I do not see this problem. I need Pattern.matches to validate 1400 bytes of data against the pattern ^([a-fA-F]|\d)+$.

I have attached code that reproduces the problem.




STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the following program:

1. The test case contains data of size 1400 bytes and the pattern ^([a-fA-F]|\d)+$

2. If you use some other pattern, it works fine.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Pattern.matches(pattern, string) should work for the pattern ^([a-fA-F]|\d)+$ against 1400 bytes of data.
ACTUAL -
I am pasting part of the trace:

StackOverFlow..
.....
.....
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Ctype.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Ctype.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Ctype.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Ctype.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Ctype.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
...
...


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------

import java.util.regex.Pattern;

class TestPattern
{
	public static void main(String args[])
	{
		try
		{
			String data = "";
			for (int i = 0; i < 1400; i++)
			{
				data = data.concat(Integer.toHexString(1));
			}
			System.out.println("Length of data : " + data.length());
			if (!Pattern.matches("^([a-fA-F]|\\d)+$", data.trim()))
			{
				System.out.println("data does not match pattern");
			}
			else
				System.out.println("data matches pattern");
		}
		catch(Throwable t )
		{
			t.printStackTrace();
		}
	}
}
---------- END SOURCE ----------
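
The report says the failure starts above a length of 818. That threshold depends on the JVM, its thread stack size, and the regex implementation, so it may differ or not occur at all on other runtimes. The sketch below is one way to probe it. The class StackDepthProbe and the method firstFailingLength are hypothetical names, not part of the submitted test case.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original report.
class StackDepthProbe {

    // Returns the first input length (1..maxLen) at which matching throws
    // StackOverflowError, or -1 if every length up to maxLen completes.
    static int firstFailingLength(String regex, int maxLen) {
        Pattern pattern = Pattern.compile(regex);
        StringBuilder data = new StringBuilder(maxLen);
        for (int len = 1; len <= maxLen; len++) {
            data.append('1'); // the same character the test case repeats
            try {
                pattern.matcher(data).matches();
            } catch (StackOverflowError e) {
                return len;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("alternation: "
                + firstFailingLength("^([a-fA-F]|\\d)+$", 2000));
        System.out.println("1*: " + firstFailingLength("1*", 2000));
    }
}
```

A result of -1 only means no overflow was seen up to maxLen on that particular runtime.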

CUSTOMER SUBMITTED WORKAROUND :
The code works for the pattern "1*", but that does not solve my problem: the data I need to validate could be anything, so the pattern ^([a-fA-F]|\d)+$ should work.
(Incident Review ID: 270349) 
======================================================================

Comments
WORK AROUND Use "^([0-9a-fA-F])+$" or "^((?i)[0-9a-f])+$" or even "^([a-fA-F]|\\d)++$". Avoid alternation whenever possible; alternations are inefficient and cause slow match times as well. See the source referenced in the Pattern doc for more information about writing efficient patterns. ###@###.### 2004-05-20
20-05-2004
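
The three suggested patterns can be checked against 1400 characters of hex data with a small program such as the one below. This is a minimal sketch: the class HexWorkaroundDemo and its helpers hexData and matches are hypothetical names, and the input cycles through all sixteen lowercase hex digits rather than repeating "1" as the submitted test case does.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original report.
class HexWorkaroundDemo {

    // The three patterns suggested in the workaround comment.
    static final String[] WORKAROUNDS = {
        "^([0-9a-fA-F])+$",    // single character class, no alternation
        "^((?i)[0-9a-f])+$",   // case-insensitive character class
        "^([a-fA-F]|\\d)++$"   // original alternation, possessive quantifier
    };

    // Builds a string of n lowercase hex digits, cycling through 0-9a-f.
    static String hexData(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(Integer.toHexString(i % 16));
        }
        return sb.toString();
    }

    static boolean matches(String regex, String data) {
        return Pattern.matches(regex, data);
    }

    public static void main(String[] args) {
        String data = hexData(1400);
        for (String regex : WORKAROUNDS) {
            System.out.println(regex + " -> " + matches(regex, data));
        }
    }
}
```

The possessive form "++" never gives back characters it has consumed. That suits a whole-string validation like this one, but it can change results for patterns that rely on backtracking.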

EVALUATION It will always be possible to create patterns that use excessive memory or take 10,000 years to complete. We cannot prevent all these problems without destroying the power of regular expressions. See workaround. ###@###.### 2004-05-20
20-05-2004