JDK-4771934 : Matcher.find() hangs for no apparent reason
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.1
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • OS: solaris_9
  • CPU: generic
  • Submitted: 2002-10-31
  • Updated: 2002-10-31
  • Resolved: 2002-10-31
Description
###@###.### 2002-10-31
I wrote a little program that replaces the "<meta ... charset=" HTML tag with a tag containing a defined codeset for any input file, and found that it hangs on two files (both samples attached).

The pattern that it's hanging on looks like :

// looking for a html meta tag like :
// <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5"> 
        
Pattern mypattern = Pattern.compile("<\\s*"+      
                                "(meta|META)"+
                                "(\\s|[^>])+"+
                                "(CHARSET|charset)="+
                                "(\\s|[^>])+>");

My test program (attached) can be run on any HTML input file, and it should print out what text (if any) it replaced. I can reproduce this on java full version "1.4.0_02-20020711" and "1.4.1_01-b01".

Both HTML attachments cause this error to occur, though lots of other HTML files (both with and without matches for the above regex) work fine.

Trussing java while it's hung reveals lots of :

11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/2:	lwp_cond_wait(0x0002BD10, 0x0002BCF8, 0xFADFFD60) (sleeping...)
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/2:	lwp_cond_wait(0x0002BD10, 0x0002BCF8, 0xFADFFD60) Err#62 ETIME
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0
11130/5:	poll(0xF2A7FD88, 0, 10)				= 0


Comments
WORK AROUND ###@###.### 2002-10-31
The pattern :

Pattern mypattern = Pattern.compile("<(\\s)*"+
                                "(meta|META)"+
                                "([^>])+"+
                                "(CHARSET|charset)="+
                                "([^>])+>");

works. - see comments
11-06-2004
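
The following is a minimal sketch, not the attached test program, that compares the original and workaround patterns on the sample tag from the description. The class name MetaCharsetDemo, the helper firstMatch and the surrounding HTML are assumptions made for illustration. The input is kept short on purpose, so the original pattern also returns quickly here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetDemo {
    // Pattern from the description: whitespace matches both \s and [^>].
    static final Pattern ORIGINAL = Pattern.compile(
            "<\\s*" + "(meta|META)" + "(\\s|[^>])+" + "(CHARSET|charset)=" + "(\\s|[^>])+>");

    // Pattern from the workaround: [^>] already covers whitespace.
    static final Pattern WORKAROUND = Pattern.compile(
            "<(\\s)*" + "(meta|META)" + "([^>])+" + "(CHARSET|charset)=" + "([^>])+>");

    // Hypothetical helper: returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tag = "<META http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-5\">";
        String html = "<html><head>" + tag + "</head></html>";
        System.out.println(firstMatch(ORIGINAL, html));
        System.out.println(firstMatch(WORKAROUND, html));
    }
}
```

On this input both patterns should select the same text, namely the whole meta tag. The difference between them only shows up in running time on inputs such as the two attachments.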

EVALUATION The construct (\\s|[^>]) causes an exponentially increasing amount of backtracking. The matcher is not hung, but each character added to the meta tag doubles the time it takes to evaluate the match. ###@###.### 2002-10-31
31-10-2002
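
To illustrate where the doubling can come from, the sketch below counts the distinct ways (\\s|[^>])+ can consume a given string. It rests on one observation: a whitespace character is accepted by both branches of the alternation, while any other character except '>' is accepted only by [^>]. The class AmbiguityCount and its methods are hypothetical and do not reproduce the matcher's actual search. They only count the alternatives a backtracking matcher may have to try when the rest of the pattern fails.

```java
public class AmbiguityCount {
    // The characters matched by \s in java.util.regex by default: [ \t\n\x0B\f\r].
    static boolean isRegexSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // Number of distinct ways (\s|[^>])+ can consume all of s.
    static long waysToConsume(String s) {
        if (s.isEmpty()) {
            return 0; // '+' needs at least one iteration
        }
        long ways = 1;
        for (char c : s.toCharArray()) {
            if (c == '>') {
                return 0; // neither branch accepts '>'
            }
            if (isRegexSpace(c)) {
                ways *= 2; // either \s or [^>] can take this character
            }
        }
        return ways;
    }

    public static void main(String[] args) {
        System.out.println(waysToConsume("a b c")); // two spaces: prints 4
        System.out.println(waysToConsume("   "));   // three spaces: prints 8
    }
}
```

Under this count, the number of alternatives is 2 raised to the number of whitespace characters in the span. The workaround's ([^>])+ leaves exactly one way to consume the same span.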