JDK-4872664 : REGRESSION: regex character class negation error
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2,1.4.2_01
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux,windows_xp
  • CPU: x86
  • Submitted: 2003-06-02
  • Updated: 2003-12-17
  • Resolved: 2003-06-27
Other
1.4.2_04    04    Fixed
Related Reports
Duplicate :  
Duplicate :  
Description

Name: gm110360			Date: 06/02/2003


FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Windows XP

A DESCRIPTION OF THE PROBLEM :
I wanted to match every character in a string except '>', so I used the regex "[^>]". However, this pattern also fails to match the character "\u203A" (the HTML &rsaquo; character, ›).

The same applies to '<' and '\u2039', the HTML &lsaquo; character.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the program below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Using JRE 1.4.1, you get the correct result (the last line is the important one):

C:\> c:\Programme\Java\j2re1.4.1_02\bin\java -classpath classes PatternGTTest
Pattern '>' matches '>'
Pattern '>' does not match '?'
Pattern '[^>]' does not match '>'
Pattern '[^>]' matches '?'
ACTUAL -
Using JRE 1.4.2-beta, you get an incorrect result (see the last line):

C:\> c:\Programme\Java\j2re1.4.2\bin\java -classpath classes PatternGTTest
Pattern '>' matches '>'
Pattern '>' does not match '?'
Pattern '[^>]' does not match '>'
Pattern '[^>]' does not match '?'


REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.util.regex.*;

public class PatternGTTest {
    public static void main(String[] args) throws Exception {
        checkMatch(">", ">");
        checkMatch(">", "\u203A");  // &rsaquo;
        checkMatch("[^>]", ">");
        checkMatch("[^>]", "\u203A");
    }
    public static void checkMatch(String pat, String in) {
        System.out.print("Pattern '" + pat + "'");
        Pattern p = Pattern.compile(pat);
        if (!p.matcher(in).matches()) System.out.print(" does not match ");
        else System.out.print(" matches ");
        System.out.println("'" + in + "'");
    }
}

---------- END SOURCE ----------
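
The description also mentions the '<' / '\u2039' pair, which the submitted program does not exercise. A minimal sketch covering both pairs might look like the following. The class name NegatedClassPairsCheck and the helper fullMatch are hypothetical names chosen for illustration and are not part of the original report. On a release containing the fix, both lines are expected to report a match.

```java
import java.util.regex.Pattern;

public class NegatedClassPairsCheck {
    // Hypothetical helper: true when the whole input matches the pattern.
    public static boolean fullMatch(String pat, String in) {
        return Pattern.compile(pat).matcher(in).matches();
    }

    public static void main(String[] args) {
        // Each row pairs a negated ASCII class with the non-ASCII
        // character that the report says it wrongly rejects.
        String[][] cases = {
            {"[^>]", "\u203A"},  // &rsaquo;
            {"[^<]", "\u2039"},  // &lsaquo;
        };
        for (int i = 0; i < cases.length; i++) {
            String pat = cases[i][0];
            String in = cases[i][1];
            // Print the code point in hex so the result does not depend
            // on the console encoding.
            System.out.println("Pattern '" + pat + "' vs U+"
                + Integer.toHexString(in.charAt(0)).toUpperCase()
                + (fullMatch(pat, in) ? ": matches" : ": does not match"));
        }
    }
}
```

Printing the code point rather than the character itself avoids the '?' substitution visible in the console output above.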

CUSTOMER SUBMITTED WORKAROUND :
Use Java 1.4.1.

Release Regression From : 1.4.1_03
The release named above is the last one in which this behavior was 
known to work correctly. It has regressed since then.

(Review ID: 186810) 
======================================================================

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.2_04 tiger
FIXED IN: 1.4.2_04 tiger
INTEGRATED IN: 1.4.2_04 tiger tiger-b10
VERIFIED IN: 1.4.2_04
2004-06-14

EVALUATION The submitter is correct. There is a bug in a character class optimization added in Mantis that affects negated classes containing ASCII characters. ###@###.### 2003-06-02
2003-06-02
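
Since the evaluation says the problem affects negated classes containing ASCII characters, a broader regression check could sweep every printable ASCII character rather than only '>' and '<'. The sketch below is an illustration under assumptions and is not the test that accompanied the fix. The class name NegatedAsciiClassSweep, the method countMismatches, and the sample of non-ASCII inputs are all hypothetical.

```java
import java.util.regex.Pattern;

public class NegatedAsciiClassSweep {
    // Hypothetical sample of non-ASCII inputs. None of them is an ASCII
    // character, so every class of the form [^c] with ASCII c should match.
    static final char[] SAMPLES = {'\u00E9', '\u2039', '\u203A', '\u20AC', '\u4E2D'};

    // Counts (class, input) pairs where the negated class fails to match.
    public static int countMismatches() {
        int mismatches = 0;
        for (char c = 0x20; c < 0x7F; c++) {
            // Escape everything except letters and digits so that
            // metacharacters such as ']' or '^' stay literal in the class.
            String member = Character.isLetterOrDigit(c)
                ? String.valueOf(c)
                : "\\" + c;
            Pattern p = Pattern.compile("[^" + member + "]");
            for (int i = 0; i < SAMPLES.length; i++) {
                if (!p.matcher(String.valueOf(SAMPLES[i])).matches()) {
                    mismatches++;
                    System.out.println("[^" + member + "] rejected U+"
                        + Integer.toHexString(SAMPLES[i]).toUpperCase());
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches() + " mismatches");
    }
}
```

On a release containing the fix the count should be zero. Any rejected pair is printed with its code point in hex so that the output does not depend on the console encoding.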