JDK-4872664 : REGRESSION: regex character class negation error

Details
Type: Bug
Submit Date: 2003-06-02
Status: Closed
Updated Date: 2003-12-17
Project Name: JDK
Resolved Date: 2003-06-27
Component: core-libs
OS: linux, windows_xp
Sub-Component: java.util.regex
CPU: x86
Priority: P2
Resolution: Fixed
Affected Versions: 1.4.2, 1.4.2_01
Fixed Versions: 1.4.2_04 (04)

Description

Name: gm110360			Date: 06/02/2003


FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Windows XP

A DESCRIPTION OF THE PROBLEM :
I wanted to match every character in a string except '>', so I used the regex "[^>]". However, in 1.4.2-beta this pattern also fails to match the character "\u203A" (the HTML character ›, &rsaquo;).

The same applies to '<' and '\u2039', the HTML &lsaquo; character.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the program below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
using JRE1.4.1, you get the correct result (last line is important):

C:\> c:\Programme\Java\j2re1.4.1_02\bin\java -classpath classes PatternGTTest
Pattern '>' matches '>'
Pattern '>' does not match '?'
Pattern '[^>]' does not match '>'
Pattern '[^>]' matches '?'
ACTUAL -
using JRE1.4.2-beta, you get an incorrect result (see last line):

C:\> c:\Programme\Java\j2re1.4.2\bin\java -classpath classes PatternGTTest
Pattern '>' matches '>'
Pattern '>' does not match '?'
Pattern '[^>]' does not match '>'
Pattern '[^>]' does not match '?'
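
A note on reading the output above: the '?' is presumably the console's substitute for U+203A, which the active code page could not display (an assumption; the report does not say so). A minimal sketch of a helper that prints non-ASCII characters as escapes, so the output stays unambiguous on any console. The class name EscapedOutput and the method escape are hypothetical and not part of the submitted program.

```java
public class EscapedOutput {
    // Renders every character outside printable ASCII as a backslash-u
    // escape with four uppercase hex digits; other characters pass through.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                String hex = Integer.toHexString(c).toUpperCase();
                sb.append("\\u");
                // Left-pad the hex value with zeros to four digits.
                for (int pad = hex.length(); pad < 4; pad++) {
                    sb.append('0');
                }
                sb.append(hex);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical usage: print the test input in escaped form.
        System.out.println("'" + escape("\u203A") + "'");
        System.out.println("'" + escape(">") + "'");
    }
}
```

Passing the input through such a helper before printing would make the last line of each run show the escaped code point instead of '?'.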


REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

public class PatternGTTest {
    public static void main(String[] args) {
        checkMatch(">", ">");
        checkMatch(">", "\u203A");     // U+203A is &rsaquo;
        checkMatch("[^>]", ">");
        checkMatch("[^>]", "\u203A");  // should match; does not on 1.4.2-beta
    }

    // Prints whether the entire input matches the given pattern.
    public static void checkMatch(String pat, String in) {
        Pattern p = Pattern.compile(pat);
        String verdict = p.matcher(in).matches() ? " matches " : " does not match ";
        System.out.println("Pattern '" + pat + "'" + verdict + "'" + in + "'");
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Use Java 1.4.1.

Release Regression From : 1.4.1_03
The release above is the last one in which this behavior was known
to work correctly. A regression has been introduced since then.

(Review ID: 186810) 
======================================================================

                                    

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_04
tiger

FIXED IN:
1.4.2_04
tiger

INTEGRATED IN:
1.4.2_04
tiger
tiger-b10

VERIFIED IN:
1.4.2_04


                                     
2004-06-14
EVALUATION

The submitter is correct: there is a bug in a character class optimization added in Mantis (1.4.2) that affects negated classes containing ASCII characters.
###@###.### 2003-06-02
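
A minimal regression check in the spirit of the evaluation above: it pairs negated classes built from ASCII characters with non-ASCII input, which such a class should accept. The class name NegatedClassCheck and the third test case are illustrative assumptions. This is not the test that accompanied the actual fix.

```java
import java.util.regex.Pattern;

public class NegatedClassCheck {
    // Returns true when the entire input matches the pattern.
    public static boolean matches(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Each row: a negated class of one ASCII character, and a
        // non-ASCII character that the class is expected to match.
        String[][] cases = {
            {"[^>]", "\u203A"},  // from the report
            {"[^<]", "\u2039"},  // from the report
            {"[^a]", "\u00E9"},  // hypothetical extra case
        };
        boolean allPassed = true;
        for (int i = 0; i < cases.length; i++) {
            boolean ok = matches(cases[i][0], cases[i][1]);
            // Print the code point in hex rather than the raw character.
            System.out.println(cases[i][0] + " vs U+"
                    + Integer.toHexString(cases[i][1].charAt(0)).toUpperCase()
                    + (ok ? " : ok" : " : FAIL"));
            allPassed = allPassed && ok;
        }
        // The negated class must still reject the excluded character itself.
        allPassed = allPassed && !matches("[^>]", ">");
        System.out.println(allPassed ? "All checks passed" : "Regression present");
    }
}
```

On a release that contains the fix, every case should report ok. On an affected release, the first two cases would be expected to fail, as in the submitter's output.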
                                     
2003-06-02


