JDK-4785712 : The '#' in regex character class is treated as comment

Details
Type: Bug
Submit Date: 2002-11-27
Status: Closed
Updated Date: 2006-06-15
Project Name: JDK
Resolved Date: 2006-06-15
Component: core-libs
OS: windows_2000
Sub-Component: java.util.regex
CPU: x86
Priority: P4
Resolution: Won't Fix
Affected Versions: 1.4.1
Fixed Versions: none

Description
Name: nt126004			Date: 11/27/2002


FULL PRODUCT VERSION :
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)


FULL OPERATING SYSTEM VERSION :
Microsoft Windows 2000 [Version 5.00.2195]

ADDITIONAL OPERATING SYSTEMS :
Linux


A DESCRIPTION OF THE PROBLEM :
When you use extended REs (m//x) in Perl, the hash symbol
introduces a comment that lasts until the end of the line.
That doesn't happen if the hash is inside a character class.

In Java, (?x) enables the same extended (comments) mode,
but a hash inside a character class is not treated as a
literal: it still starts a comment.
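To make the difference concrete, here is a minimal sketch that compiles the same character class with and without Pattern.COMMENTS (the flag form of (?x)). The class name HashInClassDemo and the helper compiles are hypothetical, not part of the report.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HashInClassDemo {
    // Hypothetical helper: true if the regex compiles with the given flags.
    static boolean compiles(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Without COMMENTS, '#' inside [...] is an ordinary literal: returns true.
        System.out.println(compiles("[#/]+", 0));
        // With COMMENTS, '#' starts a comment even inside [...], so the rest of
        // the pattern, including ']', is discarded: returns false.
        System.out.println(compiles("[#/]+", Pattern.COMMENTS));
        // Escaping the hash keeps it literal in COMMENTS mode: returns true.
        System.out.println(compiles("[\\#/]+", Pattern.COMMENTS));
    }
}
```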


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the code given below.


EXPECTED VERSUS ACTUAL BEHAVIOR :
I would expect no errors from this. However, the second
regex is parsed incorrectly, resulting in a
PatternSyntaxException: the # inside [ #/] starts a comment,
so the closing ] is never seen. This behavior differs from
Perl, which treats # in a character class as a literal
character, not a comment character.


ERROR MESSAGES/STACK TRACES THAT OCCUR :
 $ javac example.java
 $ java example
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 71
 (?x)(?i) \b ( (?: D (?:efect)? | B (?:ug)? | Fix\ for) [ #/]* ) (\d+) \b
                                                             ^
        at java.util.regex.Pattern.error(Pattern.java:1489)
        at java.util.regex.Pattern.clazz(Pattern.java:2002)
        at java.util.regex.Pattern.sequence(Pattern.java:1546)
        at java.util.regex.Pattern.expr(Pattern.java:1506)
        at java.util.regex.Pattern.group0(Pattern.java:2248)
        at java.util.regex.Pattern.sequence(Pattern.java:1534)
        at java.util.regex.Pattern.expr(Pattern.java:1506)
        at java.util.regex.Pattern.compile(Pattern.java:1274)
        at java.util.regex.Pattern.<init>(Pattern.java:1030)
        at java.util.regex.Pattern.compile(Pattern.java:777)
        at java.lang.String.replaceAll(String.java:1710)
        at example.main(example.java:10)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
// example.java
import java.util.regex.*;
public class example {
 public static void main(String[] args) {
  String line = "this is a fix for defect 1234.";

  // this compiles: the escaped # is a literal inside the class
  line = line.replaceAll("(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for)  [ \\#/]* ) (\\d+) \\b", "<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");

  // this should compile too, but throws PatternSyntaxException: the bare # starts a comment
  line = line.replaceAll("(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for)  [ #/]* ) (\\d+) \\b", "<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");
 }
}
---------- END SOURCE ----------

CUSTOMER WORKAROUND :
 Use "\\#" instead of "#" inside a character class.
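A sketch applying this workaround to the report's replaceAll, with the patterns precompiled. The names BugLinkWorkaround, HASH_ONLY, ESCAPED and link are hypothetical. One caveat, based on how java.util.regex parses classes in comments mode: whitespace inside [...] is skipped as well, so the first pattern in the report compiles to the class [#/] and never matches "defect 1234". Escaping the space too appears to restore the Perl-like class [ #/].

```java
import java.util.regex.Pattern;

public class BugLinkWorkaround {
    // Replacement from the report: $1 = label plus separators, $2 = bug number.
    static final String LINK = "<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>";

    // The report's compiling variant: only '#' is escaped. In COMMENTS mode the
    // bare space inside [...] is skipped, so the class is effectively [#/].
    static final Pattern HASH_ONLY = Pattern.compile(
        "(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for ) [ \\#/]* ) (\\d+) \\b");

    // Escaping both the space and '#' gives the Perl-like class [ #/].
    static final Pattern ESCAPED = Pattern.compile(
        "(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for ) [\\ \\#/]* ) (\\d+) \\b");

    static String link(Pattern p, String line) {
        return p.matcher(line).replaceAll(LINK);
    }

    public static void main(String[] args) {
        String line = "this is a fix for defect 1234.";
        // Unchanged: nothing in [#/] can consume the space before "1234".
        System.out.println(link(HASH_ONLY, line));
        // Wraps "defect 1234" in a link.
        System.out.println(link(ESCAPED, line));
        // Wraps "bug #42"; the escaped '#' matches a literal hash.
        System.out.println(link(ESCAPED, "see bug #42"));
    }
}
```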
(Review ID: 166822) 
======================================================================

                                    

Comments
EVALUATION

It is true that we handle this differently from Perl, but it is not a critical issue. It is only a problem when the comments flag (?x) is used and a hash appears inside a character class, and it is easy to work around.
###@###.### 2002-12-02
                                     
2002-12-02
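The evaluation above notes that this case is easy to work around. Besides escaping, another option, sketched here, is Java's inline flag group (?-x:X), which turns comments mode off for X alone, so the space and '#' inside the class are parsed as literals, as in Perl, while the rest of the pattern keeps its free-spacing layout. The class name ScopedFlagWorkaround and the method link are hypothetical.

```java
import java.util.regex.Pattern;

public class ScopedFlagWorkaround {
    // COMMENTS stays on for the whole pattern, but the (?-x:...) group switches
    // it off for the character class only, so [ #/] contains space, '#' and '/'.
    static final Pattern BUG_REF = Pattern.compile(
        "(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for ) (?-x:[ #/]*) ) (\\d+) \\b");

    // Same replacement as in the report: $1 = label and separators, $2 = number.
    static String link(String line) {
        return BUG_REF.matcher(line).replaceAll("<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");
    }

    public static void main(String[] args) {
        System.out.println(link("fix for #7"));
        System.out.println(link("this is a fix for defect 1234."));
    }
}
```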
EVALUATION

This is not compelling enough to change the current behavior
to match Perl after two major releases; compatibility weighs
more here. Closed as "will not fix".
                                     
2006-06-15


