Name: nt126004
Date: 11/27/2002

FULL PRODUCT VERSION :
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :
Microsoft Windows 2000 [Version 5.00.2195]

ADDITIONAL OPERATING SYSTEMS :
Linux

A DESCRIPTION OF THE PROBLEM :
When you use extended REs (m//x) in Perl, the hash symbol introduces a comment that lasts until the end of the line, except when the hash is inside a character class, where it is a literal. Java enables extended REs with (?x), but it does not treat the hash symbol as a literal inside a character class: it still starts a comment there.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the code given below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I would expect no errors from this. However, the second regex is parsed incorrectly, resulting in a PatternSyntaxException. This behavior differs from Perl, which treats # in a character class as a literal character, not a comment character.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
$ javac example.java
$ java example
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 71
(?x)(?i) \b ( (?: D (?:efect)? | B (?:ug)? | Fix\ for) [ #/]* ) (\d+) \b
                                                                       ^
        at java.util.regex.Pattern.error(Pattern.java:1489)
        at java.util.regex.Pattern.clazz(Pattern.java:2002)
        at java.util.regex.Pattern.sequence(Pattern.java:1546)
        at java.util.regex.Pattern.expr(Pattern.java:1506)
        at java.util.regex.Pattern.group0(Pattern.java:2248)
        at java.util.regex.Pattern.sequence(Pattern.java:1534)
        at java.util.regex.Pattern.expr(Pattern.java:1506)
        at java.util.regex.Pattern.compile(Pattern.java:1274)
        at java.util.regex.Pattern.<init>(Pattern.java:1030)
        at java.util.regex.Pattern.compile(Pattern.java:777)
        at java.lang.String.replaceAll(String.java:1710)
        at example.main(example.java:10)

REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
// example.java
import java.util.regex.*;

public class example {
    public static void main(String[] args) {
        String line = "this is a fix for defect 1234.";
        // this works...
        line = line.replaceAll("(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for) [ \\#/]* ) (\\d+) \\b", "<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");
        // this should, but doesn't...
        line = line.replaceAll("(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? | Fix\\ for) [ #/]* ) (\\d+) \\b", "<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");
    }
}
---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Use "\\#" instead of "#" inside a character class.

(Review ID: 166822)
======================================================================
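
The workaround can be sketched as follows. This is a minimal illustration and not part of the original report: the class name HashInClassDemo, the helpers linkify and compiles, and the simplified pattern (no "Fix for" alternative, no space inside the character class) are assumptions made for the example. The sample input places the hash directly after the keyword.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HashInClassDemo {
    // Simplified, hypothetical variant of the report's pattern.
    // The '#' is written as \\# so that (?x) does not read it as a comment start.
    static final Pattern BUG_REF = Pattern.compile(
        "(?x)(?i) \\b ( (?: D (?:efect)? | B (?:ug)? ) [\\#/]* ) (\\d+) \\b");

    // Wraps references such as "bug#1234" in a link, as the report's replaceAll call does.
    static String linkify(String line) {
        return BUG_REF.matcher(line)
                      .replaceAll("<a href=\"/bugdb.cgi?bug=$2\">$1$2</a>");
    }

    // Returns true if the regex compiles, false on PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // prints: see <a href="/bugdb.cgi?bug=1234">bug#1234</a>
        System.out.println(linkify("see bug#1234"));
        // Escaped hash inside the class: compiles (prints true).
        System.out.println(compiles("(?x) [\\#/]* (\\d+)"));
        // Unescaped hash inside the class: the case this report is about.
        System.out.println(compiles("(?x) [#/]* (\\d+)"));
    }
}
```

The last call is left without an expected value because its result depends on whether the JDK in use still treats an unescaped # inside a character class as a comment start.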