FULL PRODUCT VERSION :
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b06)
Java HotSpot(TM) Client VM (build 1.7.0-ea-b06, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Linux helium 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.

http://elliotth.blogspot.com/2007/01/what-do-anchors-and-mean-in-regular.html

the first match is the final line terminator. the second match is the end-of-input.

in MULTILINE mode this is unfortunate (because it's not Perl-compatible and should be listed in the incompatibilities with Perl 5 in the documentation), but it's understandable because of the "or" in the definition of what MULTILINE causes $ to match.

but in non-MULTILINE mode, this is incorrect (in that i don't see how it's specified by the documentation).

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the supplied test case.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.regex.*;

public class test {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("$");
        Matcher m = p.matcher("a\nb\nc\nhello\nworld\n");
        int count = 0;
        while (m.find()) {
            ++count;
        }
        System.err.println(count);
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
i would have suggested using \Z, but that's broken too ;-)

Copied from http://bugs.openjdk.java.net/show_bug.cgi?id=100084#c0

Description From ###@###.### 2009-07-09 01:40:09 PDT

Created an attachment (id=99)
contains the exported diff and a jtreg testcase

sunbug=6520207
Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.
--------------------------------------------
Adding a simple check in the Pattern$Dollar class to avoid matching without any content.
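To make the two matches described above easier to see, here is a small sketch that prints the offset of each match instead of only counting them. The class name DollarAnchorDemo and the helper matchOffsets are hypothetical names introduced for illustration; they are not part of the original test case or of the attached patch. The \z line is added only for comparison, as the boundary matcher documented to match at the end of input alone; the report itself does not propose it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarAnchorDemo {
    // Hypothetical helper (not from the original report): collects the
    // start offset of every match that Matcher.find() reports.
    static List<Integer> matchOffsets(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Same input as in the bug summary: length 6, final '\n' at index 5.
        String input = "a\nb\nc\n";

        // Default mode: the case the report considers incorrect.
        System.out.println("$            : " + matchOffsets("$", 0, input));
        // MULTILINE mode: the case the report calls unfortunate but understandable.
        System.out.println("$ (MULTILINE): " + matchOffsets("$", Pattern.MULTILINE, input));
        // \Z: the workaround the reporter says is also affected.
        System.out.println("\\Z           : " + matchOffsets("\\Z", 0, input));
        // \z: end of input only, shown for comparison.
        System.out.println("\\z           : " + matchOffsets("\\z", 0, input));
    }
}
```

On a JDK that shows the reported behavior, the default-mode line should list two offsets: 5 (just before the final line terminator) and 6 (end of input), which corresponds to the "matches twice" count in the summary.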