JDK-8176029 : Linebreak matcher is not equivalent to the pattern as stated in javadoc
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 8,9
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2017-02-27
  • Updated: 2017-07-15
  • Resolved: 2017-03-09
Fixed in:
  • JDK 10: 10
  • JDK 9: 9 b161
Description
A DESCRIPTION OF THE PROBLEM :
The documentation for the linebreak matcher \R states that it is equivalent to a specific pattern. However, if we substitute that equivalent pattern for \R in a regex, it can give different results.
See the following Stack Overflow question and answer for details:
http://stackoverflow.com/q/42474596/7098259
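
A minimal sketch of the discrepancy, assuming nothing beyond java.util.regex (the class and method names are illustrative and do not come from the report). It compares \R with the alternation quoted in the javadoc, each followed by a literal \n, against the input "\r\n". The alternation can give back the \n and still match. According to this report, \R on an affected release cannot, so the two printed results are expected to differ there and to agree on a release that contains the fix.

```java
import java.util.regex.Pattern;

public class LinebreakEquivalence {
    // The alternation that the Java 8 javadoc gives as the equivalent of \R,
    // wrapped in a non-capturing group so it can be followed by more pattern.
    static final String DOCUMENTED =
        "(?:\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    // True if the whole input matches <linebreak> followed by a literal LF.
    static boolean matchesThenLf(String linebreak, String input) {
        return Pattern.matches(linebreak + "\\n", input);
    }

    public static void main(String[] args) {
        String crlf = "\r\n";
        // The alternation first tries CR LF, fails to find another LF,
        // then backtracks to a lone CR and matches the LF: prints true.
        System.out.println("documented pattern: " + matchesThenLf(DOCUMENTED, crlf));
        // Result depends on whether the running JDK contains the fix.
        System.out.println("\\R: " + matchesThenLf("\\R", crlf));
    }
}
```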

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
...is equivalent to (?<!\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])
ACTUAL -
...is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

URL OF FAULTY DOCUMENTATION :
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html


Comments
The API doc is NOT wrong; the implementation is wrong. It fails to backtrack to "0x0d + next.match()" when "0x0d + 0x0a + next.match()" fails.

@@ -3865,12 +3865,14 @@
                 if (ch == 0x0A || ch == 0x0B || ch == 0x0C ||
                     ch == 0x85 || ch == 0x2028 || ch == 0x2029)
                     return next.match(matcher, i + 1, seq);
                 if (ch == 0x0D) {
                     i++;
-                    if (i < matcher.to && seq.charAt(i) == 0x0A)
-                        i++;
+                    if (i < matcher.to && seq.charAt(i) == 0x0A &&
+                        next.match(matcher, i + 1, seq)) {
+                        return true;
+                    }
                     return next.match(matcher, i, seq);
                 }
             } else {
                 matcher.hitEnd = true;
             }
02-03-2017
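
The change in the comment above can be illustrated with a standalone model. This is a hedged sketch, not the JDK source: the rest of the pattern (next.match in the diff) is modelled as a hypothetical IntPredicate continuation that receives the index after the linebreak, matcher.to is replaced by seq.length(), and all names are invented for the example.

```java
import java.util.function.IntPredicate;

public class LinebreakBacktrack {
    // The single-character linebreaks handled by the first branch in the diff.
    static boolean isSingleLinebreak(char ch) {
        return ch == 0x0A || ch == 0x0B || ch == 0x0C
            || ch == 0x85 || ch == 0x2028 || ch == 0x2029;
    }

    // Model of the pre-fix logic: after CR, a following LF is always consumed,
    // so a failure in the continuation never retries with CR alone.
    static boolean matchOld(CharSequence seq, int i, IntPredicate next) {
        if (i >= seq.length()) return false;
        char ch = seq.charAt(i);
        if (isSingleLinebreak(ch)) return next.test(i + 1);
        if (ch == 0x0D) {
            i++;
            if (i < seq.length() && seq.charAt(i) == 0x0A) i++;
            return next.test(i);
        }
        return false;
    }

    // Model of the post-fix logic: try CR LF first, and if the continuation
    // fails there, fall back to treating CR alone as the linebreak.
    static boolean matchFixed(CharSequence seq, int i, IntPredicate next) {
        if (i >= seq.length()) return false;
        char ch = seq.charAt(i);
        if (isSingleLinebreak(ch)) return next.test(i + 1);
        if (ch == 0x0D) {
            i++;
            if (i < seq.length() && seq.charAt(i) == 0x0A && next.test(i + 1)) {
                return true;
            }
            return next.test(i);
        }
        return false;
    }

    // Hypothetical continuation: "a literal LF, then end of input".
    static IntPredicate lfThenEnd(CharSequence seq) {
        return i -> i + 1 == seq.length() && seq.charAt(i) == 0x0A;
    }

    public static void main(String[] args) {
        String crlf = "\r\n";
        System.out.println("old:   " + matchOld(crlf, 0, lfThenEnd(crlf)));   // false
        System.out.println("fixed: " + matchFixed(crlf, 0, lfThenEnd(crlf))); // true
    }
}
```

When the continuation only requires end of input, both models accept "\r\n" by consuming the CR LF pair, so the fix changes the outcome only in cases where the pair has to be split.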