JDK-8258456 : Release Note: Incorrect behavior matching Unicode linebreaks
  • Type: Sub-task
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 15
  • Priority: P4
  • Status: Closed
  • Resolution: Delivered
  • Submitted: 2020-12-16
  • Updated: 2021-12-02
  • Resolved: 2020-12-17
JDK 15: 15 (Resolved)
Description
The change JDK-8235812 in Java 15 introduced incorrect matching of the `\R` Unicode linebreak sequence in the `java.util.regex.Pattern` API. The `\R` sequence should match CR (U+000D) or LF (U+000A) individually, but it should not match an individual CR that is part of a CRLF sequence. For example, the pattern `\R{2}` erroneously matches a single CRLF sequence, treating its CR and LF as two separate linebreaks.

A possible workaround is to match linebreaks using individual characters instead of `\R`, with a negative lookahead that prevents an individual CR from matching within a CRLF sequence. To do this, replace the `\R` sequence with the following:
```
    (?:(\u000D\u000A)|((?!\u000D\u000A)[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]))
```
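As a minimal sketch of how this replacement might be used from Java, the example below embeds it in a string literal and applies a `{2}` quantifier, the case the note identifies as failing with `\R`. The class name `LinebreakWorkaround` and the helper `isLinebreaks` are hypothetical names chosen for illustration. The backslashes are doubled so that the regex engine, rather than the Java compiler, interprets the Unicode escapes.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
final class LinebreakWorkaround {

    // The replacement for \R, written as a Java string literal.
    // Each backslash is doubled so the escape reaches the regex engine intact.
    static final String LINEBREAK =
        "(?:(\\u000D\\u000A)|((?!\\u000D\\u000A)"
        + "[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]))";

    // True if the whole input consists of exactly `count` linebreaks.
    static boolean isLinebreaks(String input, int count) {
        return Pattern.matches(LINEBREAK + "{" + count + "}", input);
    }

    public static void main(String[] args) {
        System.out.println(isLinebreaks("\r\n", 1));     // true: CRLF is one linebreak
        System.out.println(isLinebreaks("\r\n", 2));     // false: CR inside CRLF is not matched alone
        System.out.println(isLinebreaks("\r\r", 2));     // true: two separate CRs
        System.out.println(isLinebreaks("\r\n\r\n", 2)); // true: two CRLF pairs
    }
}
```

With this pattern a CRLF pair counts as a single linebreak, so the two-linebreak check rejects a lone CRLF.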
A simpler sequence can be used if matching all of the Unicode-specified linebreak characters is not required, or if special treatment for the CRLF sequence is not required.
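The two simplifications mentioned above might look like the following sketch. Both patterns and the class name `SimplerLinebreaks` are illustrative assumptions rather than part of the original note. `CR_LF_ONLY` keeps the CRLF handling but drops the other Unicode linebreak characters, while `ANY_SINGLE` keeps all of the characters but gives CRLF no special treatment.

```java
import java.util.regex.Pattern;

// Hypothetical class; both patterns are illustrative assumptions.
final class SimplerLinebreaks {

    // CR, LF, or CRLF only; a CR that starts a CRLF pair is not matched alone.
    static final String CR_LF_ONLY = "(?:\\r\\n|(?!\\r\\n)[\\r\\n])";

    // Any single Unicode linebreak character, with no special CRLF handling.
    static final String ANY_SINGLE =
        "[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]";

    public static void main(String[] args) {
        // CRLF counts as one linebreak here, so two are not found.
        System.out.println(Pattern.matches(CR_LF_ONLY + "{2}", "\r\n")); // false
        // Without CRLF handling, CR and LF count as two linebreaks.
        System.out.println(Pattern.matches(ANY_SINGLE + "{2}", "\r\n")); // true
    }
}
```

Which variant is appropriate depends on whether the input can contain linebreak characters other than CR and LF, and on whether a CRLF pair must count as one linebreak.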