Bug ID: JDK-8233117 Escape Sequences For Line Continuation and White Space (Preview)

Type: CSR
Component: core-libs
Sub-Component: java.lang

Priority: P3
Status: Closed
Resolution: Approved
Fix Versions: 14

Submitted: 2019-10-29
Updated: 2020-05-01
Resolved: 2019-11-27

Related Reports

CSR :	JDK-8233116 - Escape Sequences For Line Continuation and White Space (Preview)
Relates :	JDK-8231623 - JEP 368: Text Blocks (Second Preview)
Relates :	JDK-8235616 - JLS changes for Text Blocks (Second Preview)

Description

Summary
-------

Add two new escape sequences to string and character literals for managing
explicit white space and line continuation.

Problem
-------

In text blocks, newlines (`U+000A`) are not typically declared _explicitly_ using `\n`. Instead, newlines are inserted _implicitly_ wherever content breaks to the next line. **What if an implicit newline is _not_ desired?**

For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard-wrap the resulting string literals over multiple lines of source code:

      String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                       "elit, sed do eiusmod tempor incididunt ut labore " +
                       "et dolore magna aliqua.";

This is exactly the form of complex string that text blocks express more readably:

      String text = """
                    Lorem ipsum dolor sit amet, consectetur adipiscing
                    elit, sed do eiusmod tempor incididunt ut labore
                    et dolore magna aliqua.
                    """;

However, using text blocks to represent long strings has a drawback: an implicit newline is inserted on _every_ line. It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
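
The following sketch makes that drawback concrete by comparing the two forms above. It assumes JDK 15 or later, where text blocks are a standard feature (on JDK 14 they require `--enable-preview`). The class name `ImplicitNewlines` and the helper `countNewlines` are illustrative only.

```java
public class ImplicitNewlines {
    // Hard-wrapped concatenation: contains no newline characters.
    static final String LITERAL = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                                  "elit, sed do eiusmod tempor incididunt ut labore " +
                                  "et dolore magna aliqua.";

    // Text block: each of the three content lines ends in an implicit newline.
    static final String TEXT = """
            Lorem ipsum dolor sit amet, consectetur adipiscing
            elit, sed do eiusmod tempor incididunt ut labore
            et dolore magna aliqua.
            """;

    // Hypothetical helper: counts the line feeds (U+000A) in a string.
    static long countNewlines(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println(countNewlines(LITERAL)); // 0
        System.out.println(countNewlines(TEXT));    // 3
        // Replacing each implicit newline with a space and trimming recovers the literal.
        System.out.println(TEXT.replace('\n', ' ').strip().equals(LITERAL)); // true
    }
}
```

The text block is easier to read, but it carries three newlines that the concatenated literal does not.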

Turning to another matter, the space (`U+0020`) character's lack of observability creates a problem for strings.

For example, unlike string literals, text blocks have no per-line delimiters that clearly indicate where the content of a line begins and ends. This lack of direct space-character observability is the primary reason text blocks always strip trailing white space. However, that behavior raises the opposite problem: **How does a developer retain trailing white space in a text block?**

For another example, various visual tricks are required to get an accurate count of multiple spaces in any string literal. For instance, how many spaces are in the string literal `"     "`? **How can a developer count what they cannot visually discern?**

Solution
--------

Change the JLS section on "Escape Sequences for Character and String Literals" and the API `String::translateEscapes` to recognize two new escape sequences:

- `\<line-terminator>`

    The escape sequences `\␊` (`U+005C, U+000A`), `\␍` (`U+005C, U+000D`) and `\␍␊` (`U+005C, U+000D, U+000A`) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.

    Example:

        String text = """
                    Lorem ipsum dolor sit amet, consectetur adipiscing \
                    elit, sed do eiusmod tempor incididunt ut labore \
                    et dolore magna aliqua.\
                    """;

    After white space stripping, the above text block would have the value
    `"Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊"`.
    Applying escape translation would then yield
    `"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."`.
    
- `\s` (`U+005C, U+0073`)

    The escape sequence `\s` represents an observable space and is translated to the ASCII space character (`U+0020`).

        String str = "A\sline\swith\sspaces";

    After translation the String `str` will have the value `"A line with spaces"`.
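
Both escapes can be exercised together in a short sketch. It assumes JDK 15 or later, where text blocks and both escapes are standard (JDK 14 needs `--enable-preview`). The class name `NewEscapes` is illustrative. The `COLORS` block shows how `\s` answers the trailing white space question from the Problem section: stripping happens before escape translation, so spaces in front of a `\s` are no longer trailing and survive.

```java
public class NewEscapes {
    // Line continuation: each trailing backslash discards the line terminator after it,
    // so this text block contains no newlines at all.
    static final String TEXT = """
            Lorem ipsum dolor sit amet, consectetur adipiscing \
            elit, sed do eiusmod tempor incididunt ut labore \
            et dolore magna aliqua.\
            """;

    // A closing \s protects the spaces before it from trailing white space stripping,
    // so every line keeps exactly six characters.
    static final String COLORS = """
            red  \s
            green\s
            blue \s
            """;

    // \s is also valid in ordinary string literals and character literals.
    static final String STR = "A\sline\swith\sspaces";
    static final char SPACE = '\s';

    public static void main(String[] args) {
        System.out.println(TEXT);
        for (String line : COLORS.split("\n")) {
            System.out.println("[" + line + "] length " + line.length()); // each length is 6
        }
        System.out.println(STR);                                  // A line with spaces
        System.out.println(SPACE == ' ');                         // true
        System.out.println("\s\s\s\s\s".equals(" ".repeat(5)));   // true
    }
}
```

Writing `\s\s\s\s\s` makes the count of five spaces visible in the source, which `"     "` does not.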

Specification
-------------

JLS changes for the new escape sequences are found in section 3.10.7 of the attachment `text-blocks-jls.html`. There are no JVMS changes.


String::translateEscapes diff

    --- a/src/java.base/share/classes/java/lang/String.java	2019-11-12 13:32:43.000000000 -0400
    +++ b/src/java.base/share/classes/java/lang/String.java	2019-11-12 13:32:02.000000000 -0400
    @@ -3060,10 +3060,15 @@
          *     <th scope="row">{@code \u005Cr}</th>
          *     <td>carriage return</td>
          *     <td>{@code U+000D}</td>
          *   </tr>
          *   <tr>
    +     *     <th scope="row">{@code \u005Cs}</th>
    +     *     <td>space</td>
    +     *     <td>{@code U+0020}</td>
    +     *   </tr>
    +     *   <tr>
          *     <th scope="row">{@code \u005C"}</th>
          *     <td>double quote</td>
          *     <td>{@code U+0022}</td>
          *   </tr>
          *   <tr>
    @@ -3079,10 +3084,15 @@
          *   <tr>
          *     <th scope="row">{@code \u005C0 - \u005C377}</th>
          *     <td>octal escape</td>
          *     <td>code point equivalents</td>
          *   </tr>
    +     *   <tr>
    +     *     <th scope="row">{@code \u005C<line-terminator>}</th>
    +     *     <td>continuation</td>
    +     *     <td>discard</td>
    +     *   </tr>
          *   </tbody>
          * </table>
          *
          * @implNote
          * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".

String::translateEscapes after the diff is applied

    /**
     * {@preview Associated with text blocks, a preview feature of
     *           the Java language.
     *
     *           This method is associated with <i>text blocks</i>, a preview
     *           feature of the Java language. Programs can only use this
     *           method when preview features are enabled. Preview features
     *           may be removed in a future release, or upgraded to permanent
     *           features of the Java language.}
     *
     * Returns a string whose value is this string, with escape sequences
     * translated as if in a string literal.
     * <p>
     * Escape sequences are translated as follows;
     * <table class="striped">
     *   <caption style="display:none">Translation</caption>
     *   <thead>
     *   <tr>
     *     <th scope="col">Escape</th>
     *     <th scope="col">Name</th>
     *     <th scope="col">Translation</th>
     *   </tr>
     *   </thead>
     *   <tbody>
     *   <tr>
     *     <th scope="row">{@code \u005Cb}</th>
     *     <td>backspace</td>
     *     <td>{@code U+0008}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005Ct}</th>
     *     <td>horizontal tab</td>
     *     <td>{@code U+0009}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005Cn}</th>
     *     <td>line feed</td>
     *     <td>{@code U+000A}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005Cf}</th>
     *     <td>form feed</td>
     *     <td>{@code U+000C}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005Cr}</th>
     *     <td>carriage return</td>
     *     <td>{@code U+000D}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005Cs}</th>
     *     <td>space</td>
     *     <td>{@code U+0020}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005C"}</th>
     *     <td>double quote</td>
     *     <td>{@code U+0022}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005C'}</th>
     *     <td>single quote</td>
     *     <td>{@code U+0027}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005C\u005C}</th>
     *     <td>backslash</td>
     *     <td>{@code U+005C}</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005C0 - \u005C377}</th>
     *     <td>octal escape</td>
     *     <td>code point equivalents</td>
     *   </tr>
     *   <tr>
     *     <th scope="row">{@code \u005C<line-terminator>}</th>
     *     <td>continuation</td>
     *     <td>discard</td>
     *   </tr>
     *   </tbody>
     * </table>
     *
     * @implNote
     * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
     * Unicode escapes are translated by the Java compiler when reading input characters and
     * are not part of the string literal specification.
     *
     * @throws IllegalArgumentException when an escape sequence is malformed.
     *
     * @return String with escape sequences translated.
     *
     * @jls 3.10.7 Escape Sequences
     *
     * @since 13
     */
    @jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.TEXT_BLOCKS,
                                 essentialAPI=true)
    public String translateEscapes() {
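
At run time, the same translation is exposed through `String::translateEscapes`. The sketch below assumes JDK 15 or later, where the method is no longer a preview API. The inputs are written with `\\` so that the escapes reach `translateEscapes` untranslated by the compiler. The class and method names are illustrative only.

```java
public class TranslateEscapesDemo {
    // Each input holds a literal backslash followed by the escape character,
    // so the compiler leaves the escape for translateEscapes to process.
    static String space() {
        return "A\\sline\\swith\\sspaces".translateEscapes();
    }

    static String continuationLf() {
        return "one \\\ntwo".translateEscapes();   // backslash + LF is discarded
    }

    static String continuationCrLf() {
        return "one \\\r\ntwo".translateEscapes(); // backslash + CR LF is discarded as a unit
    }

    public static void main(String[] args) {
        System.out.println(space());            // A line with spaces
        System.out.println(continuationLf());   // one two
        System.out.println(continuationCrLf()); // one two
        try {
            "bad \\q escape".translateEscapes();
        } catch (IllegalArgumentException e) {
            // A malformed escape sequence is rejected, as the @throws clause documents.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```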

Comments

The discussion sort of began around https://mail.openjdk.java.net/pipermail/amber-spec-experts/2019-May/001372.html
27-11-2019
Moving to Approved. Is there a discussion thread for the rationale of using "s" for the space escape character? My initial choice would have been "_", but individual tastes and other factors will differ.
27-11-2019
[~darcy] The text I quoted from JEP 368 was merely for background, to show what is driving the JLS draft. The JLS draft itself, as attached, and as you indicated, allows \s not only in text blocks but also in character literals and string literals (namely, in the retitled 3.10.7).
26-11-2019
CSRs are standalone documents, capturing the state of work before it is pushed. JEPs are living documents that can be and are edited after initial versions of the work are pushed. The existence of a JEP does not excuse the corresponding CSR from being complete and accurate. Please finalize this request when it is ready for the second phase of review.
26-11-2019
[~darcy] Re: "\s" can be used as a new escape sequence for space characters anywhere, not just in text blocks. Is that the intended change? -- Yes, the JLS draft reflects the design goal from JEP 368: The \s escape sequence can be used in both text blocks and traditional string literals.
25-11-2019
By my reading of the attached JLS spec, "\s" can be used as a new escape sequence for space characters anywhere, not just in text blocks. Is that the intended change? Let me make the observation/question more explicit: under the proposed change, if the feature is enabled, the set of escape sequences is augmented in multiple contexts and not just in text blocks. Moving to Provisional (not Approved).
23-11-2019
[~darcy] Unfortunately, for production reasons, I can't offer a diff between (a) the text-block-only JLS changes for JEP 355, and (b) the text-block-plus-two-escape-sequences JLS changes for JEP 368. (That is, a diff of diffs.) That said, we do anticipate the question: "What's new in Java 14 _for text blocks_, relative to Java 13", and JEP 368 answers it by omission: "Feedback on JDK 13 suggests that this feature should be previewed again, with the addition of two new escape sequences." i.e. text blocks themselves are unchanged between Java 13 and 14, save for \s and \LineTerminator.
12-11-2019
Diff provided.
12-11-2019
[~abuckley], thank you for the explanation. With various kinds of reviews, it is often helpful to see both a diff of the work against the current baseline and a diff of the work against the previous version of the work. The language changes are another instance where seeing both kinds of diff is helpful; in particular, to see what I would expect to be small changes: text blocks with two new escape sequences versus the older text blocks proposal in JDK 13. [~jlaskey], the translateEscapes method is already in JDK 13 and 14; please show a diff of some sort to highlight the proposed change.
12-11-2019
JEP 355 (Text Blocks) introduced language changes specified as an add-on to JLS 13, per https://docs.oracle.com/javase/specs/jls/se13/html/jls-1.html#jls-1.5 ... as we then go forward into the Java SE 14 era, JLS 14 doesn't automatically include those changes (either as an add-on spec or as final-and-permanent text in the mainline chapter 3) because an explicit decision needs to be taken about the fate of the text blocks feature. JEP 368 (Text Blocks (Second Preview)) represents the explicit decision. It introduces language changes (slightly enlarged to include the escape sequence sub-feature) which will be specified as an add-on to JLS 14. So, the base document for the language changes attached to this CSR is the (notional) JLS 14 with zero add-ons from the Java SE 13 era.
12-11-2019
What is the base document for the text blocks spec? Some of the text marked as new in the attachment seems to be existing text from the text blocks as described as preview feature in 13: https://docs.oracle.com/javase/specs/jls/se13/preview/text-blocks.html If there are Java SE API changes to String.translateEscapes or elsewhere, as implied by the interface kind including Java API, that needs to be explicitly included in this request.
09-11-2019