Summary
-------
Add two new escape sequences for string and character literals for managing
explicit whitespace and carriage control.
Problem
-------
In text blocks, newlines (`U+000A`) are not typically declared _explicitly_ using `\n`. Instead, newlines are inserted _implicitly_ wherever content breaks to the next line. **What if an implicit newline is _not_ desired?**
For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard-wrap the resulting string literals over multiple lines of source code:
    String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                     "elit, sed do eiusmod tempor incididunt ut labore " +
                     "et dolore magna aliqua.";
This is exactly the form of complex string that text blocks express more readably:
    String text = """
                  Lorem ipsum dolor sit amet, consectetur adipiscing
                  elit, sed do eiusmod tempor incididunt ut labore
                  et dolore magna aliqua.
                  """;
However, using text blocks to represent long strings has a drawback: an implicit newline is inserted on _every_ line. It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
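The difference between the two forms can be checked directly. The following is a minimal sketch, assuming a JDK where text blocks are available without preview flags (JDK 15 or later); the class name `ImplicitNewlineDemo` and its method names are invented for illustration.

```java
class ImplicitNewlineDemo {
    // The hard-wrapped literal: one logical line, no newline characters.
    static String concatenated() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // The same words as a text block: each content line ends in an implicit U+000A.
    static String textBlock() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing
               elit, sed do eiusmod tempor incididunt ut labore
               et dolore magna aliqua.
               """;
    }

    public static void main(String[] args) {
        // The concatenated literal contains no newline at all.
        System.out.println(concatenated().contains("\n"));       // false
        // The text block picked up one newline per content line.
        System.out.println(textBlock().lines().count());         // 3
        // So the two strings are not interchangeable.
        System.out.println(concatenated().equals(textBlock()));  // false
    }
}
```

The two strings differ only in where the text block has inserted newlines, which is the gap the rest of this proposal addresses.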
Turning to another matter, the space (`U+0020`) character's lack of observability creates a problem for strings.
For example, text blocks are missing per-line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where the content of a line ends. The lack of direct space-character observability is the primary reason for text blocks always stripping trailing white space. However, this behavior leads to a counter issue: **How does a developer retain trailing white space in a text block?**
For another example, various visual tricks are required to get an accurate count of multiple spaces in any string literal. For instance, how many spaces are in the string literal `" "`? **How can a developer count what they cannot visually discern?**
Solution
--------
Change the JLS section on "Escape Sequences for Character and String Literals" and the API `String::translateEscapes` to recognize two new escape sequences:
- `\<line-terminator>`
The escape sequences `\<LF>` (`U+005C, U+000A`), `\<CR>` (`U+005C, U+000D`) and `\<CR><LF>` (`U+005C, U+000D, U+000A`) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.
Example:
    String text = """
                  Lorem ipsum dolor sit amet, consectetur adipiscing \
                  elit, sed do eiusmod tempor incididunt ut labore \
                  et dolore magna aliqua.\
                  """;
After white space stripping, the above text block would have the value,
`"Lorem ipsum dolor sit amet, consectetur adipiscing \<LF>elit, sed do eiusmod tempor incididunt ut labore \<LF>et dolore magna aliqua.\<LF>"`.
Applying escape translation would then yield
`"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."`.
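As a quick check of the translation described above, the following sketch compares a text block that uses line continuation with the equivalent concatenated string literal. It assumes a JDK in which the new escape sequences are enabled (JDK 15 or later, or an earlier release with preview features switched on); the class name `LineContinuationDemo` and its method names are hypothetical.

```java
class LineContinuationDemo {
    // Reference value: the traditional hard-wrapped concatenation.
    static String concatenated() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // Each trailing backslash suppresses the implicit newline of its line.
    // The space before the backslash is kept, because it is no longer
    // trailing white space once the escape follows it.
    static String continued() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.\
               """;
    }

    public static void main(String[] args) {
        System.out.println(continued().contains("\n"));           // false
        System.out.println(continued().equals(concatenated()));   // true
    }
}
```

Note that the final `\` before the closing delimiter is what removes the last newline; without it the text block would end in `U+000A`.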
- `\s` (`U+005C, U+0073`)
The escape sequence `\s` represents observable space and is translated to the ASCII space character (`U+0020`).
    String str = "A\sline\swith\sspaces";
After translation the String `str` will have the value `"A line with spaces"`.
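The sketch below illustrates two uses of the new escapes: `\s` as a way to keep trailing spaces in a text block, and `String::translateEscapes` applied to a string that holds escape sequences at run time. It assumes JDK 15 or later (or preview features enabled on an earlier release); the class name `EscapedSpaceDemo`, its method names, and the sample strings are made up for illustration.

```java
class EscapedSpaceDemo {
    // Pads every line to six characters. Written as raw spaces, the padding
    // would be removed by trailing white space stripping; \s survives it.
    static String colors() {
        return """
               red\s\s\s
               green\s
               blue\s\s
               """;
    }

    // Translates escapes held in an ordinary string at run time.
    // In this source literal, "\\s" is a backslash followed by 's', and
    // "\\\n" is a backslash followed by a line feed (a line continuation).
    static String translated() {
        return "A\\sline\\swith\\sspaces,\\\ncontinued".translateEscapes();
    }

    public static void main(String[] args) {
        // Every line of the text block is exactly six characters long.
        System.out.println(colors().lines().allMatch(l -> l.length() == 6)); // true
        // \s became U+0020 and the backslash-newline pair was discarded.
        System.out.println(translated()); // A line with spaces,continued
    }
}
```

Because `\s` is an escape rather than white space, it also marks visibly where the padded content of each line ends.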
Specification
-------------
JLS changes for the new escape sequences are found in section 3.10.7 of the attachment `text-blocks-jls.html`. There are no JVMS changes.
String::translateEscapes diff
--- a/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:43.000000000 -0400
+++ b/src/java.base/share/classes/java/lang/String.java 2019-11-12 13:32:02.000000000 -0400
@@ -3060,10 +3060,15 @@
* <th scope="row">{@code \u005Cr}</th>
* <td>carriage return</td>
* <td>{@code U+000D}</td>
* </tr>
* <tr>
+ * <th scope="row">{@code \u005Cs}</th>
+ * <td>space</td>
+ * <td>{@code U+0020}</td>
+ * </tr>
+ * <tr>
* <th scope="row">{@code \u005C"}</th>
* <td>double quote</td>
* <td>{@code U+0022}</td>
* </tr>
* <tr>
@@ -3079,10 +3084,15 @@
* <tr>
* <th scope="row">{@code \u005C0 - \u005C377}</th>
* <td>octal escape</td>
* <td>code point equivalents</td>
* </tr>
+ * <tr>
+ * <th scope="row">{@code \u005C<line-terminator>}</th>
+ * <td>continuation</td>
+ * <td>discard</td>
+ * </tr>
* </tbody>
* </table>
*
* @implNote
* This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
String::translateEscapes after diff changes
/**
* {@preview Associated with text blocks, a preview feature of
* the Java language.
*
* This method is associated with <i>text blocks</i>, a preview
* feature of the Java language. Programs can only use this
* method when preview features are enabled. Preview features
* may be removed in a future release, or upgraded to permanent
* features of the Java language.}
*
* Returns a string whose value is this string, with escape sequences
* translated as if in a string literal.
* <p>
* Escape sequences are translated as follows;
* <table class="striped">
* <caption style="display:none">Translation</caption>
* <thead>
* <tr>
* <th scope="col">Escape</th>
* <th scope="col">Name</th>
* <th scope="col">Translation</th>
* </tr>
* </thead>
* <tbody>
* <tr>
* <th scope="row">{@code \u005Cb}</th>
* <td>backspace</td>
* <td>{@code U+0008}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Ct}</th>
* <td>horizontal tab</td>
* <td>{@code U+0009}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cn}</th>
* <td>line feed</td>
* <td>{@code U+000A}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cf}</th>
* <td>form feed</td>
* <td>{@code U+000C}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cr}</th>
* <td>carriage return</td>
* <td>{@code U+000D}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005Cs}</th>
* <td>space</td>
* <td>{@code U+0020}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C"}</th>
* <td>double quote</td>
* <td>{@code U+0022}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C'}</th>
* <td>single quote</td>
* <td>{@code U+0027}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C\u005C}</th>
* <td>backslash</td>
* <td>{@code U+005C}</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C0 - \u005C377}</th>
* <td>octal escape</td>
* <td>code point equivalents</td>
* </tr>
* <tr>
* <th scope="row">{@code \u005C<line-terminator>}</th>
* <td>continuation</td>
* <td>discard</td>
* </tr>
* </tbody>
* </table>
*
* @implNote
* This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
* Unicode escapes are translated by the Java compiler when reading input characters and
* are not part of the string literal specification.
*
* @throws IllegalArgumentException when an escape sequence is malformed.
*
* @return String with escape sequences translated.
*
* @jls 3.10.7 Escape Sequences
*
* @since 13
*/
@jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.TEXT_BLOCKS,
essentialAPI=true)
public String translateEscapes() {