Summary
-------

Add two new escape sequences for string and character literals for managing explicit whitespace and carriage control.

Problem
-------

In text blocks, newlines (`U+000A`) are not typically declared _explicitly_ using `\n`. Instead, newlines are inserted _implicitly_ wherever content breaks to the next line. **What if an implicit newline is _not_ desired?**

For example, it is common practice to split very long string literals into concatenations of smaller substrings and then hard-wrap the resulting string literals over multiple lines of source code:

    String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                     "elit, sed do eiusmod tempor incididunt ut labore " +
                     "et dolore magna aliqua.";

This is exactly the form of complex string that text blocks express more readably:

    String text = """
                  Lorem ipsum dolor sit amet, consectetur adipiscing
                  elit, sed do eiusmod tempor incididunt ut labore
                  et dolore magna aliqua.
                  """;

However, using text blocks to represent long strings has a drawback: an implicit newline is inserted on _every_ line. It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.

Turning to another matter, the space (`U+0020`) character's lack of observability creates a problem for strings. For example, text blocks are missing per-line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where it ends. The lack of direct space-character observability is the primary reason that text blocks always strip trailing white space. However, this behavior leads to a counter issue: **How does a developer retain trailing white space in a text block?**

For another example, various visual tricks are required to get an accurate count of multiple spaces in any string literal. For instance, how many spaces are in the string literal `" "`?
**How can a developer count what they cannot visually discern?**

Solution
--------

Change the JLS section on "Escape Sequences for Character and String Literals" and the API `String::translateEscapes` to recognize two new escape sequences:

- `\<line-terminator>`

  The escape sequences `\␊` (`U+005C, U+000A`), `\␍` (`U+005C, U+000D`) and `\␍␊` (`U+005C, U+000D, U+000A`) represent line continuation. Unlike other escape sequences, these line continuation sequences are simply discarded during escape translation.

  Example:

      String text = """
                    Lorem ipsum dolor sit amet, consectetur adipiscing \
                    elit, sed do eiusmod tempor incididunt ut labore \
                    et dolore magna aliqua.\
                    """;

  After white space stripping, the above text block would have the value `"Lorem ipsum dolor sit amet, consectetur adipiscing \␊elit, sed do eiusmod tempor incididunt ut labore \␊et dolore magna aliqua.\␊"`. Applying escape translation would then yield `"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."`.

- `\s` (`U+005C, U+0073`)

  The escape sequence `\s` represents observable space and is translated to the ASCII space character (`U+0020`).

      String str = "A\sline\swith\sspaces";

  After translation, the String `str` will have the value `"A line with spaces"`.

Specification
-------------

JLS changes for the new escape sequences are found in section 3.10.7 of the attachment `text-blocks-jls.html`. There are no JVMS changes.
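The two escape sequences described under Solution can be tried together in a small program. The following is a minimal sketch, assuming a JDK on which text blocks and both new escapes are enabled (as a preview feature or otherwise). The class `EscapeSketch` and its methods `joined` and `padded` are hypothetical names introduced only for illustration. The `padded` method shows one way `\s` might be used to retain trailing white space, the second problem raised above.

```java
// Hypothetical demo class; not part of the JDK or of this CSR.
class EscapeSketch {

    // Line continuation: a backslash at the end of a line suppresses the
    // implicit newline, so the three source lines form one logical line.
    public static String joined() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.\
               """;
    }

    // Observable space: \s is not white space in the source, so it survives
    // trailing white space stripping and protects the spaces before it.
    // After translation every line is six characters wide.
    public static String padded() {
        return """
               red  \s
               green\s
               blue \s
               """;
    }

    public static void main(String[] args) {
        System.out.println(joined());
        // Replace spaces with dots so the retained padding is visible.
        System.out.print(padded().replace(' ', '.'));
    }
}
```

Because the backslash is the last character on the line, the space in front of it is not treated as trailing white space, which is why the words on either side of each continuation remain separated by exactly one space.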
String::translateEscapes diff

    --- a/src/java.base/share/classes/java/lang/String.java	2019-11-12 13:32:43.000000000 -0400
    +++ b/src/java.base/share/classes/java/lang/String.java	2019-11-12 13:32:02.000000000 -0400
    @@ -3060,10 +3060,15 @@
          * <th scope="row">{@code \u005Cr}</th>
          * <td>carriage return</td>
          * <td>{@code U+000D}</td>
          * </tr>
          * <tr>
    +     * <th scope="row">{@code \u005Cs}</th>
    +     * <td>space</td>
    +     * <td>{@code U+0020}</td>
    +     * </tr>
    +     * <tr>
          * <th scope="row">{@code \u005C"}</th>
          * <td>double quote</td>
          * <td>{@code U+0022}</td>
          * </tr>
          * <tr>
    @@ -3079,10 +3084,15 @@
          * <tr>
          * <th scope="row">{@code \u005C0 - \u005C377}</th>
          * <td>octal escape</td>
          * <td>code point equivalents</td>
          * </tr>
    +     * <tr>
    +     * <th scope="row">{@code \u005C<line-terminator>}</th>
    +     * <td>continuation</td>
    +     * <td>discard</td>
    +     * </tr>
          * </tbody>
          * </table>
          *
          * @implNote
          * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".

String::translateEscapes after diff changes

    /**
     * {@preview Associated with text blocks, a preview feature of
     * the Java language.
     *
     * This method is associated with <i>text blocks</i>, a preview
     * feature of the Java language. Programs can only use this
     * method when preview features are enabled. Preview features
     * may be removed in a future release, or upgraded to permanent
     * features of the Java language.}
     *
     * Returns a string whose value is this string, with escape sequences
     * translated as if in a string literal.
     * <p>
     * Escape sequences are translated as follows;
     * <table class="striped">
     * <caption style="display:none">Translation</caption>
     * <thead>
     * <tr>
     * <th scope="col">Escape</th>
     * <th scope="col">Name</th>
     * <th scope="col">Translation</th>
     * </tr>
     * </thead>
     * <tbody>
     * <tr>
     * <th scope="row">{@code \u005Cb}</th>
     * <td>backspace</td>
     * <td>{@code U+0008}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005Ct}</th>
     * <td>horizontal tab</td>
     * <td>{@code U+0009}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005Cn}</th>
     * <td>line feed</td>
     * <td>{@code U+000A}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005Cf}</th>
     * <td>form feed</td>
     * <td>{@code U+000C}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005Cr}</th>
     * <td>carriage return</td>
     * <td>{@code U+000D}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005Cs}</th>
     * <td>space</td>
     * <td>{@code U+0020}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005C"}</th>
     * <td>double quote</td>
     * <td>{@code U+0022}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005C'}</th>
     * <td>single quote</td>
     * <td>{@code U+0027}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005C\u005C}</th>
     * <td>backslash</td>
     * <td>{@code U+005C}</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005C0 - \u005C377}</th>
     * <td>octal escape</td>
     * <td>code point equivalents</td>
     * </tr>
     * <tr>
     * <th scope="row">{@code \u005C<line-terminator>}</th>
     * <td>continuation</td>
     * <td>discard</td>
     * </tr>
     * </tbody>
     * </table>
     *
     * @implNote
     * This method does <em>not</em> translate Unicode escapes such as "{@code \u005cu2022}".
     * Unicode escapes are translated by the Java compiler when reading input characters and
     * are not part of the string literal specification.
     *
     * @throws IllegalArgumentException when an escape sequence is malformed.
     *
     * @return String with escape sequences translated.
     *
     * @jls 3.10.7 Escape Sequences
     *
     * @since 13
     */
    @jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.TEXT_BLOCKS, essentialAPI=true)
    public String translateEscapes() {
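The run-time behavior specified for `String::translateEscapes` can be illustrated with ordinary string literals in which each backslash is doubled, so that the escape reaches the method untranslated by the compiler. This is a minimal sketch, assuming a JDK on which `translateEscapes` is available with the two new table rows. The class `TranslateEscapesSketch` and the helpers `translate` and `isMalformed` are hypothetical names, not part of the API.

```java
// Hypothetical demo class; only String::translateEscapes is real API here.
class TranslateEscapesSketch {

    // Thin wrapper so the call under discussion is easy to spot.
    public static String translate(String raw) {
        return raw.translateEscapes();
    }

    // Returns true when translation rejects the input as malformed,
    // per the @throws clause in the javadoc above.
    public static boolean isMalformed(String raw) {
        try {
            raw.translateEscapes();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "\\s" is the two characters backslash and 's' at run time.
        System.out.println(translate("A\\sline\\swith\\sspaces"));

        // Backslash followed by LF, or by CR LF, is a continuation
        // and is discarded rather than replaced.
        System.out.println(translate("one \\\ntwo"));
        System.out.println(translate("one \\\r\ntwo"));

        // A backslash followed by an unrecognized character is malformed.
        System.out.println(isMalformed("\\q"));
    }
}
```

Doubling the backslashes matters: writing `"A\sline"` directly would have the compiler translate the escape before the method ever sees it.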