Summary
-------

Enhance the Java language by introducing _raw string literals_, a more flexible way to represent strings than traditional string literals.

Problem
-------

Java's traditional _string literals_ ([JLS 3.10.5](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.10.5)) allow various special characters to be represented with _escape sequences_ ([JLS 3.10.6](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.10.6)), such as `\"` for a double-quote character and `\n` for a linefeed character. The use of escape sequences makes string literals hard to read and more likely to accidentally rely on OS-specific conventions (for example, `\n` is the newline character on Unix, but not Windows).

In addition, the use of backslash `\` to introduce an escape sequence means that a string literal which truly wishes to include a backslash must escape it, via `\\`. This doubling-up of backslashes makes it painful to denote file paths and regular expressions.

Finally, string literals are subject to Unicode escape processing ([JLS 3.3](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.3)), where each `\uXXXX` character sequence is interpreted as a Unicode code point; this processing is convenient for representing, say, non-ASCII variable names, but inconvenient when embedding fragments of other Java programs.

Broadly speaking, Java code that embeds fragments of other programs (whether Java, or SQL, or JSON, etc) needs a mechanism for capturing literal strings as-is, without special handling of newlines, backslashes, or Unicode escapes.

Solution
-------

A _raw string literal_ is a backtick-delimited literal that (i) opts out of Unicode escape processing, (ii) ignores Java escape sequences, and (iii) normalizes each embedded newline (as determined by the compiler's source encoding) to a JLS-defined, OS-independent representation.
Multiple balanced backticks can be used to delimit a raw string literal that contains embedded backticks, without changing the payload string at all.

The following are examples of raw string literals:

```
`"`                // a string containing a single double-quote character
``can`t``          // a string containing the five characters 'c', 'a', 'n', '`' and 't'
`This is a string` // a string containing 16 characters
`\n`               // a string containing '\' and 'n'
`\u2022`           // a string containing '\', 'u', '2', '0', '2' and '2'
`This is a
two-line string`   // a string with an embedded newline
```

Specification
-------

Proposed changes to the Java Language Specification are attached. Because the type of a raw string literal is `String`, it is acceptable to use a raw string literal anywhere that a traditional string literal could be used, and vice versa.

There are no changes to the JVM Specification. A string in the constant pool of a `class` file ([JVMS 4.4.3](https://docs.oracle.com/javase/specs/jvms/se10/html/jvms-4.html#jvms-4.4.3)) has always been independent of Java language rules for traditional string literals, so it is a suitable compilation target for raw string literals. A `class` file does not record whether a string in the constant pool was compiled from a traditional string literal or a raw string literal.
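
As an illustration of the equivalence described above, the following sketch spells out, with traditional string literals only, the payloads that the raw string literal examples in the Solution section would denote. It compiles with existing Java compilers because it contains no raw string literals. The class name `RawLiteralPayloads` and the method `payloads()` are hypothetical names chosen for this example, and the use of `\n` for the embedded newline is an assumption about the normalized representation, which this document leaves to the attached JLS changes.

```java
class RawLiteralPayloads {

    // Hypothetical helper: traditional-literal spellings of the payloads
    // denoted by the raw string literal examples in the Solution section.
    static String[] payloads() {
        return new String[] {
            "\"",                         // one double-quote character
            "can`t",                      // a backtick needs no escape here
            "This is a string",           // 16 characters, no escapes needed
            "\\n",                        // backslash followed by 'n', not a linefeed
            "\\u2022",                    // backslash, 'u', '2', '0', '2', '2'
            "This is a\ntwo-line string"  // embedded newline, assumed to be a linefeed
        };
    }

    public static void main(String[] args) {
        // Print each payload's length so the effect of escaping is visible.
        for (String s : payloads()) {
            System.out.println(s.length() + " chars");
        }
    }
}
```

The doubled backslashes in the fourth and fifth entries are exactly the noise that a raw string literal would avoid. Once compiled, each entry is an ordinary constant-pool string, with nothing to indicate which literal form produced it.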