JDK-8206982 : Compiler support for Raw String Literals (Preview)
  • Type: CSR
  • Component: tools
  • Sub-Component: javac
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 12
  • Submitted: 2018-07-10
  • Updated: 2019-01-03
  • Resolved: 2018-08-09
Description
Summary
-------

Enhance the Java language by introducing _raw string literals_, a more flexible way to represent strings than traditional string literals.

Problem
-------

Java's traditional _string literals_ ([JLS 3.10.5](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.10.5)) allow various special characters to be represented with _escape sequences_ ([JLS 3.10.6](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.10.6)), such as `\"` for a double-quote character and `\n` for a linefeed character. Escape sequences make string literals hard to read and make it easy to rely on OS-specific conventions by accident (for example, `\n` alone terminates a line on Unix, whereas Windows conventionally uses `\r\n`). In addition, because backslash `\` introduces an escape sequence, a string literal that needs a literal backslash must escape it as `\\`. This doubling-up of backslashes makes it painful to denote file paths and regular expressions. Finally, string literals are subject to Unicode escape processing ([JLS 3.3](https://docs.oracle.com/javase/specs/jls/se10/html/jls-3.html#jls-3.3)), where each `\uXXXX` character sequence is translated to the corresponding UTF-16 code unit; this processing is convenient for representing, say, non-ASCII variable names in ASCII-only source, but inconvenient when embedding fragments of other Java programs. Broadly speaking, Java code that embeds fragments of other programs (whether Java, SQL, JSON, etc.) needs a mechanism for capturing literal strings as-is, without special handling of newlines, backslashes, or Unicode escapes.
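
To make the escaping burden concrete, the following is a minimal sketch using only traditional string literals. The class name `EscapeDemo` and the sample path are illustrative assumptions, not part of this proposal.

```java
import java.util.regex.Pattern;

// Illustrative only: shows how backslashes and Unicode escapes behave
// in traditional string literals.
class EscapeDemo {
    // A Windows-style path: every backslash in the value is doubled in source.
    static final String PATH = "C:\\Users\\duke\\notes.txt";

    // A regex that matches one literal backslash takes four in source:
    // the compiler halves them to two, and the regex engine halves them again.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // A Unicode escape is translated before the literal is tokenized,
    // so this is a one-character string holding U+2022.
    static final String BULLET = "\u2022";

    // Doubling the backslash keeps the six characters as written.
    static final String BULLET_SOURCE = "\\u2022";

    public static void main(String[] args) {
        System.out.println(PATH);
        System.out.println(ONE_BACKSLASH.split(PATH).length);
        System.out.println(BULLET.length());
        System.out.println(BULLET_SOURCE);
    }
}
```

Every one of these literals reads differently in source than the string it denotes, which is the readability cost described above.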

Solution
-------

A _raw string literal_ is a backtick-delimited literal that (i) opts out of Unicode escape processing, (ii) ignores Java escape sequences, and (iii) normalizes each embedded newline (as determined by the compiler's source encoding) to a JLS-defined, OS-independent representation. Multiple balanced backticks can be used to delimit a raw string literal that contains embedded backticks, without changing the payload string at all.
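
The newline normalization in (iii) can be approximated at run time as follows. This is a sketch only: it assumes the OS-independent representation is a single linefeed, and `NewlineNormalizer.normalize` is a hypothetical helper, not the compiler's implementation or a JDK API.

```java
// Hypothetical helper approximating the newline normalization a compiler
// would apply to the payload of a raw string literal.
class NewlineNormalizer {
    // Assumption: CRLF and lone CR are both normalized to LF.
    static String normalize(String payload) {
        // Replace CRLF first so that it becomes one LF rather than two.
        return payload.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\rthree\nfour";
        System.out.println(normalize(mixed).equals("one\ntwo\nthree\nfour"));
    }
}
```

With this behavior, the same source file yields the same string whether it was saved with Unix or Windows line endings.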

The following are examples of raw string literals:

```
`"`                // a string containing a single double-quote character
``can`t``          // a string containing the five characters 'c', 'a', 'n', '`' and 't'
`This is a string` // a string containing 16 characters
`\n`               // a string containing '\' and 'n'
`\u2022`           // a string containing '\', 'u', '2', '0', '2' and '2'
`This is a
two-line string`   // a string with an embedded newline
```
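
For comparison, the same six strings can be written as traditional string literals, which compile without the preview feature. This is a sketch under two assumptions: the class and constant names are illustrative, and the embedded newline is taken to normalize to a linefeed.

```java
// Traditional-literal equivalents of the raw string literal examples.
// Names are illustrative; they do not appear in the proposal.
class RawEquivalents {
    static final String DOUBLE_QUOTE = "\"";          // one double-quote character
    static final String CANT = "can`t";               // a backtick needs no escape here
    static final String PLAIN = "This is a string";   // identical in both forms
    static final String BACKSLASH_N = "\\n";          // backslash followed by 'n'
    static final String BACKSLASH_U = "\\u2022";      // six characters, no bullet
    // Assumption: the embedded newline is normalized to a linefeed.
    static final String TWO_LINES = "This is a\ntwo-line string";

    public static void main(String[] args) {
        System.out.println(DOUBLE_QUOTE.length());
        System.out.println(CANT.length());
        System.out.println(PLAIN.length());
        System.out.println(BACKSLASH_N.length());
        System.out.println(BACKSLASH_U.length());
        System.out.println(TWO_LINES.length());
    }
}
```

Only the second and third examples look the same in both forms; the others need an escape sequence that the raw form avoids.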

Specification
-------

Proposed changes to the Java Language Specification are attached. Because the type of a raw string literal is `String`, it is acceptable to use a raw string literal anywhere that a traditional string literal could be used, and vice versa.

There are no changes to the JVM Specification. A string in the constant pool of a `class` file ([JVMS 4.4.3](https://docs.oracle.com/javase/specs/jvms/se10/html/jvms-4.html#jvms-4.4.3)) has always been independent of Java language rules for traditional string literals, so it is a suitable compilation target for raw string literals. A `class` file does not record whether a string in the constant pool was compiled from a traditional string literal or a raw string literal.
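
The independence of constant-pool strings from their source spelling can already be observed with traditional literals. In this sketch (the class name `ConstantPoolDemo` is illustrative), two literals spelled differently in source denote the same interned `String`, and nothing at run time reveals which spelling was used. A raw string literal would presumably behave the same way.

```java
// Two source spellings of the same string resolve to one interned constant.
class ConstantPoolDemo {
    // Written with Unicode escapes for 'A' and 'B'.
    static final String ESCAPED = "\u0041\u0042";
    // Written directly.
    static final String DIRECT = "AB";

    // Reference comparison is deliberate here: string literals with the
    // same content are interned, so both fields refer to the same object.
    static boolean sameInstance() {
        return ESCAPED == DIRECT;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
    }
}
```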
Comments
It might be helpful to readers of the new JLS sections to have a discussion of the (lack of) processing of leading whitespace on a line and an informative reference to the APIs to deal with that situation. Moving to Approved.
09-08-2018

Moving to Provisional.
18-07-2018