JDK-8233878 : String::insertEscapes (Preview)
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2019-11-09
  • Updated: 2019-11-14
Description
This feature introduces a new String instance method that inserts escape sequences, such as \n, \t, \', \", and \\, as described in full in section 3.10.6 of The Java™ Language Specification.

This is the inverse function to String::translateEscapes (JDK-8223781).
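
As a rough illustration of the intended behavior (not the code behind the link below), the transform can be sketched as a static helper. The class name InsertEscapesSketch and the static signature are assumptions made for this example; the proposal itself is for an instance method on String. Only the eight classic escapes are handled, and every other character is copied through unchanged.

```java
final class InsertEscapesSketch {
    /** Hypothetical stand-in for the proposed String::insertEscapes. */
    static String insertEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\b': sb.append("\\b");  break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                case '\f': sb.append("\\f");  break;
                case '\r': sb.append("\\r");  break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'");  break;
                case '\\': sb.append("\\\\"); break;
                default:   sb.append(c);      // no simple escape: copy as-is
            }
        }
        return sb.toString();
    }
}
```

With this sketch, a string holding a, TAB, b comes back as the four characters a, \, t, b. Feeding that result to String::translateEscapes, where that method is available, should give back the original string.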

It seems (to some, including this reporter) that adding a method like String::translateEscapes necessarily raises the question of adding its inverse.

Suggested code: http://cr.openjdk.java.net/~jrose/jdk/insertEscapes

Suggestion for further enhancement:  To both methods, add an optional boolean argument (unicodeEscapes) which enables the translation or insertion of the special escape syntax '\uXXXX'.  This makes both methods marginally more useful for transcoding between arbitrary strings and ASCII-only strings.  (The latter are stored compactly in the JDK, one byte per character, due to compressed strings.)

Suggested code: http://cr.openjdk.java.net/~jrose/jdk/insertEscapes+unicode (naive about interaction of \u and other escapes)
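
A minimal sketch of the boolean variant, written independently of the linked code. The class name, the static signature, and the choice to escape every char above printable ASCII (0x7E) are assumptions for illustration. Like the linked code, it is naive about how \u escapes interact with the other escapes.

```java
final class UnicodeInsertEscapesSketch {
    /** Hypothetical two-argument form; 'unicodeEscapes' follows the name used in the report. */
    static String insertEscapes(String s, boolean unicodeEscapes) {
        final String payload = "\b\t\n\f\r\"'\\"; // characters that have a simple escape
        final String letters = "btnfr\"'\\";      // the character written after the backslash
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int k = payload.indexOf(c);
            if (k >= 0) {
                sb.append('\\').append(letters.charAt(k));
            } else if (unicodeEscapes && c > 0x7E) {
                // Assumed policy: anything above printable ASCII becomes a
                // four-digit unicode escape; surrogates are escaped one by one.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With unicodeEscapes set, the output of this sketch contains no char above 0x7E, so it is eligible for the compact one-byte representation.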

Suggestion #2, in the spirit of finer control over escape processing: Instead of an optional boolean argument, control the insertion of escapes with an optional string parameter.  The string contains zero or more characters, each of which represents one kind of escape, and the transform processes only those escapes.  The full set is:  "bfnrt"+"s\n"+"\\"+"\'\""+"u"+"0".

Special notes:  The convention is flexible enough to accommodate future escapes.
"0" stands for the octal escape (like <\ 0>; uses chars in [0-7]).
"\n" stands for the continuation syntax <\ NL>.  If present for insertEscapes, every payload <NL> (whether escaped or not) is followed by <\ NL>; post-processing can then search for this pattern.  It is best used together with "n", so that <NL> becomes <\ n \ NL>.

If insertEscapes lacks the means to escape a character, the character is left as-is.  Subsequent regex matches can check for such characters if post-validation is necessary.
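
The option-string idea, the continuation rule, and the post-validation step can be sketched together as follows. Everything named here (KindsInsertEscapesSketch, the kinds parameter, hasUnescapedControls) is hypothetical. The sketch covers only the "bfnrt", quote, backslash, and "\n" kinds; the "s", "u", and "0" kinds and the special treatment of a missing \ kind are left out to keep it short.

```java
import java.util.regex.Pattern;

final class KindsInsertEscapesSketch {
    /** Hypothetical form: 'kinds' lists the escape kinds that may be inserted. */
    static String insertEscapes(String s, String kinds) {
        final String payload = "\b\f\n\r\t\"'\\"; // characters that have a simple escape
        final String letters = "bfnrt\"'\\";      // matching kind characters
        boolean continuation = kinds.indexOf('\n') >= 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int k = payload.indexOf(c);
            if (k >= 0 && kinds.indexOf(letters.charAt(k)) >= 0) {
                sb.append('\\').append(letters.charAt(k));
            } else {
                sb.append(c); // kind not enabled (or no escape exists): leave as-is
            }
            if (c == '\n' && continuation) {
                sb.append("\\\n"); // follow every payload NL by backslash + NL
            }
        }
        return sb.toString();
    }

    // Assumed validation policy: flag control characters other than NL,
    // since the continuation syntax legitimately leaves NL in the output.
    private static final Pattern LEFTOVER =
            Pattern.compile("[\\x00-\\x09\\x0B-\\x1F\\x7F]");

    static boolean hasUnescapedControls(String escaped) {
        return LEFTOVER.matcher(escaped).find();
    }
}
```

With kinds "n\n", a payload <NL> becomes <\ n \ NL>; with kinds "\n" alone, the <NL> stays unescaped and is followed by <\ NL>. A TAB processed with kinds "n" passes through untouched, which the regex check then reports.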

For insertEscapes, if \ is *not* present, then \ characters are rendered as <\134> or <\u005c\u005c> (at least one of u or 0 must be present; otherwise it is an error).  This is a useful preprocessing step for regex-based escape processing, since it distinguishes payload \ characters from escape \ characters.
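
A small sketch of the backslash rule in isolation, under stated assumptions: the class and method names are hypothetical, octal is preferred when both "0" and "u" are present (the report does not say which wins), and the error is raised eagerly even if the input holds no \ at all.

```java
final class BackslashFallbackSketch {
    /** Renders payload backslashes according to the enabled escape kinds. */
    static String escapeBackslashes(String s, String kinds) {
        final String replacement;
        if (kinds.indexOf('\\') >= 0) {
            replacement = "\\\\";           // ordinary escape: two backslashes
        } else if (kinds.indexOf('0') >= 0) {
            replacement = "\\134";          // octal escape for the backslash
        } else if (kinds.indexOf('u') >= 0) {
            replacement = "\\u005c\\u005c"; // doubled unicode escape
        } else {
            throw new IllegalArgumentException(
                    "kinds must contain a backslash, 'u', or '0'");
        }
        // String.replace performs literal (non-regex) replacement.
        return s.replace("\\", replacement);
    }
}
```

After this step with kinds "0" or "u", no payload \ is followed by a character that could be mistaken for one of the simple escapes, which is what makes later regex-based processing unambiguous.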