JDK-8200435 : String::align, String::indent
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 12
  • Submitted: 2018-03-29
  • Updated: 2019-01-09
  • Resolved: 2018-09-12
Description
Summary
-------

This feature introduces three new String instance methods for managing
the incidental indentation that results from aligning a Raw String
Literal with the surrounding source code.  These methods may also be
used to manage incidental indentation in strings derived from external
sources (e.g., character files).

Problem
-------

With the introduction of
[JEP 326: Raw String Literals](http://openjdk.java.net/jeps/326),
many developers will make regular use of multi-line strings.
Unfortunately, Raw String Literals take no account of the formatting
conventions used in the code surrounding the literal.  This may
introduce leading white space (indentation) that was never intended
to be part of the content of the string.

In the following example, the HTML Raw String Literal is indented
to align with the assignment expression. To do so, extra white space is
added to each line (incidental white space is denoted with periods).

```
Ex 1.
    String html = `
..................<html>
..................   <body>
..................       <p>Hello World</p>
..................   </body>
..................</html>
..................`;
```

However, when the string is used or displayed, that extra white space
may be undesired.

Solution
--------

A few ways of removing incidental indentation are already available.

```
Ex 2.
    // Remove a fixed number of spaces from each line.
    int indent = 18;
    String trimmed = html.lines()
                         .map(s -> s.substring(indent))
                         .collect(joining("\n", "", "\n"));

Ex 3.
    // Remove all leading (and trailing) spaces from each line.
    String trimmed = html.lines()
                         .map(String::strip)
                         .collect(joining("\n", "", "\n"));
```
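
Ex 2 and Ex 3 can be tried without raw string literals. The following is a minimal sketch, not part of the proposal: the class name `ManualTrimSketch`, its two helper methods, and the `HTML` constant (an ordinary literal standing in for the string of Ex 1, without its empty first line and blank last line) are all hypothetical. It assumes Java 11 for `String::lines`, `String::strip` and `String::repeat`.

```java
import java.util.stream.Collectors;

final class ManualTrimSketch {
    // Hypothetical stand-in for the literal of Ex 1: every line carries
    // 18 spaces of incidental indentation.
    static final String PAD = " ".repeat(18);
    static final String HTML =
        PAD + "<html>\n" +
        PAD + "   <body>\n" +
        PAD + "       <p>Hello World</p>\n" +
        PAD + "   </body>\n" +
        PAD + "</html>\n";

    // Ex 2: remove a fixed number of characters from each line.
    static String removeFixed(String s, int indent) {
        return s.lines()
                .map(line -> line.substring(indent))
                .collect(Collectors.joining("\n", "", "\n"));
    }

    // Ex 3: strip all leading and trailing white space from each line.
    static String stripAll(String s) {
        return s.lines()
                .map(String::strip)
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        System.out.print(removeFixed(HTML, 18)); // relative indentation kept
        System.out.print(stripAll(HTML));        // relative indentation lost
    }
}
```

Two limitations show up here. `removeFixed` throws `StringIndexOutOfBoundsException` for any line shorter than `indent`, such as the empty first line produced by a literal that opens with a line break, and `stripAll` flattens the nesting of the HTML.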

However, in many cases the developer would like to keep some spacing to
maintain relative indentation without having to count how many spaces
must be removed. This proposal provides two solutions:
`String::align()` and `String::indent(int n)`.

`String::align()` removes all leading and trailing blank lines and then
left-justifies each line without loss of relative indentation, thus
stripping away all incidental indentation and line spacing.

```
Ex 4.
    // The manual and aligned strings are equivalent
    String manual =
`<html>
    <body>
        <p>Hello World</p>
    </body>
</html>
`;
    String aligned = `
.....................<html>
.....................    <body>
.....................        <p>Hello World</p>
.....................    </body>
.....................</html>
.....................`.align();
```
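
The effect of `align()` can be approximated with existing API. The sketch below follows the steps given in the Specification section of this CSR; it is an illustration only, and the class `AlignSketch` and its helper `leadingWhitespace` are hypothetical names. It assumes Java 11 for `String::lines`, `String::isBlank` and `String::repeat`, and it assumes that blank lines inside the body are passed through unchanged, since the specification only describes what happens to non-blank lines.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AlignSketch {
    // Counts the leading characters for which Character.isWhitespace is true.
    // A tab counts as one character; it is not expanded.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static String align(String s) {
        List<String> lines = s.lines().collect(Collectors.toList());

        // Drop leading and trailing blank lines.
        int first = 0;
        int last = lines.size();
        while (first < last && lines.get(first).isBlank()) first++;
        while (last > first && lines.get(last - 1).isBlank()) last--;
        List<String> body = lines.subList(first, last);

        // Minimum indentation over the non-blank lines.
        int min = body.stream()
                      .filter(line -> !line.isBlank())
                      .mapToInt(AlignSketch::leadingWhitespace)
                      .min()
                      .orElse(0);

        // Remove min characters from each non-blank line (assumption: interior
        // blank lines are left as they are) and suffix every line with "\n".
        return body.stream()
                   .map(line -> line.isBlank() ? line : line.substring(min))
                   .map(line -> line + "\n")
                   .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String pad = " ".repeat(21);
        String html = "\n" + pad + "<html>\n" + pad + "    <body>\n" + pad + "</html>\n" + pad;
        System.out.print(align(html));
    }
}
```

Trailing white space on a line is left alone, matching the statement in the specification that trailing spaces are preserved.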

`String::indent(int n)` can be used to control the amount of white space
added to or removed from each line: a positive n adds n spaces (U+0020)
to the beginning of each line, and a negative n removes up to |n|
leading white space characters.

```
Ex 5.
    // The manual and aligned strings are equivalent
    String manual =
`    <html>
        <body>
            <p>Hello World</p>
        </body>
    </html>
`;
    String aligned = `
.....................<html>
.....................    <body>
.....................        <p>Hello World</p>
.....................    </body>
.....................</html>
.....................`.align().indent(4);
```
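
The three cases of `indent(int n)` (positive, negative, zero) can be sketched with existing API as follows. This follows the wording of the Specification section of this CSR and is not the JDK implementation; `IndentSketch` and `adjust` are hypothetical names, and Java 11 is assumed for `String::lines`, `String::isBlank` and `String::repeat`.

```java
import java.util.stream.Collectors;

final class IndentSketch {
    static String indent(String s, int n) {
        String spaces = n > 0 ? " ".repeat(n) : "";
        return s.lines()                           // splits on \n, \r and \r\n
                .map(line -> adjust(line, n, spaces))
                .map(line -> line + "\n")          // normalizes the terminator
                .collect(Collectors.joining());
    }

    private static String adjust(String line, int n, String spaces) {
        if (n > 0) {
            // Blank lines are unaffected, per the text of this CSR.
            return line.isBlank() ? line : spaces + line;
        }
        if (n < 0) {
            // Remove up to -n leading white space characters; a line with
            // fewer than that loses all of its leading white space.
            // The long arithmetic avoids overflow for Integer.MIN_VALUE.
            long limit = Math.min(-(long) n, line.length());
            int i = 0;
            while (i < limit && Character.isWhitespace(line.charAt(i))) {
                i++;
            }
            return line.substring(i);
        }
        return line; // n == 0: only the line terminators change
    }

    public static void main(String[] args) {
        System.out.print(indent("<html>\r\n    <body>\r\n</html>", 4));
    }
}
```

Because the string is split into lines before any white space is counted, a line terminator is never consumed as one of the removed white space characters.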

Where `align()` alone is not what the developer wants, we expect the
preponderance of cases to be `align().indent(n)`. Therefore, an additional
variation of `align` will be provided: `String::align(int n)` where `n`
is the indentation applied to the string after _alignment_.

```
Ex 6.
    // The manual and aligned strings are equivalent
    String manual =
`    <html>
        <body>
            <p>Hello World</p>
        </body>
    </html>
`;
    String aligned = `
.....................<html>
.....................    <body>
.....................        <p>Hello World</p>
.....................    </body>
.....................</html>
.....................`.align(4);
```
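
Since `align(n)` is described as `align().indent(n)`, it needs two passes over the lines: one to find the minimum indentation and one to rewrite each line. The sketch below combines both steps in one hypothetical helper (`AlignIndentSketch.align`, not a JDK method), under the same assumptions as the specification text in this CSR: blank lines are not counted for the minimum, and for positive `n` they receive no added spaces. Java 11 is assumed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AlignIndentSketch {
    // Number of leading characters for which Character.isWhitespace is true.
    static int leading(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
        return i;
    }

    static String align(String s, int n) {
        List<String> lines = s.lines().collect(Collectors.toList());
        int first = 0;
        int last = lines.size();
        while (first < last && lines.get(first).isBlank()) first++;
        while (last > first && lines.get(last - 1).isBlank()) last--;
        List<String> body = lines.subList(first, last);

        // Pass 1: minimum indentation of the non-blank lines.
        int min = Integer.MAX_VALUE;
        for (String line : body) {
            if (!line.isBlank()) min = Math.min(min, leading(line));
        }

        // Pass 2: align each line, then apply the indent(n) adjustment.
        StringBuilder out = new StringBuilder();
        for (String line : body) {
            if (!line.isBlank()) line = line.substring(min);
            if (n > 0 && !line.isBlank()) line = " ".repeat(n) + line;
            if (n < 0) line = line.substring((int) Math.min(-(long) n, leading(line)));
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pad = " ".repeat(21);
        System.out.print(align("\n" + pad + "<html>\n" + pad + "    <body>\n" + pad, 4));
    }
}
```

With `n == 0` the second pass changes nothing beyond the alignment itself, so `align(s, 0)` in this sketch gives the same result that a plain alignment would.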

Specification
-------------

```
    /**
     * Adjusts the indentation of each line of this string based on the value of
     * {@code n}, and normalizes line termination characters.
     * <p>
     * This string is conceptually separated into lines using
     * {@link String#lines()}. Each line is then adjusted as described below
     * and then suffixed with a line feed {@code "\n"} (U+000A). The resulting
     * lines are then concatenated and returned.
     * <p>
     * If {@code n > 0} then {@code n} spaces (U+0020) are inserted at the
     * beginning of each line. {@link String#isBlank() Blank lines} are
     * unaffected.
     * <p>
     * If {@code n < 0} then up to {@code n}
     * {@link Character#isWhitespace(int) white space characters} are removed
     * from the beginning of each line. If a given line does not contain
     * sufficient white space then all leading
     * {@link Character#isWhitespace(int) white space characters} are removed.
     * Each white space character is treated as a single character. In
     * particular, the tab character {@code "\t"} (U+0009) is considered a
     * single character; it is not expanded.
     * <p>
     * If {@code n == 0} then the line remains unchanged. However, line
     * terminators are still normalized.
     * <p>
     *
     * @param n  number of leading
     *           {@link Character#isWhitespace(int) white space characters}
     *           to add or remove
     *
     * @return string with indentation adjusted and line endings normalized
     *
     * @see String#lines()
     * @see String#isBlank()
     * @see Character#isWhitespace(int)
     *
     * @since 12
     */
    public String indent(int n) {

    /**
     * Removes vertical and horizontal white space from around the
     * essential body of this string's lines, while preserving relative
     * indentation.
     * <p>
     * This string is first conceptually separated into lines as if by
     * {@link String#lines()}.
     * <p>
     * Then, the <i>minimum indentation</i> (min) is determined as follows. For
     * each non-blank line (as defined by {@link String#isBlank()}), the
     * leading {@link Character#isWhitespace(int) white space} characters are
     * counted. The <i>min</i> value is the smallest of these counts.
     * <p>
     * For each non-blank line, <i>min</i> leading white space characters are
     * removed. Each white space character is treated as a single character. In
     * particular, the tab character {@code "\t"} (U+0009) is considered a
     * single character; it is not expanded.
     * <p>
     * Leading and trailing blank lines, if any, are removed. Trailing spaces are
     * preserved.
     * <p>
     * Each line is suffixed with a line feed character {@code "\n"} (U+000A).
     * <p>
     * Finally, the lines are concatenated into a single string and returned.
     *
     * @apiNote
     * This method's primary purpose is to shift a block of lines as far as
     * possible to the left, while preserving relative indentation. Lines
     * that were indented the least will thus have no leading white space.
     *
     * Example:
     * <blockquote><pre>
     * `
     *      This is the first line
     *          This is the second line
     * `.align();
     *
     * returns
     * This is the first line
     *     This is the second line
     * </pre></blockquote>
     *
     * @return string with margins removed and line terminators normalized
     *
     * @see String#lines()
     * @see String#isBlank()
     * @see String#indent(int)
     * @see Character#isWhitespace(int)
     *
     * @since 12
     */
    public String align() {

    /**
     * Removes vertical and horizontal white space from around the
     * essential body of this string's lines, while preserving relative
     * indentation with optional indentation adjustment.
     * <p>
     * Invoking this method is equivalent to:
     * <blockquote>
     *  {@code this.align().indent(n)}
     * </blockquote>
     *
     * @apiNote
     * Examples:
     * <blockquote><pre>
     * `
     *      This is the first line
     *          This is the second line
     * `.align(0);
     *
     * returns
     * This is the first line
     *     This is the second line
     *
     *
     * `
     *    This is the first line
     *        This is the second line
     * `.align(4);
     * returns
     *     This is the first line
     *         This is the second line
     * </pre></blockquote>
     *
     * @param n  number of leading white space characters
     *           to add or remove
     *
     * @return string with margins removed, indentation adjusted and
     *         line terminators normalized
     *
     * @see String#align()
     *
     * @since 12
     */
    public String align(int n) {
```

Comments
Moving to Approved.
12-09-2018

I find the revised version much clearer; moving to Provisional.
24-08-2018

CSR brought back to DRAFT stage and specification extensively replaced. Discussion prior to this comment may only apply to previous incarnation of specification.
24-08-2018

Moving to Approved. Please add a statement up front clarifying that conceptually the string is decomposed into lines that exclude any line terminator characters before the indent work occurs and then the line terminators are normalized.
24-07-2018

Just realized that the changes I made to the comments did not register (re apiNotes). There seem to be a few bugs in the handling of CSRs, including the fact that the Add comment button frequently shows up disabled.
23-07-2018

I recommend these methods explicitly state "The string is first decomposed into lines, where lines are separated by [explain line ending handling] and the last line may be terminated by the end of the string [assuming that is the policy]." The line-ifying first may be implicit, but would be clearer as explicit. Per http://openjdk.java.net/jeps/8068562, the apiNote tag is used as follows: "This category consists of commentary, rationale, or examples pertaining to the API." Normative, testable statements about the API should thus not be in apiNote text. Please review the proposed changes to only use apiNote for informative text rather than normative specifications.
23-07-2018

Amending line terminator note. Not sure that a discussion about line terminators and white space is warranted. The method description states that String::indent() processes lines, not the string as a whole. The implication is the line terminator has already been processed separately. Including line terminators in a white space discussion may be more confusing.
20-07-2018

The line terminator handling of the indent methods

     * @apiNote The line terminators carriage return {@code "\r"}
     * ({@code U+000D}) and a carriage return followed
     * immediately by a line feed {@code "\r\n"}
     * ({@code U+000D U+000A}) will be
     * replaced with line feed {@code "\n"} ({@code U+000A}).

needs to be upgraded to full normative text and not relegated to a note. In addition, the (presumed) non-interaction of line terminators and whitespace stripping for negative arguments should be discussed, since the line terminator characters are isWhitespace() == true. Concretely, say there was a three-character input line

    SPACE SPACE LINEFEED

One *could* argue that a call of indent(-3) on this line should delete the line since the LINEFEED is simply the third whitespace character. However, this is presumably not the desired behavior of the method. Please clarify.
19-07-2018

Moving to Provisional, but I'd like to see the wording and spec tightened up before this is finalized. I agree with Sherman's points on white space and would like to see more explicit discussion here in more than one method. Just to check my understanding, align(int n) is intrinsically a two-pass method, since it sounds like the minimum number of leading white space characters needs to be computed. I suggest phrasing the "If {@code n < 0}" clause as something like "removes up to n white space characters from the beginning of a line ..."
17-07-2018

(1) It'd be better to specify what the "leading/trailing space(s)" are. Does "space" refer to the space character ' '? Then it'd be better to be specific that it's '\u0020'. (I doubt we want to deal with the more general Unicode "space_separator" characters, for which Character.getType(i) == Character.SPACE_SEPARATOR.)
(2) Same for the "whitespace" and "white space" mentioned in the spec; these need to be specific. Are we only talking about ' ' and '\t' here? Unicode "whitespace" includes line separators, though.
(3) "line separators are replaced with \n": need to specify \r, \n, \r\n as line separators? Otherwise it is unclear whether \r\n becomes two \n.
(4) "The first line is does not affect...": an extra "is"?
25-04-2018

Reviewed -Sundar
29-03-2018