JDK-8200425 : String::lines
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11
  • Submitted: 2018-03-29
  • Updated: 2018-05-18
  • Resolved: 2018-05-02
Description
Summary
-------

Add an instance method to `java.lang.String` that returns a stream of the lines
of the contents of a multi-line string.

Problem
-------

With the introduction of JEP 326 Raw String Literals, it is expected
that developers will make routine use of multi-line strings in their
applications. To facilitate processing of multi-line strings it will be
common for developers to break those strings down into collections of
lines.

The existing techniques for production of collections of lines vary
depending on the application goals. If the developer wants to use
streams and functional coding style, then the techniques available can
be cumbersome.

```
Ex 1.
    Stream<String> lines = List.of(string.split(`\n|\r\n|\r`)).stream();
    
Ex 2. (more recently)
    Stream<String> lines = List.of(string.split(`\R`)).stream();
    
Ex 3.
    Stream<String> lines = Pattern.compile(`\R`).splitAsStream(string);
    
Ex 4.
    Stream<String> lines = new BufferedReader(new StringReader(string)).lines();
```

Besides being cumbersome, examples 1 & 2 require additional memory for
an intermediate array and create all line substrings up front.
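
The examples above use the backtick raw string syntax proposed in JEP 326.
As a rough sketch, the same four techniques can be written with ordinary
string literals as shown below. The class name `LineSplittingTechniques` and
the method names are hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical holder class; the names here are not part of the proposal.
final class LineSplittingTechniques {

    // Ex 1: explicit alternation of the three terminators.
    // Builds an intermediate String[] holding every line before streaming.
    static Stream<String> viaAlternation(String s) {
        return List.of(s.split("\n|\r\n|\r")).stream();
    }

    // Ex 2: the \R linebreak matcher. Note that \R also matches a few other
    // Unicode line break characters, so it is slightly broader than Ex 1.
    static Stream<String> viaLinebreakMatcher(String s) {
        return List.of(s.split("\\R")).stream();
    }

    // Ex 3: a compiled pattern streamed directly, without an explicit array.
    static Stream<String> viaPattern(String s) {
        return Pattern.compile("\\R").splitAsStream(s);
    }

    // Ex 4: wrap the string in readers and use BufferedReader::lines.
    static Stream<String> viaReader(String s) {
        return new BufferedReader(new StringReader(s)).lines();
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\r\ngamma\rdelta";
        System.out.println(viaAlternation(text).collect(Collectors.toList()));
        System.out.println(viaLinebreakMatcher(text).collect(Collectors.toList()));
        System.out.println(viaPattern(text).collect(Collectors.toList()));
        System.out.println(viaReader(text).collect(Collectors.toList()));
    }
}
```

For this input, which mixes all three terminators and has no trailing
terminator, the four variants yield the same four lines.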

Solution
--------

Introduce a String instance method that uses a specialized Spliterator to
lazily provide lines from the source string.

```
    Stream<String> lines = string.lines();
```

This method simplifies developer code, significantly reduces
memory requirements, and is an order of magnitude faster than any
of the previously described code patterns.

```
Ex.
    String trimmedLines = string.lines()
                                .map(String::trim)
                                .collect(joining("\n"));
```
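
To illustrate what lazily providing lines means, here is a minimal sketch of
a line-producing Spliterator. It is not the JDK implementation: the class
`LazyLines` and its `lines` helper are hypothetical, it does not attempt
parallel splitting, and it scans one `char` at a time for clarity rather
than speed.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration only; not the spliterator used by String::lines.
final class LazyLines {

    static Stream<String> lines(String s) {
        Spliterator<String> sp = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {

            private int pos = 0; // index of the first char of the next line

            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (pos >= s.length()) {
                    return false; // nothing left, so no empty trailing line
                }
                // Scan forward to the next terminator or the end of the string.
                int end = pos;
                while (end < s.length()
                        && s.charAt(end) != '\n' && s.charAt(end) != '\r') {
                    end++;
                }
                String line = s.substring(pos, end);
                // Skip the terminator, treating "\r\n" as a single terminator.
                if (end < s.length()) {
                    boolean crlf = s.charAt(end) == '\r'
                            && end + 1 < s.length() && s.charAt(end + 1) == '\n';
                    end += crlf ? 2 : 1;
                }
                pos = end;
                action.accept(line); // only this one substring was created
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }
}
```

With this shape, a call such as `LazyLines.lines(text).findFirst()` scans
only up to the first terminator and creates a single substring, whereas the
`split`-based patterns build every line before the stream starts.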

Specification
-------------

```
    /**
     * Returns a stream of substrings extracted from this string
     * partitioned by line terminators.
     * <p>
     * Line terminators recognized are line feed
     * {@code "\n"} ({@code U+000A}),
     * carriage return
     * {@code "\r"} ({@code U+000D})
     * and a carriage return followed immediately by a line feed
     * {@code "\r\n"} ({@code U+000D U+000A}).
     * <p>
     * The stream returned by this method contains each line of
     * this string that is terminated by a line terminator except that
     * the last line can either be terminated by a line terminator or the 
     * end of the string.
     * The lines in the stream are in the order in which
     * they occur in this string and do not include the line terminators
     * partitioning the lines.
     *
     * @implNote This method provides better performance than
     *           split("\R") by supplying elements lazily and
     *           by faster search of new line terminators.
     *
     * @return  the stream of strings extracted from this string
     *          partitioned by line terminators
     *
     * @since 11
     */
    public Stream<String> lines() {
```
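
The terminator rules in this specification can be checked with a few small
inputs. The sketch below assumes a JDK 11 or later runtime, where
`String::lines` is available; the class `LinesSemantics` and the helper
`linesOf` are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; requires JDK 11+ for String::lines.
final class LinesSemantics {

    // Collects the stream into a list so results are easy to inspect.
    static List<String> linesOf(String s) {
        return s.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A trailing terminator ends the last line; no empty line follows it.
        System.out.println(linesOf("abc\r\n"));           // [abc]

        // All three terminators are recognized; "\r\n" counts as one.
        System.out.println(linesOf("a\rb\r\nc\nd"));      // [a, b, c, d]

        // An empty line in the middle of the string is preserved.
        System.out.println(linesOf("a\n\nb").size());     // 3

        // The empty string contains no lines.
        System.out.println(linesOf("").size());           // 0

        // split("\\R") discards trailing empty strings, so it agrees here.
        System.out.println("abc\r\n".split("\\R").length); // 1
    }
}
```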

Comments
Roger: There doesn't seem to be any precedent for methods describing the behavior of .parallel() on the stream (e.g. BufferedReader::lines()). It seems like an implementation detail. But as a note, the String::lines() Spliterator currently searches forward from the string midpoint for an end of line, then splits after that end of line.
07-05-2018

Should there be any spec about the behavior of .parallel() on the stream?
06-05-2018

Moving amended request to Approved.
02-05-2018

Duly modified.
02-05-2018

I think some of the phrasing here could be improved. For example, this paragraph

     * The stream returned by this method contains each line of
     * this string that is terminated by a line terminator or end
     * of string. The lines in the stream are in the order in which
     * they occur in this string and do not include the line
     * separator.

could be clearer if written as

     * The stream returned by this method contains each line of
     * this string that is terminated by a line terminator except that
     * the last line can either be terminated by a line terminator or the
     * end of the string.
     * The lines in the stream are in the order in which
     * they occur in this string and do not include the line terminators
     * partitioning the lines.

(I think it is confusing to say "line separator" here when the rest of the spec talks about line terminators.)

When wanting to be extra precise, the String and Character specs cite particular Unicode code points by number, as in '\n' U+000A NEW LINE used in Character and '\u0020' used in String.trim. I recommend following this pattern for this method, something like

     * Line terminators recognized are line feed
     * {@code "\n"} ({@code `\u000A'})
     * carriage return {@code "\r"} ({@code `\u000D'})
     * ...
02-05-2018

Reviewed. -Sundar
27-04-2018

(Since changes are being proposed for adding user-visible methods to String, that is a change to the SE spec and the Scope field should be set accordingly.)
26-04-2018

Comment adjusted.
26-04-2018

nit: {@code "\n", "\r\n"} and {@code "\r"}. --> {@code "\n"}, {@code "\r\n"} and {@code "\r"}. Or maybe, more specifically, the wording from BufferedReader.readLine(): "Line terminators recognized are line feed {@code "\n"}, carriage return {@code "\r"} and a carriage return followed immediately by a line feed {@code "\r\n"}."
25-04-2018

For some reason I kept seeing otherwise, so yes, it should not return an empty line at the end. Will change.
25-04-2018

Again for my last comment. String.split(regex) --> String.split(regex, 0) --> always trims (discards) the trailing empty strings.

    jshell> "abc\r\n".split("\\R").length
    $14 ==> 1
    jshell> new BufferedReader(new StringReader("abc\r\n")).lines().count()
    $17 ==> 1

Shouldn't "abc\r\n".lines().size() return 1 as well?
25-04-2018

Since \r\n is a line separator (not line terminator) yes. "abc\r\n".lines().size() == 2
29-03-2018

"evil of detail" :-) just wonder for string "abc\r\n", is it going to be an empty line at the end?
29-03-2018

Changed to add signature, @implNote and @apiNote
29-03-2018

Should the following be tagged as @implNote? "This method provides better performance than split("\R") by supplying elements lazily and by faster search of new line separators." And the following be "@apiNote"? "Note: unlike BufferedReader::lines() treats new line character sequences as line separators and not as line terminators. This is to reflect the behaviour of the commonly used split("\R") code pattern. "
29-03-2018

Reviewed again for spec. change to include signature. -Sundar
29-03-2018

Method signature is missing in the specification part. Reviewed (subject to fixing the above) -Sundar
29-03-2018