JDK-8042355 : stream with sorted() causes downstream ops not to be lazy
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.stream
  • Affected Version: 8
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-05-02
  • Updated: 2014-07-29
  • Resolved: 2014-05-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8 JDK 9
8u20Fixed 9 b13Fixed
Description
Consider the following code fragment. The result of the entire pipeline is "ab" in all cases, as expected.

Stream.of("a", "ab", "abc", "abcd")
    // .sorted() // uncomment and what follows becomes eager
    .filter(s -> s.contains("b"))
    .peek(s -> System.out.println("PEEK: " + s))
    .findFirst()
    .orElse("X");

With sorted() commented out, findFirst() is fully lazy and so the output from peek is:

    PEEK: ab

However, with sorted() uncommented, the output changes to:

    PEEK: ab
    PEEK: abc
    PEEK: abcd

All the elements appear to be sent downstream from sorted() even though findFirst() is only interested in the first one. This could be a performance problem if the operations after sorted(), such as filtering, are expensive.

Workaround: collect the sorted results into a collection and run the rest of the pipeline from that collection.

See also: http://stackoverflow.com/q/23419223/1441122
Comments
The sorted() op is a barrier that collects all elements, sorts 'em, then pushes 'em *all* downstream, regardless of whether the pipeline is short-circuiting (either terminal or intermediate op, such as limit). Perhaps surprisingly this is within the bounds of acceptable behaviour. More elements can pass through the pipeline than are required for producing the result, especially for parallel evaluation. However, it might be possible to avoid this for such sequential streams. A simple solution is for the sorted() op to note if Sink.cancellationRequested() is called while collecting elements, and if so, call cancellationRequested() before pushing each element.
04-05-2014