JDK-8180352 : Add Stream.toList() method
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.stream
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2017-05-12
  • Updated: 2020-12-07
  • Resolved: 2020-11-30
Fix Version: JDK 16 (build b27, Fixed)
Sub Tasks
  • JDK-8257870
`stream.collect(Collectors.toList())` and `stream.collect(Collectors.toSet())`

are the two most popular collectors used with Stream. It would be nice to simplify/encapsulate the call with something like

`stream.toList()` and `stream.toSet()`
(or `stream.collectToList()` and `stream.collectToSet()`)

This makes coding easier and saves time and space.

If you survey Internet posts about collecting Java streams, 90% of them use these collectors, so it would be nice to have a default implementation that does not require specifying a collector explicitly.

Also, `Stream#toMap(Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends U> valueMapper)` (and related overloads) might be helpful.

Expected: The Stream interface is extended with default toList() and toSet() methods, and optionally toMap(Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends U> valueMapper).
Actual: A collector must be specified explicitly every time.

---------- BEGIN SOURCE ----------
// Today, a collector must be named explicitly:
// IntStream.range(1, 1000).boxed().collect(Collectors.toList())

// Proposed shorthand (either naming):
IntStream.range(1, 1000).boxed().toList()
IntStream.range(1, 1000).boxed().collectToList()

IntStream.range(1, 1000).boxed().toSet()
IntStream.range(1, 1000).boxed().collectToSet()

---------- END SOURCE ----------
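
For comparison, the requested shorthand can be placed next to the collector-based form. This is a minimal sketch that assumes JDK 16 or later, where `Stream.toList()` is available; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ToListBeforeAfter {
    // Collector-based form: works on every release since JDK 8.
    static List<Integer> withCollector() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    // Shorthand form: requires JDK 16 or later.
    static List<Integer> withToList() {
        return IntStream.range(1, 1000).boxed().toList();
    }

    public static void main(String[] args) {
        // Both lists hold the same elements in the same order.
        System.out.println(withCollector().equals(withToList()));
    }
}
```

The two forms are not fully interchangeable: the list returned by `Stream.toList()` is unmodifiable, while `Collectors.toList()` makes no guarantee about mutability either way.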

Changeset: 41dbc139
Author: Stuart Marks <smarks@openjdk.org>
Date: 2020-11-30 19:37:56 +0000
URL: https://git.openjdk.java.net/jdk/commit/41dbc139

Please talk with Brian about this change, because it nullifies a property of the Stream API that we (the lambda-util group) took time to preserve. The Stream API as a whole does not depend on the Collection API, and adding toList() breaks that independence. The independence was kept so that the Stream API could be easily integrated with other collection APIs if, for example, we later want to add a persistent collection API with no link to java.util.List.

Current thinking is to provide just toList(), and to have its implementation return an unmodifiable list of the same ilk as List.of(). The *specification* could be the same as that of Collectors.toList(); that wording is flexible enough to allow different List implementations to be returned. In other words, there is no requirement that the two return the same List implementation. This might seem "inconsistent", but it is fairly clear that returning a plain ArrayList from Collectors.toList() is problematic, and there is no reason to perpetuate this.

Tagir Valeev pointed out in an earlier comment that the List.of() family is null-hostile. It may be necessary to adjust the spec wording of the putative Stream.toList() to include a stipulation about null handling in the "no guarantees" clause. It may also be worthwhile to add null handling to the "no guarantees" clause of Collectors.toList().

An alternative would be to specify stricter characteristics for the returned list, much as is done for List.of() and friends today, including the value-based statement. (In fact, this would probably just link to the "Unmodifiable Lists" section of the List class doc.) This would seem closer to John Rose's position as stated in his earlier comment.

At this point the implementation seems clear. The question is how tightly it is specified.
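
The null-handling concern can be made concrete with a short sketch. It assumes JDK 9 or later (for `List.of`); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullHandlingDemo {
    // List.of(...) rejects null elements with a NullPointerException.
    static boolean listOfRejectsNull() {
        Object[] elements = Stream.of("a", null, "b").toArray();
        try {
            List.of(elements);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Collectors.toList() accepts the same elements in current JDK
    // implementations; its specification does not promise this.
    static List<String> collectedWithNull() {
        return Stream.of("a", null, "b").collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(listOfRejectsNull());
        System.out.println(collectedWithNull().contains(null));
    }
}
```

A stream that may carry nulls therefore cannot be routed through `List.of` unchanged, which is why the specification wording on nulls matters here.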

toArray has the properties described in Examples 1 and 2; in addition, it will fuse with the operation immediately to its left if that operation produces an intermediate array (e.g. sorted). The default implementation of Stream.toList can call toArray and then create an unmodifiable list wrapping that array (preferably without copying, e.g. by using an internal method). Ideally we would like to propagate characteristics such as size estimates/exactness to the collectors, although that does complicate the API.
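
The "call toArray, then wrap" idea can be approximated with public APIs only. The sketch below is an assumption-laden illustration, not the JDK's implementation: `toUnmodifiableList` and `rejectsMutation` are hypothetical helpers, and the wrapper is only safe if `toArray()` hands back an array that nobody else retains.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class ToArrayBackedList {
    // Gathers the stream into an array, then wraps that array in an
    // unmodifiable view. Neither Arrays.asList nor unmodifiableList copies.
    @SuppressWarnings("unchecked")
    static <T> List<T> toUnmodifiableList(Stream<? extends T> stream) {
        T[] elements = (T[]) stream.toArray(); // Object[] at run time
        return Collections.unmodifiableList(Arrays.asList(elements));
    }

    // Reports whether the list refuses a structural modification.
    static boolean rejectsMutation(List<?> list) {
        try {
            list.clear();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> sorted = toUnmodifiableList(Stream.of("b", "c", "a").sorted());
        System.out.println(sorted);
        System.out.println(rejectsMutation(sorted));
    }
}
```

Unlike `List.of`, this wrapper tolerates null elements, so it sidesteps the null question at the cost of weaker guarantees about the returned list.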

I would like a terminal method on Stream which gives me a value-based representation of the elements of that stream. I think that means something with this behavior, though better optimized:

    default List<T> $BIKESHED$() { return List.of(this.toArray()); }

where $BIKESHED$ is toList, or values, or valueList, or something else that invites frequent use.

More generally, in favor of List.of: I think we are moving in two distinct but correlated directions with respect to the reporting of multiple values (of some type T) through APIs. Both involve an escape from the legacy of Java array types T[]. First, we are using List<T> because it plays better with generics (both today's and probably tomorrow's) and allows abstraction over representation. Second, we are reducing the mutability of these multiple-value containers, to preserve safe sharing of state while reducing the need for defensive copying. Both of these directions affect the design of stream terminal operations, away from Stream.toArray and toward List.of.

In a world where lists are routinely unmodifiable, and can easily be made so (via an optimized copyOf), it will be common to compute some multiple values via a stream, and store a value-based list of those values into an object for safekeeping. In such a world, the simple, normal API points on Stream and List should make this easy to do without special effort, because requiring special effort to access safety and performance will cause simple code to be buggier and slower.

For performance, at least one of the List-producing collectors, and perhaps the default one, should be designed in such a way that it makes it possible (or not impossible) for the resulting list to share storage with whatever internal temporaries the stream used to gather the resulting value. That means there should be only limited post-processing, such as error detection or memory fencing, required by the specified semantics of the produced list.

A list of unspecified class (such as a List.of list) makes fewer commitments and therefore will be easier to co-optimize with Stream so that there are fewer copies.

Example 1: The stream is fully sized, and can allocate an exactly-sized array before element computation, and parallel tasks can store down their results by dead reckoning; if the array is unaliased it can be plugged directly into a List.of unmodifiable list, after memory fences and null checks.

Example 2: The stream is not fully sized, or the system is NUMA and uses distinct array segments for tasks. In that case, the terminal operation must (in addition to fences and checks) either recopy everything into a combined array, or else arrange an index of the results for random access. A copy-free algorithm will do the latter. This is incompatible with any of our existing list APIs except List.of.

Adding the requirement of subsequent mutability would complicate the data structure for gathering the stream results, because it would require that the index support mutability beyond simply element-wise updates (a la Arrays.asList, which is not mutable enough). A copy-free stream collector should therefore be immutable, even though there is a good mutable candidate today (ArrayList). I suggest that Stream.toList should default to this potentially copy-free, immutable collector.

As a broader point, unmodifiability tends to make for safer code, since it prevents an important class of bugs and intentional attacks from causing unexpected program behavior. APIs should make it easy to work with collections which do not accept side effects.
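
The "store a list into an object for safekeeping" scenario might look like the sketch below. It assumes JDK 10 or later (for `List.copyOf`); the `Roster` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Roster {
    private final List<String> names;

    Roster(List<String> names) {
        // List.copyOf returns an unmodifiable list. Its Javadoc notes that it
        // generally does not copy when the argument is already an
        // unmodifiable List, so the defensive step is cheap in that case.
        this.names = List.copyOf(names);
    }

    // Safe to hand out: callers cannot modify the stored list.
    List<String> names() {
        return names;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("ada", "grace"));
        Roster roster = new Roster(source);
        source.add("linus"); // does not affect the roster
        System.out.println(roster.names());
    }
}
```

When the incoming list is already unmodifiable, the holder needs no extra effort to stay safe, which is the situation the comment argues the default terminal operation should produce.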

Current thinking is that toSet() and toMap() are unnecessary, as they both entail post-processing of the stream elements: deduplication for toSet, and mapping to key and value for toMap, along with merging. These make sense on Collectors, less so in a Stream terminal operation. Adding toList() is sensible as it simply gathers all the elements of the stream, similar to toArray().
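
The post-processing that sets these two apart from a plain gather can be seen with the existing collectors. A minimal sketch, assuming JDK 9 or later for `List.of`; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class PostProcessingDemo {
    // toSet() must deduplicate: equal elements collapse into one.
    static Set<String> distinctWords(List<String> words) {
        return words.stream().collect(Collectors.toSet());
    }

    // toMap() needs a key mapper, a value mapper, and (when keys can
    // collide) a merge function that decides how to combine values.
    static Map<Integer, String> byLength(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a + "," + b));
    }

    // Without a merge function, a duplicate key is an error.
    static boolean failsWithoutMerge(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(String::length, Function.identity()));
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "cc");
        System.out.println(distinctWords(words).size());
        System.out.println(byLength(words));
        System.out.println(failsWithoutMerge(words));
    }
}
```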

If the List.of(Stream.toArray()) option is considered, it should be noted that it does not tolerate nulls, while streams in general do. Also, for toSet, Set.of(Stream.toArray()) will not work at all, as it does not tolerate repeated values in the supplied array.

The Arrays.asList option seems pretty useless and even dangerous to me. It is rarely necessary to create a List in which you can replace elements but cannot add any. In fact, quite a number of developers believe that Arrays.asList returns an immutable list. This belief could propagate to the new method, which may result in unnoticed modifications of a list believed to be immutable. So if a mutable list is returned, it should be fully mutable.

My vote is for a pre-sized ArrayList. You can even optimize this by using toArray() and avoiding an extra copy:

    interface Stream {
        ...
        @SuppressWarnings("unchecked")
        default List<T> toList() {
            return new ArrayList<>((Collection<T>) new ArrayCollection(toArray()));
        }
    }

    static class ArrayCollection extends AbstractCollection<Object> {
        private final Object[] arr;

        ArrayCollection(Object[] arr) {
            this.arr = arr;
        }

        @Override
        public Iterator<Object> iterator() {
            throw new UnsupportedOperationException();
        }

        @Override
        public int size() {
            throw new UnsupportedOperationException();
        }

        @Override
        public Object[] toArray() {
            // intentional contract violation here:
            // this way new ArrayList(new ArrayCollection(arr)) will not copy
            // array at all
            return arr;
        }
    }

This exploits the fact that `new ArrayList(collection)` calls `collection.toArray()`. As a result we get an ArrayList trimmed to size, which is fine, as most users are not going to modify it. However, it is still mutable (and accepts nulls!).

By the way, this feature will in general make JDK-8072840 unnecessary.
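
The two hazards named in this comment can be reproduced directly. A minimal sketch, assuming JDK 9 or later for `Set.of`; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

class FixedSizeListDemo {
    // Arrays.asList is a view: set() succeeds and writes into the array.
    static boolean setWritesThrough() {
        String[] backing = {"a", "b"};
        List<String> view = Arrays.asList(backing);
        view.set(0, "z");
        return backing[0].equals("z");
    }

    // ...but the list is fixed-size, so add() is rejected.
    static boolean addIsRejected() {
        List<String> view = Arrays.asList("a", "b");
        try {
            view.add("c");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Set.of(...) throws on duplicate elements instead of deduplicating.
    static boolean setOfRejectsDuplicates() {
        Object[] elements = Stream.of("a", "b", "a").toArray();
        try {
            Set.of(elements);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(setWritesThrough());
        System.out.println(addIsRejected());
        System.out.println(setOfRejectsDuplicates());
    }
}
```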

The typical workaround that has been suggested is to use collect(Collectors.toList()).

There are a number of subtle issues with adding toList() and related methods directly to Stream. Collectors.toList() is specified to return a List, but it also says there are "no guarantees on the type, mutability, serializability, or thread-safety" of the returned List. Its implementation returns an ArrayList. There is probably code out there that depends on this, so in practice Collectors.toList() probably can't be changed, and perhaps the specification should be modified simply to require this.

There is another school of thought that says Collectors.toList() can change, so calling code really should use Collectors.toCollection(ArrayList::new) if it wants an ArrayList, and breaking calling code is OK because the specification has the "no guarantees" clause.

With that in mind, what should Stream.toList() be defined to return? There are the two possibilities mentioned above, plus a couple of others:

1. A List with unspecified characteristics (as Collectors.toList() is defined today)
2. An ArrayList
3. A mutable List backed by an array, as if Arrays.asList(Stream.toArray())
4. An immutable list as if List.of(Stream.toArray())

Similar questions apply to toSet() and toMap(). The toMap() case is of course more complicated, as it requires two mapping functions and a merge policy.
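
For code that really needs an ArrayList, the explicit form mentioned in this comment states that requirement in the type. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ExplicitArrayListDemo {
    // toCollection(ArrayList::new) names the implementation, so the result
    // type is ArrayList regardless of what Collectors.toList() returns.
    static ArrayList<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = squares(3);
        list.add(16); // mutation is supported by construction
        System.out.println(list);
    }
}
```

Callers written this way are unaffected by whichever of the four options is chosen for Stream.toList(), since they never rely on the "no guarantees" clause.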