JDK-8048330 : JEP 269: Convenience Factory Methods for Collections
  • Type: JEP
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Priority: P3
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 9
  • Submitted: 2014-06-26
  • Updated: 2017-06-26
  • Resolved: 2017-06-26
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8077097 :  
JDK-8133975 :  
JDK-8133976 :  
JDK-8133977 :  
JDK-8133979 :  
JDK-8133980 :  
JDK-8134373 :  
JDK-8134525 :  
JDK-8139231 :  
JDK-8139232 :  
JDK-8139233 :  
JDK-8144952 :  
JDK-8156079 :  
JDK-8159404 :  
JDK-8166365 :  
JDK-8169222 :  
Description
Summary
-------

Define library APIs to make it convenient to create instances of collections and maps with small numbers of elements, so as to ease the pain of not having collection literals in the Java programming language.


Goals
-----

Provide static factory methods on the collection interfaces that will create compact, unmodifiable collection instances. The API is deliberately kept minimal. 


Non-Goals
---------

It is not a goal to provide a fully-general "collection builder" facility that, for example, lets the user control the collection implementation or various characteristics such as mutability, expected size, loading factor, concurrency level, and so forth.

It is not a goal to support high-performance, scalable collections with arbitrary numbers of elements. The focus is on small collections.

It is not a goal to provide unmodifiable collection types. That is, this proposal does not expose the characteristic of unmodifiability in the type system, even though the proposed implementations are actually unmodifiable.

It is not a goal to provide "immutable persistent" or "functional" collections.


Motivation
----------

Java is often criticized for its verbosity. Creating a small, unmodifiable collection (say, a set) involves constructing it, storing it in a local variable, and invoking `add()` on it several times, and then wrapping it. For example,

    Set<String> set = new HashSet<>();
    set.add("a");
    set.add("b");
    set.add("c");
    set = Collections.unmodifiableSet(set);

This is quite verbose, and because it cannot be expressed in a single expression, static collections must be populated in static initializer blocks rather than via a more convenient field initializer. Alternatively, one can populate a collection using a copy constructor from another collection:

    Set<String> set = Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "b", "c")));

This is still somewhat verbose and also less obvious, since one has to create a `List` before creating the `Set`. Another alternative is to use the so-called "double brace" technique:

    Set<String> set = Collections.unmodifiableSet(new HashSet<String>() {{
        add("a"); add("b"); add("c");
    }});

This uses the instance-initializer construct in an anonymous inner class, which is a bit prettier. However, it is quite obscure, and it costs an extra class at each usage. It also holds hidden references to the enclosing instance and to any captured objects. This may cause memory leaks or problems with serialization. For these reasons, this technique is best avoided.

The Java 8 Stream API can be used to construct small collections, by combining stream factory methods and collectors. For example,

    Set<String> set = Collections.unmodifiableSet(Stream.of("a", "b", "c").collect(toSet()));

(The streams collectors make no guarantees about the mutability of collections they return. In Java 8, the returned collections are ordinary, mutable collections such as `ArrayList`, `HashSet`, and `HashMap`, but this might change in future JDK releases.)

This is somewhat roundabout, and while not obscure, it's also not very obvious. It also involves a certain amount of unnecessary object creation and computation. As is typical, `Map` is the outlier. Streams cannot be used this way to construct a `Map`, unless the value can be computed from the key, or if the stream elements contain both the key and the value.

In the past, there have been a few proposals floated to change the Java programming language to support collection literals.  However, as is often the case with language features, no feature is as simple or as clean as one might first imagine, and so collection literals will not be appearing in the next version of Java.  

Much of the benefit of collection literals can be gained by providing library APIs for creating small collection instances, at significantly reduced cost and risk compared to changing the language. For example, the code to create a small Set instance might look like this:

    Set<String> set = Set.of("a", "b", "c");

There are existing factories in the `Collections` class to support the creation of empty `List`s, `Set`s, and `Map`s. There are also factories to produce singleton `List`s, `Set`s, and `Map`s that have exactly one element or key-value pair. `EnumSet` contains several overloaded `of(...)` methods that take fixed or variable numbers of arguments, for conveniently creating an `EnumSet` with the specified elements. However, there is no good general-purpose way of creating `List`s, `Set`s, and `Map`s containing objects of arbitrary types.

There are combinator methods in the `Collections` class for the creation of unmodifiable `List`s, `Set`s, and `Map`s. These don't create inherently unmodifiable collections. Instead, they take another collection and wrap it in a class that rejects modification requests, creating an unmodifiable *view* of the original collection. Possession of a reference to the underlying collection still allows modification. Each wrapper is an additional object, requiring another level of indirection and consuming more memory than the original collection. Finally, the wrapped collection still bears the expense of supporting mutation even if it is never intended to be modified.


Description
-----------

Provide static factory methods on the `List`, `Set`, and `Map` interfaces for creating unmodifiable instances of those collections. (Note that unlike static methods on classes, static methods on interfaces are not inherited, so it will not be possible to invoke them via an implementing class, nor via an instance of the interface type.)

For `List` and `Set`, these factory methods would work as follows:

    List.of(a, b, c);
    Set.of(d, e, f, g);

These will include varargs overloads, so that there is no fixed limit on the collection size. However, the collection instances so created may be tuned for smaller sizes. Special-case APIs (fixed-argument overloads) for up to ten of elements will be provided. While this introduces some clutter in the API, it avoids array allocation, initialization, and garbage collection overhead that is incurred by varargs calls. Significantly, the source code of the call site is the same regardless of whether a fixed-arg or varargs overload is called.

For `Map`s, a set of fixed-argument methods will be provided:

    Map.of()
    Map.of(k1, v1)
    Map.of(k1, v1, k2, v2)
    Map.of(k1, v1, k2, v2, k3, v3)
    ...

We expect that supporting small maps of up to ten key-value pairs will be sufficient to cover the majority of use cases. For larger numbers of entries, an API will be provided that will create a `Map` instance given an arbitrary number of key-value pairs:

    Map.ofEntries(Map.Entry<K,V>...)

While this approach is analogous to the equivalent varargs APIs for `List` and `Set`, it unfortunately requires that each key-value pair be boxed. A method for boxing keys and values, suitable for static import, will make this more convenient:

    Map.Entry<K,V> entry(K k, V v)

Using these methods, it will be possible to create a map with an arbitrary number of entries:

    Map.ofEntries(
        entry(k1, v1),
        entry(k2, v2),
        entry(k3, v3),
        // ...
        entry(kn, vn));

(It might be possible to mitigate the expense of boxing by using value types in a future version of the JDK. The `entry()` convenience method will actually return a newly introduced concrete type that implements `Map.Entry`, in order to facilitate potential future migration to a value type.)

Providing APIs for creating small, unmodifiable collections satisfies a large set of use cases, and it helps keep the specification and implementations simple. Unmodifiable collections avoid the need to make defensive copies, and they are more amenable to parallel processing.

The runtime space occupied by small collections is also a strong consideration. A straightforward creation of an unmodifiable `HashSet` with two elements, using the wrapper APIs, would be comprised of six objects: the wrapper, the `HashSet`, which contains a `HashMap`, its table of buckets (an array), and one `Node` instance per element. This imposes a tremendous amount of overhead compared to the amount of data stored, and access to the data unavoidably requires multiple method calls and pointer dereferences. Implementations designed for small, fixed-sized collections can avoid most of this overhead, using a compact field-based or array-based layout. Not needing to support mutation (and knowing the collection size at creation time) also contributes to space savings.

The concrete classes returned by these factories will not be exposed as public APIs. No guarantees will be made about the runtime type or identity of the returned collection. This will allow the implementations to change over time without breaking compatibility. The only thing the caller should rely on is that the reference returned is an implementation of its interface type.

The resulting objects will be serializable. A serialization proxy object will be used as the common serialized form for the implementation classes. This will prevent information about the concrete implementations from leaking into the serialized form, thus preserving flexibility for future maintenance, and allowing the concrete implementations to change from release to release without affecting serialization compatibility.

Null elements, keys, and values will be disallowed. (No recently introduced collections have supported nulls.) In addition, prohibiting nulls offers opportunities for a more compact internal representation, faster access, and fewer special cases.

The `List` implementations are expected to provide fast element access by index, so they will implement the `RandomAccess` marker interface.

Elements stored in these collections must support typical collections contracts, including proper support for `hashCode()` and `equals()`. If an element of a `Set` or a key of a `Map` is mutated in a way that affects its `hashCode()` or `equals()` methods, the behavior of the collection could become unspecified. 

Once constructed and safely published, these collection instances will be safe for concurrent access by multiple threads.

The JDK will be searched for potential sites where these new APIs can be used. These sites will be updated to use the new APIs as time and schedule permit.


Alternatives
------------

Language changes have been considered several times, and rejected:

  1. Project Coin Proposal, [29 March 2009](http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/001112.html)
  2. Project Coin Proposal, [30 March 2009](http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/001193.html)
  3. JEP 186 discussion on lambda-dev, [January-March 2014](http://mail.openjdk.java.net/pipermail/lambda-dev/2014-January/011671.html)

The language proposals were set aside in preference to a library-based proposal as summarized in [this message](http://mail.openjdk.java.net/pipermail/lambda-dev/2014-March/011938.html).

The Google Guava libraries have a rich set of utilities for [creating immutable collections](https://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained), including a builder pattern, and for creating a [wide variety of mutable collections](https://code.google.com/p/guava-libraries/wiki/CollectionUtilitiesExplained). The Guava libraries are very useful and general but are perhaps overkill for inclusion into the Java SE Platform. This proposal is similar to a proposal from Stephen Colebourne [lambda-dev, 19 Feb 2014](http://mail.openjdk.java.net/pipermail/lambda-dev/2014-February/011897.html) and includes some ideas from the Guava immutable collection factory methods.

The `Map.fromEntries()` approach for initializing a `Map` with an arbitrary number of entries is not ideal, but it seems to be the least bad of the alternatives. Its advantages are that it is type-safe, it has keys and values adjacent in the syntax, the number of entries is known at compile time, and it is suitable for use as a field initializer. However, it involves boxing, and it is rather verbose. Several alternatives were considered, and they all introduced tradeoffs that seemed to make them worse than the current proposal.

Static factory methods on concrete collection classes (e.g., ArrayList, HashSet) have been removed from this proposal. They seem like they are useful, but in practice they tend distract developers from using the factory methods for the immutable collections. There is a small set of use cases for initializing a mutable collection instance with a predefined set of values. It's usually preferable to have those predefined values be in an immutable collection, and then to initialize the mutable collection via a copy constructor.

There is another wrinkle, which is that static methods on classes are inherited by subclasses. Suppose a static factory method HashMap.of() were to be added. Since LinkedHashMap is a subclass of HashMap, it would be possible for application code to call LinkedHashMap.of(). This would end up calling HashMap.of(), not at all what one would expect! One way to mitigate this is to ensure that all concrete collection implementations have the same set of factory methods, so that inheritance doesn't occur. Inheritance would still be an issue for user-defined subclasses of the concrete collections.


Testing
-------

There will be the usual set of unit tests in the JDK regression test suite, and JCK tests for the public APIs. The serialized forms may also be covered by the JCK.

A set of size and performance tests will be developed. In contrast to typical goals of comparing to baseline measurements, these tests will compare the new collection implementations to existing ones. The expectation is that the new collections will consume less heap space, both in terms of fixed overhead and on a per-element basis. In some cases the new collections may be slower, however, because of the different internal representation as compared to the existing collections. Any such slowdown should be reasonable. Although there are no specific performance goals, being 10x slower is unacceptable. In addition, the new collections should maintain consistent performance as the number of elements is increased. Finally, the measurements will establish baseline performance figures against which future changes should be compared.

Comments
Updated draft spec & impl posted for review and approval for initial integration: Review thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-November/036827.html API spec (basically List, Map, and Set): http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151123/ Specdiff: http://cr.openjdk.java.net/~smarks/reviews/jep269/specdiff.20151123/overview-summary.html Webrev: http://cr.openjdk.java.net/~smarks/reviews/jep269/webrev.20151123/
24-11-2015

Updated draft spec & impl posted for review. Review thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-November/036331.html Spec: http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151105/ Specdiff: http://cr.openjdk.java.net/~smarks/reviews/jep269/specdiff.20151105/java/util/package-summary.html Webrev: http://cr.openjdk.java.net/~smarks/reviews/jep269/webrev.20151105/
06-11-2015

Open issues (a mixture of API and implementation issues): ---------- Performance requirements or advice to applications? �� The Set and Map implementations should provide constant time average performance for basic operations such as contains() and get(). �� ---------- Iteration order and randomization. �� The iteration order of the hash-based collections (Set and Map) will be unspecified, similar to existing hash-based collections. Historically, while the iteration order of such collections has always been unspecified, application code, including test code, has often ended up with inadvertent dependencies on a particular iteration order. This has caused compatibility issues when the iteration order has changed -- as has historically occurred in most major JDK releases -- even though the iteration order has never been specified. In order to mitigate this issue, strong consideration will be given to varying the iteration order of sets and maps so that client code will be less likely to develop inadvertent order dependencies. This will allow the internal representation of the new collections to be modified without raising the issue of preserving iteration order. �� ---------- Policy for handling duplicates in sets and maps. Fail-fast, or silent-ignore? Probably not last-wins, in order to preserve implementation flexibility. �� An exception will be thrown if duplicate elements are provided when creating a Set, or if duplicate keys are provided when creating a Map. The presence of duplicates is likely to be a programming error. Throwing an exception at creation time is the earliest point at which library code can detect such errors. �� ---------- Map.of(Map.Entry...) instead of Map.fromEntries(). It would be nice to use a better name than fromEntries(). But overload resolution issues would have to be dealt with. Changed to Map.ofEntries(). ---------- Add fixed-args overloads of Map.ofEntries()? ---------- Does KeyValueHolder need to be public? Re-examine transition to value types.
09-10-2015

Current draft spec posted: http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151008.mod/ Specdiiff: http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151008.specdiff/java/util/package-summary.html Draft implementation available in the JEP-269-api-branch of the sandbox forest: http://hg.openjdk.java.net/jdk9/sandbox/ The current state of the implementation is that it fully implements the API, and it has a complete set of regression tests. It's a bare-bones implementation though, based on existing Collections.unmodifiable wrappers, so there are no space efficiency gains present. Also note, the current collections are serializable, but their serial forms are NOT final and will be changed.
09-10-2015

The current state of the proposal has been pared back to just the API changes. The various implementation prototypes have been set aside for the moment in favor of pushing the API forward. One area of uncertainty is the number of fixed-arg overloads for the various collection types, particularly List, Set, ArrayList, and HashSet, which all have up to two fixed args. Probably more should be added. Map and HashMap are currently at 8 pairs, which seems sufficient for the moment. The current implementation is serializable, but the resulting serialized forms are not final. Do not depend on any serialized forms! Note that the webrev contains some entries for files whose changes have been reverted; they show zero lines changed. Also, the specdiff shows some changes in unrelated files. In particular, EnumSet shows changes that are "specified by" static methods in the Set interface. I believe this is a javadoc bug. Current webrev: http://cr.openjdk.java.net/~smarks/reviews/8048330/webrev.20150910/ Current specdiff: http://cr.openjdk.java.net/~smarks/reviews/8048330/specdiff.20150910/overview-summary.html Current code is on the JDK-8048330-api-branch of the JDK 9 sandbox forest: http://hg.openjdk.java.net/jdk9/sandbox/
11-09-2015

Draft specification and updated prototype implementation available here: http://cr.openjdk.java.net/~smarks/reviews/8048330/webrev.20150806/ This is also available in the JDK 9 sandbox, on the "JDK-8048330-branch" branch. http://hg.openjdk.java.net/jdk9/sandbox/
10-08-2015