JDK-8316583 : Remove the caching fields in AbstractMap
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Priority: P3
  • Status: Provisional
  • Resolution: Unresolved
  • Fix Versions: 26
  • Submitted: 2023-09-20
  • Updated: 2025-06-02
Related Reports
CSR :  
Description
Summary
-------
By removing the caching fields in `AbstractMap`, we can make the immutable maps `@ValueBased`, smaller and, at the same time, improve performance. In fact, *all* Map implementations based on `AbstractMap` will be smaller.

Problem
-------
As the current implementation of `AbstractMap` declares fields that cache the Map's keySet and values views. This prevents subclasses from being `@ValueBased`. Caching the views might have been a good idea around the turn of the millennium but these days, the JVM is oftentimes able to optimize away the creation of new objects that do not escape.

Solution
--------

Instead of caching the views for potential reuse, new instances are created on each call to the keySet() and values() methods. This allows removal of cache fields:

 * `transient Set<K>        keySet;`
 * `transient Collection<V> values;`

According to the previous specs, the methods were not *guaranteed* to return the same cached objects due to concurrency issues, so it would theoretically be okay to just remove them. However, a single-threaded application would observe the view objects returned always being identical.


Specification
-------------


    diff --git a/src/java.base/share/classes/java/util/AbstractMap.java b/src/java.base/share/classes/java/util/AbstractMap.java
    index c5ea53e17db8..52fb5af15889 100644
    --- a/src/java.base/share/classes/java/util/AbstractMap.java
    +++ b/src/java.base/share/classes/java/util/AbstractMap.java
    @@ -301,124 +301,48 @@ public void clear() {
              entrySet().clear();
         }
 
    -
    -    // Views
    -
    -    /**
    -     * Each of these fields are initialized to contain an instance of the
    -     * appropriate view the first time this view is requested.  The views are
    -     * stateless, so there's no reason to create more than one of each.
    -     *
    -     * <p>Since there is no synchronization performed while accessing these fields,
    -     * it is expected that java.util.Map view classes using these fields have
    -     * no non-final fields (or any fields at all except for outer-this). Adhering
    -     * to this rule would make the races on these fields benign.
    -     *
    -     * <p>It is also imperative that implementations read the field only once,
    -     * as in:
    -     *
    -     * <pre> {@code
    -     * public Set<K> keySet() {
    -     *   Set<K> ks = keySet;  // single racy read
    -     *   if (ks == null) {
    -     *     ks = new KeySet();
    -     *     keySet = ks;
    -     *   }
    -     *   return ks;
    -     * }
    -     *}</pre>
    -     */
    -    transient Set<K>        keySet;
    -    transient Collection<V> values;
    -
         /**
          * {@inheritDoc}
          *
          * @implSpec
     -     * This implementation returns a set that subclasses {@link AbstractSet}.
     +     * This implementation returns a fresh set that subclasses {@link AbstractSet}.
          * The subclass's iterator method returns a "wrapper object" over this
          * map's {@code entrySet()} iterator.  The {@code size} method
          * delegates to this map's {@code size} method and the
          * {@code contains} method delegates to this map's
          * {@code containsKey} method.
          *
    -     * <p>The set is created the first time this method is called,
    -     * and returned in response to all subsequent calls.  No synchronization
    -     * is performed, so there is a slight chance that multiple calls to this
    -     * method will not all return the same set.
          */
         public Set<K> keySet() {


        /**
          * {@inheritDoc}
          *
          * @implSpec
    -     * This implementation returns a collection that subclasses {@link
    +     * This implementation returns a fresh collection that subclasses {@link
          * AbstractCollection}.  The subclass's iterator method returns a
          * "wrapper object" over this map's {@code entrySet()} iterator.
          * The {@code size} method delegates to this map's {@code size}
          * method and the {@code contains} method delegates to this map's
          * {@code containsValue} method.
          *
    -     * <p>The collection is created the first time this method is called, and
    -     * returned in response to all subsequent calls.  No synchronization is
    -     * performed, so there is a slight chance that multiple calls to this
    -     * method will not all return the same collection.
          */
         public Collection<V> values() {


PR: https://github.com/openjdk/jdk/pull/15614/files

Discussion
-------------

*Compatibility*

`AbstractMap` is designed to be subclassed. The caching fields are package private and as such they are not accessible from subclasses outside the JDK. Furthermore, the fields are `transient` which means the serialized form is not affected by the proposed change.

Functionally, newly created objects are equivalent to the caching ones but callers that rely on `==`, `indentyHashCode()`, synchronization, `WeakReference` will be affected. However, it seems unreasonable for code to rely on such things about an object they don't have direct control over.  The JDK has 517 direct usages of `keySet()` and 445 direct usages of `values()`; none of which is used for identity comparison. Most uses of the map views in the corpus seem to be transient uses that don't rely on the identity of the view objects. See below for examples and discussion.

The wrapper classes in the `Collections` class are unaffected by the proposed change. The synchronized wrappers always synchronize on the wrapper itself and not on the backing collection, so they are also unaffected by the proposed change.

JDK subclasses `EnumMap`, `HashMap`, `IdentityHashMap`, `LinkedHashMap`, `TreeMap`, `ConsurrentHashMap`, `ConcurrentSkipListMap` and `WeakHashMap` also cached map views and in some cases reused the fields from `AbstractMap`. However, the caching behavior was not specified. The view caches have been removed from these classes as well. Clients of these classes can observe similar behavior changes regarding the identity of the map views. For similar reasons, we do not believe this to be a major compatibility issue.

*Performance*

Initial performance benchmarks indicate 0 - 30% better performance and reduced memory pressure as cached views end up tenured, whereas freshly created view usually dies in young gen or allocation is elided entirely by escape analysis. 

Overall, every `Map` instance will be made smaller by the removal of two fields; this typically saves 8 bytes. On top of this, many maps will have to construct actual values for the fields as soon as the `Map` is used and these objects will occupy some additional 16 bytes (`Set`) and 16 bytes (`Collection`) (typical values). The percentage space reduction of immutable maps (which are common) is especially noteworthy.

Removing the caching of these fields introduces a moderate risk of incompatibilities as not only the class `AbstractMap` is affected but also 60 other classes in the JDK that directly inherit from it and transitively more classes.

*Usage*

A corpus analysis of direct use of `java.util.AbstractMap` indicates extensive use:

    wc -l experiment.html 
        534562 experiment.html

As the file contains two lines per found entry, this means there are more than 250,000 usages of `AbstractMap` including prominent libraries like Guava. However, looking at a larger number of sample uses, these fall into three main categories:

 * Transient use
 * Long-term use
 * Use based on identity

Transient use appears to be by far the most common use and here we see patterns like iteration, size, creating an array, and copying into another collection.
Long-term use seems uncommon, and we have not been able to identify a single use based on identity in the samples. This does not rule out that there are, indeed, such use cases; but if they do occur, they would seem to be quite rare.

Here is some sample usage:

```
jo.keySet().toArray(new String[jo.length()])

map.keySet().iterator()

new ArrayList<>(nameValuePairs.keySet())

set.equals(((JSONObject)other).keySet())

public Collection<String> getFieldNames() {
    return _fields.keySet();
}

for (K key : m.keySet()) {
  // do stuff
}

public String toString() {
    return "<State:" + stateType + ":" + table.keySet() + ">";
}

createLinkedHashSet(parameterKeys.values())
```


Comments
Thanks Stuart! > 1) A test that's sensitive to the number of allocations seems exceptionally brittle, and in general we don't make any guarantees about the number of allocations any operation performs. To be clear I completely agree, I shared that in the interests of trying to help put error bars on the compatibility of the change. Anyone who writes a test that asserts on the number of allocations should expect it to be fragile, and to get to keep both pieces when it breaks :) > 2) The example program is the kind of thing we had in mind when we said it would affect "identity-sensitive operations" although this might be difficult to discern if one isn't familiar with the collections APIs. That all makes sense, I just hadn't realized that the view collections didn't implement value equality before now. As you say relying on Collection#equals is a bit dubious in general, but the issues I was familiar with were examples like comparing between different types of Collections like List and Set, and in those cases using Collection#equals to test a collection for equality with itself does work. The specific tests I saw that encountered this were mostly mocking tests that wanted to verify a particular derived view collection was passed as an argument, e.g. `when(foo.doSomething(myMap.values())).thenReturn(...)`. It's definitely possible to fix those by using a matcher that tests if the collections have the same elements instead of relying on `Collection#equals`, the observation is just that there are some examples of that pattern in the wild, and I can see how someone would write that code without realizing they were relying on identity-sensitive observations. I also found some previous discussion about why the derived view collections 'don't just' override equals: https://bugs.openjdk.org/browse/JDK-5038615
02-06-2025

Yes, thanks [~cushon], very interesting. In both cases I think the tests might be making unwarranted assertions. 1) A test that's sensitive to the number of allocations seems exceptionally brittle, and in general we don't make any guarantees about the number of allocations any operation performs. 2) The specification of [Collection::equals][1] is quite loose, and it seems suspicious to me that a test is depending on the result. The specification allows either identity-sensitive comparison (that is, inheriting equals from Object) or a value-based comparison, such as that provided by Set or List or some comparison defined by an application subclass. The [Map::values][2] view doesn't say anything about equality. In practice most JDK implementations inherit from AbstractCollection which in turn inherits equals() from Object, resulting in identity-based comparisons, so this isn't really a surprise. The example program is the kind of thing we had in mind when we said it would affect "identity-sensitive operations" although this might be difficult to discern if one isn't familiar with the collections APIs. However, it seems unlikely that a test really wants to check for identity of the values view, and instead it probably wants to ensure that the values view contains certain elements. To continue with the example program, the following might be reasonable: assertTrue(m.values().containsAll(List.of("one", "two"))); Or it could use some other technique that explicitly checks the elements of the collection. Note that in general a Map's values collection is neither a List nor a Set, so testing for equality with either of those will likely (but is not guaranteed!) return false. Without looking at the tests, though, it's hard to be sure, but I'm quite suspicous of what these tests are doing. [1]: https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/Collection.html#equals%28java.lang.Object%29 [2]: https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/Map.html#values%28%29
01-06-2025

Thanks for reporting the analysis [~cushon].
31-05-2025

I did an analysis of Google's codebase where I applied https://github.com/openjdk/jdk/pull/15614 and ran all of our tests. I found about ten breakages. There were two main categories: 1. some tests were tracking the number of allocations, and there were fewer allocations after this change, I think those tests weren't using the view collections and benefited from the removed field 2. some tests were using #equals to test derived view collections, and those operations now fail For the second category, the CSR discusses that operations involving identity-sensitive operations on derived map objects may fail, but it wasn't immediately obvious to me that because these derived view collections don't implement equals those comparisons will also fail. Is that expected? e.g. the behaviour of this example is affected: import java.util.HashMap; import java.util.Collection; class T { public static void main(String[] args) { HashMap<Integer, String> m = new HashMap<>(); m.put(1, "one"); m.put(2, "two"); Collection<String> a = m.values(); Collection<String> b = m.values(); System.err.println(a.equals(b)); } }
31-05-2025

Yes, this is not a YOLO / Leeroy Jenkins sort of situation when it comes to accessing the compatibility impact. We have the option of leaving the current code in place if we cannot get small enough error bars on the estimated compatibility impact.
30-05-2025

We should do a survey to see what kinds of usage actual code makes with the AbstractMap views. [~pminborg] and I can do this.
30-05-2025

I have reviewed this CSR as a change for 26. Racy reads can return multiple equivalent instances, which will be similar to what we will have after the removal. In addition, we would anticipate the same incompatibility when value objects arrive. Thus, I believe the best approach is to apply this change in 26 and consider rolling back if there are reports that this causes issues.
30-05-2025

This CSR has been in Provisional state for some time. A reminder that JDK 25 rampdown 1 start is on June 5, 2025. CSRs intended to get in the release should be Finalized well ahead of the deadline. If there is no longer interest in this CSR, it can be withdrawn.
22-04-2025

Hmmm. I'm moving this CSR to Provisional to indicate I think it is worth further investigation. Please work with [~smarks] and other collections maintainers before the CSR to Finalized to further consider implications and compatibility, etc.
20-10-2023

Moving back to Draft. [~pminborg], my initial read of this CSR is that a much more extensive compatibility discussion will be needed to evaluate the change.
20-09-2023