JDK-8046170 : JEP 180: Handle Frequent HashMap Collisions with Balanced Trees
  • Type: JEP
  • Component: core-libs
  • Priority: P4
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 8
  • Submitted: 2013-02-08
  • Updated: 2017-06-14
  • Resolved: 2017-06-14
Related Reports
Relates :  
Description
Summary
-------

Improve the performance of `java.util.HashMap` under high hash-collision
conditions by using balanced trees rather than linked lists to store map
entries.  Implement the same improvement in the `LinkedHashMap` class.


Motivation
----------

Earlier work in this area in JDK 8, namely the
[alternative string-hashing implementation][7126277], improved collision
performance for string-valued keys only, and it did so at the cost of
adding a new (private) field to every `String` instance.

The changes proposed here will improve collision performance for any key
type that implements `Comparable`.  The alternative string-hashing
mechanism, including the private `hash32` field added to the `String`
class, can then be removed.

[7126277]: http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/43bd5ee0205e


Description
-----------

The principal idea is that once the number of items in a hash bucket
grows beyond a certain threshold, that bucket will switch from using a
linked list of entries to a balanced tree.  In the case of high hash
collisions, this will improve worst-case performance from O(n) to O(log
n).

This technique has already been implemented in the latest version of the
`java.util.concurrent.ConcurrentHashMap` class, which is also slated for
inclusion in JDK 8 as part of [JEP 155](JDK-8046145).  Portions of that code will
be re-used to implement the same idea in the `HashMap` and `LinkedHashMap`
classes.  Only the implementations will be changed; no
interfaces or specifications will be modified.  Some user-visible
behaviors, such as iteration order, will change within the bounds of
their current specifications.

We will not implement this technique in the legacy `Hashtable` class.
That class has been part of the platform since Java 1.0, and some legacy
code that uses it is known to depend upon iteration order.  `Hashtable`
will be reverted to its state prior to the introduction of the
alternative string-hashing implementation, and will maintain its
historical iteration order.

We also will not implement this technique in WeakHashMap.  An attempt was
made, but the complexity of having to account for weak keys resulted in an
unacceptable drop in microbenchmark performance.  `WeakHashMap` will also
be reverted to its prior state.

There is no need to implement this technique in the `IdentityHashMap`
class.  It uses `System.identityHashCode()` to generate hash codes, so
collisions are generally rare.


Testing
-------

  - Run `Map` tests from Doug Lea's JSR 166 CVS workspace (includes a couple
    microbenchmarks)
  - Run performance tests of standard workloads
  - Possibly develop new microbenchmarks


Risks and Assumptions
---------------------

This change will introduce some overhead for the addition and management
of the balanced trees; we expect that overhead to be negligible.

This change will likely result in a change to the iteration order of the
`HashMap` class.  The `HashMap` specification
explicitly makes no guarantee  about iteration order.  The iteration order of
the `LinkedHashMap` class will be maintained.