JDK-7143928 : (coll) Optimize for Empty HashMaps
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2012-02-08
  • Updated: 2013-08-15
  • Resolved: 2013-04-02
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8
8 b86Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
I am filing this on behalf of Misha Dmitriev.

There are two proposols for the optimization.  The first proposal is to leave the table == null.  Then add checks for null and respond appropriately.  The second proposal is to create a static final zero-length array.  When a HashMap is created, its table is set to use this empty array.  Then, at the time of the first put(), the table will be resized.

Either way, we can avoid memory waste by empty HashMaps (not completely, but at least a considerable part of waste due to the empty internal array will be gone).

Oops, I realize that this bug needs to be updated. In particular, lazyinithashmap.zip that Nathan mentions in his request, does not exist anymore.  So below is what I once wrote on this topic, with correct links and some numbers proving the feasibility of this change.

---------------------------------------------------

I took java.util.HashMap from the most current version of JDK8, modified it to initialize lazily and compared performance of the two implementations. They are on par, see below.

The jar containing all the classes (including Doug Lea's MapMicroBenchmark) and the sources for both the original HashMap (now called StandardHashMapJdk8.java to avoid possible clashes with changes to the standard HashMap in the future) and the modified HashMap (LazyInitHashMapJdk8.java) can be found at http://sunlabs.us.oracle.com/people/misha/mapbm.jar

In LazyInitHashMapJdk8.java, all modified/added code pieces are preceded with '// ***' so that they can be found easily. It is unpolished, and hopefully can be made more elegant if the whole idea is considered worthwhile.

Please let me know if I should do anything else to assist with this.

Here are the performance numbers, using Doug Lea's MapMicroBenchmark, for the standard and lazily-initializing HashMap implementations:

C:\count\mapbm>\java\jdk1.8.0\bin\java -server -Xbootclasspath/a:mapbm.jar java.
util.MapMicroBenchmark java.util.StandardHashMapJdk8
Class java.util.StandardHashMapJdk8 randomized searches
No word file. Using String.valueOf(i)
warm up.........................................................................
.................
warm up.........................................................................
.................
running.........................................................................
.................
Type/Size:      9     36    144    576   2304   9216  36864 147456 589824
Object         16     15     15     15     19     21     25     39     87
String         15     15     14     16     17     25     28     69    101
Integer        15     13     12     12     13     15     18     34     69
Long           16     15     14     14     15     17     22     38     77
Float          18     15     16     17     18     19     27     43     92
Double         17     17     16     15     20     21     23     45     87
BigInteger     19     19     17     18     19     22     26     63     97
BigDecimal     18     19     19     18     22     29     28     70    104
RandomInt      16     15     16     15     19     23     27     51     98
Mixed          23     24     25     29     34     38     48     90    121

average        17     16     16     16     19     23     27     54     93


C:\count\mapbm>\java\jdk1.8.0\bin\java -server -Xbootclasspath/a:mapbm.jar java.
util.MapMicroBenchmark java.util.LazyInitHashMapJdk8
Class java.util.LazyInitHashMapJdk8 randomized searches
No word file. Using String.valueOf(i)
warm up.........................................................................
.................
warm up.........................................................................
.................
running.........................................................................
.................
Type/Size:      9     36    144    576   2304   9216  36864 147456 589824
Object         15     15     15     15     18     21     25     39     89
String         15     16     14     16     17     25     28     68    102
Integer        15     13     13     12     13     14     18     34     70
Long           17     16     15     15     15     18     22     39     78
Float          17     15     16     17     18     19     27     42     92
Double         17     17     16     15     20     21     23     44     87
BigInteger     19     18     17     18     20     23     27     69    103
BigDecimal     18     19     19     18     22     29     28     70    104
RandomInt      17     16     16     16     19     22     27     48     93
Mixed          21     24     25     30     34     39     48     91    126

average        17     16     16     17     19     23     27     54     94
I have attached the zip file with implementations of lazy initialized HashMap and ArrayList. These classes have been tested with ADF Fusion Order Demo  and ATG CRM Demo  applications. The heap reduction has been 8% and 13% respectively. Average response times decreased 19% and 16%.

Comments
My comment on JDK-8003586 proposes some slight adjustments to http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/0cccdb9a9a4c . https://jbs.oracle.com/bugs/browse/JDK-8003586?focusedCommentId=13318659&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13318659
25-04-2013

In the the Fusion CRM Sales Opportunities application ~85% of empty ArrayList objects were initialized with default array size 10
28-03-2013

There has been some pushback on the ArrayList changes because of the additional field for initialCapacity. One suggestion is to use the zero length array only if the array is created at default size. Do we have any analysis of requested initial size for empty arrays? Are most of them default size?
27-03-2013

We used zero-length in ArrayList and null in HashMap in POC implementation for empty collection cases. Misha originally used zero-length array in his changed HashMap's version but there was an issue in the serialization part when it was used with reflection and we replaced zero-length with null to make it working
25-03-2013

I recommend an empty array rather than null as a sentinel value for two reasons: 1. The JVM prefers to merge null checks into load or store instructions (so-called "implicit null checks") because it removes an explicit branch. But it only does so if the probability of nulls is zero or very low. But using null as a sentinel for common states (e.g., empty collection) defeats this optimization. 2. For power-of-two sized structures (HashMap) we can optimize away an array range check in the presence of a zero-length check. See https://jbs.oracle.com/bugs/browse/JDK-8003585 . Since most uses of a variable-sized collection load and test the array length, the sentinel check can easily be overloaded onto this test. If null is not used, then the (safety-mandated) null check is (usually) merged into the load of the length. If the table is power-of-two-sized, then only the zero check remains, and the array range check may be removed. This is thought to be the best code for a frequent load from a possibly-empty collection. Mike asked, "what about empty collection?" This is a reasonable thing to use, but it has a cost. The JVM uses inline caches and type profiles to simplify its optimized code; these techniques "win" when at a given use point (individual call to Map.get, for example) there is only one class present, even if the interface is totally general. (This is the so-called "monomorphic" case.) If the application uses (say) only HashMaps for both empty and non-empty maps, then this optimization can win big. It can be broken, on the other hand, if the application begins to use one other type for some other case (such as empty maps). In these cases, it is better to overload the "am I empty?" test on some other loaded value, such as a null or (better) an array length.
25-03-2013

Related JVM JIT optimization which favors "zero length sentinels".
25-03-2013

Patch proposed. We need to determine release path to make forward progress.
01-03-2013

I have since tested the patch and it passes the standard regression tests.
01-03-2013

I have added a rebased version of the patch against current JDK8/TL repository. I have not had a chance to test it. For the moment the conceptual review is more important.
26-02-2013

Updated patch against current jdk8/tl repo
26-02-2013

sources of lazy initialized HashMap and ArrayList
12-12-2012

Test results from the comparative runs of Fusion CRM SalesOpportunity application with and without lazy initialized ArrayList and HashMap is attached.
08-12-2012

John Rose proposes using empty array rather than null as sentinel to simplify logic and improve performance in JDK-8003586 I would like to see a comparison of the performance.
17-11-2012

EVALUATION moved to comment
09-02-2012