JDK-8368292 : Severe performance penalty in bulk Collections methods (putAll, addAll, .(Collection))
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-09-22
  • Updated: 2025-11-03
Description
In most cases, an explicit loop with single-element operations is substantially faster than bulk methods that process multiple elements. For example:

new HashMap<>(myMonomorphicMap)

is up to 57% slower than:

Map<K, V> myNewMap = new HashMap<>((int) (myMonomorphicMap.size() * 1.35));
for (Entry<K, V> entry : myMonomorphicMap.entrySet()) {
    myNewMap.put(entry.getKey(), entry.getValue());
}
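
Written as a reusable helper, the workaround looks like the sketch below (the class and method names, and the 1.35 oversizing factor, are illustrative, not from any JDK API). Note the caveat that follows the code.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualCopy {
    // Sketch of the manual-loop workaround as a generic helper.
    // The 1.35 oversizing factor (roughly 1 / 0.75 load factor) avoids
    // rehashing during the copy; put() here is a HashMap call site the
    // JIT can inline, unlike the interface-typed loop inside putAll().
    static <K, V> HashMap<K, V> copyOf(Map<K, V> source) {
        HashMap<K, V> result = new HashMap<>((int) (source.size() * 1.35f));
        for (Map.Entry<K, V> entry : source.entrySet()) {
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Caveat: a shared helper like this can itself become megamorphic if it is fed many different Map implementations; the speedups reported here come from the loop sitting at a locally monomorphic call site in application code.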

Evidence:
JMH benchmarks on both aarch64 (AWS c6g) and x64 (AWS c6a) demonstrate significant performance differences:

• **aarch64**: Manual inlining shows 24-94% performance improvement over HashMap constructor
• **x64**: Manual inlining shows 65-136% performance improvement over HashMap constructor
• **Memory efficiency**: Manual approach uses 1-7% less memory per operation for non-empty maps
• **Edge case**: Empty maps show 10-23% regression with manual approach

Performance gains scale with map size, with the largest improvements occurring for maps with 100+ elements.

The attached JMH test HashMapConstructorBenchmark.java demonstrates the effect on HashMap.<init>(Map) and the JMH test CollectionPolymorphismBenchmark.java demonstrates the effect on a wider selection of similar methods.

Explanation:
HashMap.<init>(Map) is inherently megamorphic (HashMap, TreeMap, LinkedHashMap, Collections.SingletonMap, etc.), meaning locally monomorphic usage still suffers from global megamorphism. The JIT cannot eliminate the virtual method lookups in this unusually dense sequence of virtual calls: Map.entrySet(), Set.iterator(), Iterator.hasNext(), Iterator.next(), Entry.getKey(), Entry.getValue(). Put another way, (3 + 4n) virtual method calls are required to iterate across an n-entry input Map. List iteration is somewhat better at (2 + 2n).
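
The (3 + 4n) figure can be checked by counting each interface-typed call the entry-copy loop performs (a self-contained sketch; the counting harness and names are illustrative):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class VirtualCallCount {
    // Count the interface calls an entry-copy loop makes over an n-entry map:
    // entrySet() + iterator() + (n+1) hasNext() + n * (next, getKey, getValue)
    // = 3 + 4n calls in total.
    static <K, V> int countCopyCalls(Map<K, V> src) {
        int calls = 0;
        calls++; Set<Map.Entry<K, V>> entries = src.entrySet();    // Map.entrySet()
        calls++; Iterator<Map.Entry<K, V>> it = entries.iterator(); // Set.iterator()
        while (true) {
            calls++;                                 // Iterator.hasNext(), n+1 times
            if (!it.hasNext()) break;
            calls++; Map.Entry<K, V> e = it.next();  // Iterator.next()
            calls++; e.getKey();                     // Entry.getKey()
            calls++; e.getValue();                   // Entry.getValue()
        }
        return calls;
    }
}
```

For an empty map this yields 3 calls; for a 5-entry map, 23.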

Impact:
The microbenchmarks show performance improvements of 24-136% for the manual approach depending on architecture and map size. Separate analysis of real-world application hotspots shows performance gains of 16-75% when these bulk methods are manually rewritten in application code.

Scope:
Profiling analysis of 8 known but unsolved application hotspots shows that 7 involved these methods, including one in Tomcat
(https://bz.apache.org/bugzilla/show_bug.cgi?id=69820) and one in Log4j (https://github.com/apache/logging-log4j2/issues/3935). Additional slow methods have been found in data structures outside the JDK, including Guava and proprietary libraries.

Solution:
No great idea, but here are some conversation starters:
1. Manually identify and rewrite all hotspots. A search of our profiling data for several applications shows that this problem is in our libraries more often than our application code, so the surface area and cost of fixing are quite high.
2. Modify javac to rewrite calls to known methods, similar to how String concatenation is under-the-covers magic. This implicitly increases compiled code size, violates the spec, and requires all libraries to be recompiled and republished.
3. Develop JIT techniques to create specialized versions of bulk methods for common receiver type combinations, reducing virtual call overhead.
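
A source-level analogue of option 3 is to dispatch once on the concrete receiver type so the loop in each branch sees monomorphic entrySet()/iterator() call sites (a sketch assuming Java 17+ for generic type patterns; the class name and capacity arithmetic are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SpecializedCopy {
    // Hand-written version of receiver-type specialization: one type check
    // up front, then each branch's loop calls entrySet()/iterator() on a
    // concrete type, giving the JIT a devirtualizable call chain.
    static <K, V> HashMap<K, V> copy(Map<K, V> src) {
        HashMap<K, V> result = new HashMap<>(Math.max(16, (int) (src.size() / 0.75f) + 1));
        if (src instanceof HashMap<K, V> hm) {
            for (Map.Entry<K, V> e : hm.entrySet()) result.put(e.getKey(), e.getValue());
        } else if (src instanceof TreeMap<K, V> tm) {
            for (Map.Entry<K, V> e : tm.entrySet()) result.put(e.getKey(), e.getValue());
        } else {
            // Fallback stays megamorphic, but the common types above do not.
            for (Map.Entry<K, V> e : src.entrySet()) result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

A JIT-level variant could build such branches automatically from the observed receiver-type profile, rather than requiring them in source.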

Comments
Added a benchmark for HashMap.hashCode() and equals() showing explicit loops are 47-162% faster for hashCode(), and 10% faster for equals().
14-10-2025
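
[Editorial note on the above: Map.hashCode() is specified as the sum of the entry hash codes, so the explicit-loop replacement is short. A sketch, with illustrative names; taking HashMap rather than Map as the parameter type is what keeps the entrySet()/iterator() call sites monomorphic.]

```java
import java.util.HashMap;
import java.util.Map;

public class ExplicitHash {
    // Explicit-loop equivalent of AbstractMap.hashCode(): the Map contract
    // defines hashCode() as the sum of the entrySet() elements' hash codes.
    static <K, V> int hashOf(HashMap<K, V> map) {
        int h = 0;
        for (Map.Entry<K, V> e : map.entrySet()) {
            h += e.hashCode();
        }
        return h;
    }
}
```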

I found quite a few affected methods in the core JDK - highlights are HashMap.hashCode() and .equals(), which are inherited from AbstractMap and globally megamorphic. Similar for LinkedList, which inherits them from AbstractList. These are opportunities to rewrite JDK code to avoid the underlying issue, and could easily be backported to help existing JDKs... but my current data shows low magnitude for my applications. Definitely greater than zero.
14-10-2025

This is as real-world as it gets, with broad application:
1. It was discovered by chasing a production bottleneck. I couldn't reproduce it in microbenchmarks, investigated the discrepancy, voila. (over-simplified)
2. The hotspot in Tomcat is present in all of our many JSP-based web applications, and based on my understanding, will exist in every Tomcat JSP-based web app.
3. The hotspot in Log4j is present in all Log4j-based web applications and some non-web, although behavior can differ based on configuration.
4. Several of the Builder patterns in Guava have the same vulnerability as HashMap, although with a variety of call paths that may or may not be affected (just like HashMap).
5. Since reporting the issue, I've continued to investigate the scope and begun to apply workarounds for hotspots. The two highest-priority applications show ~1% and ~1.5% CPU respectively, dealing solely with the HashMap and Guava calls, and pre-production testing on the first shows a 5% reduction in object allocation from the now-avoided HashIterators. In both applications, these changes improve critical-path latency and will significantly reduce our operating costs.
6. Depending on details, the split-profiling approach may also help in complex object hierarchies, such as AstNodes in Tomcat's EL interpreter. At some point I'm going to investigate the applicability to GraalJS.

> Perhaps, in such cases, the JVM should occasionally discard stale profiling data, recollect it and then recompile the method, hoping that optimistic assumptions may again hold true.

This wouldn't work in most cases - the typical behavior is competing profiles. The JSP engine uses A while the JVM uses B and the application code uses C, D, E. None of these profiles is stale and no single set of assumptions is correct. A solution that aims for one specific "best" optimization will overall be inferior to widespread explicit loops.
10-10-2025

Hi John. Could you please clarify whether your synthetic benchmark represents a realistic scenario where outdated or misleading profiling information actually hurts performance in steady-state workloads? If so, it might indeed suggest that something needs to be addressed. Perhaps, in such cases, the JVM should occasionally discard stale profiling data, recollect it and then recompile the method, hoping that optimistic assumptions may again hold true. Alternatively, the compiler could compare the current call-site type profile with the one used during compilation, and avoid calling a compiled method if the actual argument type no longer matches the optimistic specialization. In some specific cases, we might even be able to hint to the compiler from Java code that a monomorphic call is expected (see attached HashMap.diff).
09-10-2025

To summarize the conversation so far - corrections are welcome:
1. The report and explanation (JIT mis-training) are confirmed.
2. Split profiling in the JIT is the solution being considered.
3. This is an enhancement opportunity rather than a bug (I agree).

Split profiling should solve this problem, and may go further and optimize away the megamorphic calls to Object.hashCode() and Object.equals(). It'll also improve calls to utility classes/methods within our applications. That said, it does not look simple or quick, so I assume it'll be JDK 26 at the earliest w/ no backporting. Is that about right?
30-09-2025

Right. If you trained the dog to be a good shepherd, don't expect it to help you hunt :) The benchmark "trains" HotSpot's collection code with one type profile and then measures it under a different one. Here's performance on my machine (Apple M4) with no training (no_poison) vs poisoned training vs manual copy:

```
Benchmark                             (mapSize)  Mode  Cnt      Score      Error  Units
hashMapConstructor_HashMap_no_poison          0  avgt    5      2.083 ±    0.214  ns/op
hashMapConstructor_HashMap_no_poison          5  avgt    5     22.704 ±    1.082  ns/op
hashMapConstructor_HashMap_no_poison         75  avgt    5    388.269 ±    3.129  ns/op
hashMapConstructor_HashMap_no_poison       1000  avgt    5   7379.205 ±  213.376  ns/op
hashMapConstructor_HashMap                    0  avgt    5      2.093 ±    0.036  ns/op
hashMapConstructor_HashMap                    5  avgt    5     48.625 ±    0.375  ns/op
hashMapConstructor_HashMap                   75  avgt    5    599.131 ±   15.201  ns/op
hashMapConstructor_HashMap                 1000  avgt    5  12076.310 ± 2340.370  ns/op
manualInlining_HashMap                        0  avgt    5      2.134 ±    0.180  ns/op
manualInlining_HashMap                        5  avgt    5     35.525 ±    0.345  ns/op
manualInlining_HashMap                       75  avgt    5    477.533 ±   16.265  ns/op
manualInlining_HashMap                     1000  avgt    5   7483.601 ±  112.645  ns/op
```

Conclusion: the gap is indeed a type-profile artifact.
Also, please note that the chosen capacity affects performance:

```
@Benchmark
public HashMap<String, Integer> manualInlining_HashMap(Blackhole bh) {
    HashMap<String, Integer> result = new HashMap<>((int) (inputHashMap.size() * CAPACITY_FACTOR));
    for (Map.Entry<String, Integer> entry : inputHashMap.entrySet()) {
        result.put(entry.getKey(), entry.getValue());
    }
    bh.consume(result);
    return result;
}

@Benchmark
public HashMap<String, Integer> manualInlining_HashMap_defaultCapacity(Blackhole bh) {
    int cap = 1 << (32 - Integer.numberOfLeadingZeros((int) Math.ceil(inputHashMap.size() / 0.75) - 1));
    HashMap<String, Integer> result = new HashMap<>(cap);
    for (Map.Entry<String, Integer> entry : inputHashMap.entrySet()) {
        result.put(entry.getKey(), entry.getValue());
    }
    bh.consume(result);
    return result;
}
```

```
Benchmark                                (mapSize)  Mode  Cnt     Score     Error  Units
manualInlining_HashMap                           5  avgt    5    35.525 ±   0.345  ns/op
manualInlining_HashMap                          75  avgt    5   477.533 ±  16.265  ns/op
manualInlining_HashMap                        1000  avgt    5  7483.601 ± 112.645  ns/op
manualInlining_HashMap_defaultCapacity           5  avgt    5    40.108 ±   1.045  ns/op
manualInlining_HashMap_defaultCapacity          75  avgt    5   541.820 ±   4.575  ns/op
manualInlining_HashMap_defaultCapacity        1000  avgt    5  8721.644 ± 158.320  ns/op
```
25-09-2025

Since it's always been like that, it feels more like a performance enhancement. Converted to RFE.
24-09-2025

I verified on applications running 17 and 21; I verified the JMH benchmark on 8, 17, 21, 25. This behavior is present EVERYWHERE. Reviewing old documentation, it looks like the *morphism behavior was defined from the very introduction of Collections, in 1.2 in 1998.
23-09-2025

Hi [~jengebretson], thanks for the report. Have you checked whether older versions are also affected or in other words if a particular JDK version introduced the performance drop?
23-09-2025

I recall there has been a similar discussion around Objects::equals versus Object::equals resolved against specific instances, which comes down to the same issue where you can't split profiles (profiling happens per bytecode, not per call path). Similar discussion for field versus array element profiling.
22-09-2025