JDK-8299339 : HashMap merge and compute methods can cause odd resizing pathologies
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 8,11,17,19,20
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2022-12-22
  • Updated: 2023-01-08
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Description
A DESCRIPTION OF THE PROBLEM :
Since the merge method will not call the resize method when the ++size exceeds the threshold, if the merge method is called in the forEach method at this time, the merge method will call the resize method, which will cause the forEach method to behave abnormally.


---------- BEGIN SOURCE ----------
public class Main {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>(2);

        map.merge(1, 1, Integer::sum);
        map.merge(2, 1, Integer::sum);

        map.forEach((k, v) -> {
            map.merge(k, -1, Integer::sum);
            System.out.println(k);
        });
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Same as the putVal method:
if (++size > threshold) resize();


Comments
I was surprised to see HashMap methods incrementing size without resizing the array. Seems like a simple bug. I don't recall having worked on these methods myself. We already have test/jdk/java/util/HashMap/WhiteBoxResizeTest.java but it doesn't test these methods.
08-01-2023

[~martin] Any ideas about why Map.compute/computeIfAbsent/merge have a slightly different resizing threshold policy than other methods?
04-01-2023

It seems unwise to modify the map while running forEach() over it. Even though in this case the code is not making a structural modification to the map (that is, not adding or removing entries) it's possible that calling methods on the map will have side effects that might spoil the iteration. For example, while iterating a WeakHashMap, calling a method on that map might cause stale entries to be removed, spoiling the iteration. Future optimizations to HashMap, such as converting to/from treebins, might also spoil an iteration currently in progress. Thus I'd advise rewriting code to avoid such situations. The specification of Map::forEach should probably be updated with a general caution about calling Map methods, especially those with side effects, during the iteration. That said, HashMap::merge does seem to misbehave in that it will not resize the map even if the call results in the addition of an entry and the new size exceeds the threshold. This is in contrast to HashMap::put, which will resize the map in this case. Also, HashMap::merge will resize the map if it detects that its size exceeds the threshold, even if the actual merge operation doesn't do any resizing. This is the scenario that the submitter has noticed. The pathologies also apply to the compute() and computeIfAbsent() methods, but not computeIfPresent(). Consider this scenario: var map = new HashMap<Integer, Integer>(2) map.put(1, 1) // size=1 threshold=1 table=[null, 1=1] map.put(2, 2) // size=2 threshold=3 table=[null, 1=1, 2=2, null] However, var map = new HashMap<Integer, Integer>(2) map.put(1, 1) // size=1 threshold=1 table=[null, 1=1] map.merge(2, 2, Integer::sum) // size=2 threshold=1 table=[2=2, 1=1] <=== size exceeds threshold map.merge(2, 3, Integer::sum) // size=2 threshold=3 table=[null, 1=1, 2=5, null] This also seems to occur with Map::compute: var map = new HashMap<Integer, Integer>(2) map.put(1, 1) // size=1 threshold=1 table=[null, 1=1] map.compute(2, (k, v) -> 2) // size=2 threshold=1 table=[2=2, 1=1] <=== size exceeds threshold map.compute(2, (k, v) -> 3) // size=2 threshold=3 table=[null, 1=1, 2=3, null] and also with Map::computeIfAbsent, which resizes the map even if it doesn't otherwise change the map: var map = new HashMap<Integer, Integer>(2) map.put(1, 1) // size=1 threshold=1 table=[null, 1=1] map.computeIfAbsent(2, k -> 2) // size=2 threshold=1 table=[2=2, 1=1] <=== size exceeds threshold map.computeIfAbsent(2, k -> 3) size=2 threshold=3 table=[null, 1=1, 2=2, null] It's not clear to me whether the similarity in behaviors among compute, computeIfAbsent, and merge an accident as a result of code copying or whether there is some other reason it was done this way. Another pathology is exhibited by a standard iteration instead of forEach(): var map = new HashMap<Integer, Integer>(2) map.put(1, 1) map.merge(2, 2, Integer::sum) // the map is now in an over-threshold state for (var e : map.entrySet()) { map.computeIfAbsent(e.getKey(), k -> k); System.out.println(e); } 2=2 1=1 2=2 The computeIfAbsent() call has the side effect of resizing the map, even though it otherwise doesn't change the contents of the map.
04-01-2023

The observations on Windows 10: JDK 8: Failed, the output is 2. JDK 11: Failed. JDK 17: Failed. JDK 19: Failed. JDK 20ea+23: Failed.
24-12-2022

Additional information from the submitter: The point is not the use of Integer::sum, but the use of the merge method in a special case will cause the output of HashMap's forEach method to be incorrect, but no exception will be thrown! Whereas if I use putVal method there will be no problem. This is a problem caused by the code of the merge method. Next, I will start with a simple example, and then introduce the scene where I actually encountered this problem. 1. Simple example This is code devised to reproduce the problem, without any real meaning. ```java public class Main { public static void main(String[] args) { Map<Integer, Integer> map = new HashMap<>(2); map. merge(1, 1, Integer::sum); map. merge(2, 1, Integer::sum); map.forEach((k, v) -> { map. merge(k, -1, Integer::sum); System.out.println(k); }); } } ``` This code uses the forEach method at the end to traverse the HashMap, but actually ends after traversing the first element, which makes the forEach method not behave as expected, but does not throw any exceptions. We can actually run it and see that the output is "2". But in fact, `map` should be completely traversed, and the output should be "2 1" or "1 2". Why is this happening to cause forEach to end after traversing the first element? Because the number of elements in `map` is already 2 after `map.merge(2, 1, Integer::sum)`, then the capacity of `map` is also 2. Obviously, the number of elements in `map` exceeds the threshold but the resize method is not called, because the merge method does not judge whether it should be expanded when `++size`, which will result in expansion when the merge method is called next time. If the next time the merge method is called in the forEach method (just like the above code), unexpected behavior will occur. This problem does not occur when I use the putVal method instead of the merge method. Let's compare a difference between the merge method and the putVal method: ```java public V merge(K key, V value, BiFunction<? super V, ? super V, ? extends V> remappingFunction) { //... ++modCount; ++size; afterNodeInsertion(true); return value; } final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { //... ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null; } ``` As shown above, this is why using the merge method in special cases will cause the code to behave incorrectly, because the merge method judges whether the capacity should be expanded at the beginning of the method, while the putVal method judges at the end of the method. 2. Actual scene I encountered this problem when implementing the Moore voting algorithm (https://leetcode.com/problems/majority-element-ii/description/). I use HashMap to store `k` candidates and their number of votes. When the candidate is not in the HashMap, the number of votes of all the candidates will be reduced by one. If the candidate exists in the HashMap, the number of votes will be increased by one. The detailed code is as follows: ```java class Solution { public List<Integer> majorityElement(int[] nums) { int n = nums. length, c = 3; List<Integer> list = new ArrayList<>(); Map<Integer, Integer> map = new HashMap<>(c - 1); for (int x : nums) { // x is a candidate || candidates are not full if (map. containsKey(x) || map. size() != c - 1) { map. merge(x, 1, Integer::sum); continue; } // Find candidates with zero votes and replace boolean isExist = false; Set<Map.Entry<Integer, Integer>> entries = map.entrySet(); for (var entry : entries) { if (entry. getValue() == 0) { map. remove(entry. getKey()); map. put(x, 1); isExist = true; break; } } if (isExist) continue; // The number of votes for all candidates is reduced by one map.forEach((k, v) -> map.merge(k, -1, Integer::sum)); } // Count the votes of the candidates map.forEach((k, v) -> map.put(k, 0)); for (int x : nums) if (map. containsKey(x)) map. merge(x, 1, Integer::sum); map.forEach((k, v) -> { if (v > n / c) list.add(k); }); return list; } } ``` If the input parameter `nums = new int[]{4,5,3,4,4,1,0,-1,-2,4,6,7,8,4}`, the result of the above code is "[]". And if I don't set the initial capacity of HashMap, the result is "[4]". How strange that the initial capacity of the HashMap should change the result of the code! Finally, although the merge method exhibits problems only in special use cases, the code misbehaves without throwing any exceptions, which I think should probably be fixed.
24-12-2022

Requested the submitter provide the expected and actual results.
23-12-2022