JDK-8214427 : probable bug in logic of ConcurrentHashMap.addCount()
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 11,12
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2018-11-26
  • Updated: 2020-04-09
  • Resolved: 2018-12-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 12
12 b24Fixed
Related Reports
Relates :  
Relates :  
Description
A DESCRIPTION OF THE PROBLEM :
if (check >= 0) {
            Node<K,V>[] tab, nt; int n, sc;
            while (s >= (long)(sc = sizeCtl) && (tab = table) != null &&
                   (n = tab.length) < MAXIMUM_CAPACITY) {
                int rs = resizeStamp(n);
                if (sc < 0) {
                    if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                        sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                        transferIndex <= 0)
                        break;
                    if (U.compareAndSetInt(this, SIZECTL, sc, sc + 1))
                        transfer(tab, nt);
                }
                else if (U.compareAndSetInt(this, SIZECTL, sc,
                                             (rs << RESIZE_STAMP_SHIFT) + 2))
                    transfer(tab, null);
                s = sumCount();
            }
        }

In the above code, condition of     (sc == rs + 1 ||  sc == rs + MAX_RESIZERS )  would never be true , since the value of rs is positive and the value of sc is negative . 

The correct condition should be (sc >>> RESIZE_STAMP_SHIFT) == rs + 1 ||  (sc >>> RESIZE_STAMP_SHIFT) == rs + MAX_RESIZERS,  which can be used to dedect if resizing process finished or resizing threads reaches maxmium limitation 

if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                        sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                        transferIndex <= 0)


FREQUENCY : always


Edited to correct apparent typo: it stated "... the value of rs is negative" but it is sc that is negative.
Comments
A similar change is also necessary in helpTransfer. These together with a simplification of the shift expressions are now in jsr166 repo. I also verified that limits are maintained.
02-12-2018

Additional Information from submitter: I need to change the correct conditions given by me in the bug description ������ The correct condition shuold be sc == ( rs<<<RESIZE_STAMP_SHIFT ) +1 || sc == ( rs<<<RESIZE_STAMP_SHIFT ) + MAX_RESIZERS which can be used to detect if resizing process finished or resizing threads number reaches the maximum limit This bug could be verifed by a small example : First, copy the ConcurrentHashMap source code to your own package Second, do some necessary modification to make it can be compiled (eg: change package declaration, make Unsafe instance works, copy ThreadLocalRandom to your package as well, since the ConcurrentHashMap used ThreadLocalRandom.probe() function, which is not public ) Third, reduce MAX_RESIZERS to 2, as the documentation shows, this should ensure there are at most 2 threads can do resizing concurrently private static final int MAX_RESIZERS = 2; Fourth, add the following code snippet into the customized ConcurrentHashMap class public static void main(String[] args) { ConcurrentHashMap hashMap = new ConcurrentHashMap(8); for(int i = 0; i< 300; i++) { new Thread() { @Override public void run() { hashMap.put(Thread.currentThread().getId(),"id: "+Thread.currentThread().getId()); } }.start(); } } Fifth, add the following code snippet into the transfer function of ConcurrentHashMap . To suspend any thread that entered into transfer if (nextTab == null) { // initiating try { @SuppressWarnings("unchecked") Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1]; nextTab = nt; } catch (Throwable ex) { // try to cope with OOME sizeCtl = Integer.MAX_VALUE; return; } nextTable = nextTab; transferIndex = n; } // The following added code here is to suspend Threads !!!! try { String s = new String(); synchronized (s) { s.wait(); } } catch (InterruptedException e) { e.printStackTrace(); } Six, add the Thread break point in the following code line in addCount function ( Tip: I used Idea Intellij , choose "Thread" option can suspend each thread in your application , otherwise it will only suspend only the first Thread which executed to the break point) if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) transfer(tab, nt); enter image description here Then run the main function, you will see more than 2 threads entered transfer function, which means MAX_RESIZERS does not take any effect.
29-11-2018

Yes. Some of this check now includes dead code, because of a change of representation at one point that wasn't adjusted for. With the possible effect of some threads not helping resize (a performance, not map correctness bug) This should be fixed (and is committed in jdr166 repo): --- ConcurrentHashMap.java.~1.314.~ 2018-10-05 13:42:39.860409607 -0400 +++ ConcurrentHashMap.java 2018-11-28 18:48:55.998082379 -0500 @@ -2307,9 +2307,9 @@ (n = tab.length) < MAXIMUM_CAPACITY) { int rs = resizeStamp(n); if (sc < 0) { - if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 || - sc == rs + MAX_RESIZERS || (nt = nextTable) == null || - transferIndex <= 0) + if ((sc >>> RESIZE_STAMP_SHIFT) == rs + 1 || + (sc >>> RESIZE_STAMP_SHIFT) == rs + MAX_RESIZERS || + (nt = nextTable) == null || transferIndex <= 0) break; if (U.compareAndSetInt(this, SIZECTL, sc, sc + 1)) transfer(tab, nt);
28-11-2018

>> What is your analysis just based on the code and the report? It certainly appears incorrect to me. I stared at the code for a while, but not long enough to understand it. Doug will remember!
28-11-2018

What is your analysis just based on the code and the report? It certainly appears incorrect to me.
28-11-2018

Resizing the internal bucket array is hairy race-prone code, and hard to stress test because resizes are relatively rare. The reporter probably investigated an actual occurrence of incorrect count (we could ask!), but we don't have a stress test reproduction that could be used. One should be able to construct a stress test using multiple threads to trigger concurrent attempts to addCount, but it won't be easy.
28-11-2018

Martin, can you take a look at this?
28-11-2018