JDK-8187408 : AbstractQueuedSynchronizer wait queue corrupted when thread awaits without holding the lock
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 8,9
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2017-09-04
  • Updated: 2018-01-31
  • Resolved: 2017-10-03
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10 JDK 9
10 b26Fixed 9u-openFixed
Description
FULL PRODUCT VERSION :
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Darwin 192.168.0.104 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

A DESCRIPTION OF THE PROBLEM :
200 threads invoke AbstractQueuedSynchronizer.ConditionObject#await() concurrently. 
100 threads which hold the exclusive lock succeed while remain ones without holding the  lock fail and get IllegalMonitorStateException.
After all 100 threads release the exclusive lock, the main thread will continuously invoke signalAll() until all waiting threads are waken.

One possible result is that the main thread will hang which means some wait threads will NEVER be waken.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test program with assertion enabled.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The given test program should be able to end.
All waiting threads should be eventually signaled regardless of whether there were failing attempts at invoking awaiting().
ACTUAL -
The main thread failed to signal all waiting threads and the program will never end.
Although all threads without holding the exclusive lock failed and got IllegalMonitorStateException, these failing attempts did make some side effects which will cause the ConditionObject not to work.



REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionObjectTest {

    // to record lock acquire/release
    private static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Lock lock = new ReentrantLock();
        Condition cond = lock.newCondition();

        int loop = 100;
        for (int i = 0; i < loop; i++) {
            // thread that can await on the condition
            new Thread(() -> {
                lock.lock();
                try {
                    count++;
                    cond.await();
                } catch (InterruptedException ignore) {
                } finally {
                    lock.unlock();
                    count--;
                }
            }, "succ-" + i).start();

            // thread that fails to await on the condition
            new Thread(() -> {
                boolean illegalMonitor = false;
                try {
                    cond.awaitUninterruptibly();
                } catch (IllegalMonitorStateException ignore) {
                    illegalMonitor = true;
                } finally {
                    assert illegalMonitor;
                }
            }, "fail-" + i).start();

        }

        // make sure that all succ threads got the lock before we try to signal them.
        while (count != loop) {
            Thread.yield();
        }

        // signal all threads and retry if count > 0
        while (count > 0) {
            lock.lock();
            try {
                cond.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Regardless of the fact that AbstractQueuedSynchronizer.ConditionObject#await can handle the case that the current thread does not hold the exclusive lock and throw IllegalMonitorStateException to respond, it may become unreliable and dangerous. 

So make sure the lock is acquired before invoking ConditionObject#await, otherwise some waiting threads may always wait.

SUPPORT :
YES


Comments
In the end, we settled on the "obvious" /** * Adds a new waiter to wait queue. * @return its new wait node */ private Node addConditionWaiter() { + if (!isHeldExclusively()) + throw new IllegalMonitorStateException(); Node t = lastWaiter; // If lastWaiter is cancelled, clean out.
2017-10-03

I added a precheck in addConditionWaiter. (The check passes if owner is null, because AQS objects are not required to record ownership.) Also done in AbstractQueuedLongSynchronizer. *** AbstractQueuedSynchronizer.java.~1.149.~ 2017-09-29 07:54:20.423660399 -0400 --- AbstractQueuedSynchronizer.java 2017-09-29 08:25:42.771541099 -0400 *************** *** 1838,1843 **** --- 1838,1848 ---- * @return its new wait node */ private Node addConditionWaiter() { + Thread owner; + if (!isHeldExclusively() || + ((owner = getExclusiveOwnerThread()) != null && + owner != Thread.currentThread())) + throw new IllegalMonitorStateException(); Node t = lastWaiter; // If lastWaiter is cancelled, clean out. if (t != null && t.waitStatus != Node.CONDITION) {
2017-09-29

Seems this bug was introduced very early: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/locks/AbstractQueuedSynchronizer.java?r1=1.38&r2=1.39 In 1.38 we called checkConditionAccess on entry to each condition method to ensure lock ownership. In 1.39 isHeldExclusively was used, but only for the signalling methods, not the waiting methods. The waiting methods internalized the ownership check via fullyRelease, but meanwhile addConditionWaiter added the new Node to the wait queue optimistically and with no use of atomics or locking. Hence we can lose waiters concurrently being added to the wait queue.
2017-09-13

I played with this a bit. Below is my variant, that reproduces hang reliably. My theory (agreeing with David) is that wait queue can only be manipulated safely while holding the lock and some elements in the queue are dropped when wait is called without the lock, although IMSE is thrown reliably. ``` import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.ReentrantLock; import java.util.concurrent.CountDownLatch; public class ConditionObjectTest { public static void main(String[] args) throws Exception { final int loop = 1000; final ReentrantLock lock = new ReentrantLock(); final Condition cond = lock.newCondition(); final CountDownLatch done = new CountDownLatch(loop); for (int i = 0; i < loop; i++) { new Thread(() -> { lock.lock(); try { done.countDown(); cond.await(); } catch (InterruptedException unexpected) { throw new AssertionError("interrupted?"); } finally { lock.unlock(); } }, "good-" + i).start(); new Thread(() -> { try { cond.awaitUninterruptibly(); // not holding lock? throw new AssertionError("should throw"); } catch (IllegalMonitorStateException expected) {} }, "evil-" + i).start(); } done.await(); for (int prev = -1;; Thread.yield()) { lock.lock(); try { int qLength = lock.getWaitQueueLength(cond); if (qLength != prev) { System.err.println(qLength); prev = qLength; } if (qLength == loop) { cond.signalAll(); break; } } finally { lock.unlock(); } } } } ```
2017-09-13

It is unclear how enabling assertions is impacting things but there is a major bug in AQS.ConditionObject.await logic for when IllegalMonitorStateException will be thrown. We have: public final void awaitUninterruptibly() { Node node = addConditionWaiter(); int savedState = fullyRelease(node); <= IllegalMonitorStateException thrown from here But if the Lock is not in fact held then we are updating the wait-queue in an unsafe manner, as a thread holding the lock may be adding itself at the same time!
2017-09-11

Reopening There is a problem here, but nothing to do with AQS I think. With assertions enabled the test hangs reliably. But if I add: if (args != null) as a guard on the creation of the thread that throws IllegalMonitorStateException, then it doesn't hang! But if I add a println in the IMSE catch clause, it hangs again! Very odd. The test is broken in that it is incorrectly synchronized with respect to the count variable, yet that should not be affected by whether or not assertions are enabled.
2017-09-11

Closing as incomplete while we wait for the submitter to correctly synchronize the test and then see if there is actually a problem.
2017-09-11

The test is broken - count must only be updated and read while the lock is held.
2017-09-11

To reproduce the issue, run the attached test case. JDK 8u144 - Fail JDK 9-ea+181 - Fail The program hangs when run with java -ea option.
2017-09-11