JDK-8157409 : Thread dump doesn't report deadlock when one is present.
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 7,8,9
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2016-05-09
  • Updated: 2017-03-10
  • Resolved: 2016-05-23
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
And other versions

FULL OS VERSION :
Linux: 2.6.32-358.23.2.el6.x86_64, Mac OS: 10.9.5. This problem is unlikely to be OS-specific.

EXTRA RELEVANT SYSTEM CONFIGURATION :
Observed on multiple versions of Java 8.

A DESCRIPTION OF THE PROBLEM :
After a close examination of the OpenJDK implementation of ReentrantReadWriteLock, I have a succeeded in creating a scenario similar to (our) customer's: a deadlock that the JVM deadlock detector cannot detect.

The trick is that a ReentrantReadWriteLock$NonfairSync (the default, and the type of ReentrantReadWriteLock that the customer's class loader is using) contains a heuristic intended to ensure that writers get some priority over readers, so that even in a busy reading environment a lone writer will eventually get a chance to proceed. The heuristic works like this: when a new thread arrives, the code gives the head of the queue an unsynchronized "glance". If the thread at the head of the queue is a writer, then the arriving thread is queued behind it, even if the the lock is only held by a reader and the arriving thread is also a reader.
As a result, it is possible for readers to queue behind readers when a writer intervenes (and I have a repro case to prove it).

Given this, the following set of actions result in a deadlock which is not reported by the JVM in a thread dump:

T1 takes read lock, then gets slow or is descheduled.
T2 queues for write lock
T3 takes object monitor, queues for readlock. But it queues behind T2 as explained above.
T1 gets busy again, queues for monitor

Now T1 is holding the read lock trying to get the monitor, and T3 is holding the monitor trying to get the read lock. It would be perfectly legal for T3 to take the read lock in parallel with T1, since they are both readers. And this would allow the monitor to be released, solving the whole problem. But the implementation doesn't allow it as explained above. So all three threads are deadlocked. If the JVM is now sent a kill -3, it does not report this as a deadlock, presumably because T2 is not holding any resources and T3 looks like it could run (even though really, it can't.)

Here's the output from the repro:
Thread T1: start, taking read lock...
Thread T2: start, sleeping for 1 second...
Thread T1: have read lock, sleeping for 3 seconds...
Thread T3: start: waiting 2 seconds for T1 to get the read lock and T2 to queue for the write lock...
Thread T2: queuing for write lock...
Thread T3: awake after 2 seconds, taking the monitor...
Thread T3: have monitor, queuing for read lock...
Thread T1: continuing after 3 seconds, queuing for the monitor...

(thread dump here - exceeds your length limit)

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Did not try

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See source code below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
The JVM should report a deadlock. It doesn't.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Not required.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
    /*

    T1 takes read lock, gets slow
    T2 queues for write lock
    T3 takes monitor, queues for readlock, stuck behind T2 (SEE NOTE)
    T1 queues for monitor

    Deadlock - not detected by JVM in thread dump in response to kill -3.

    NOTE: the issue is that the rwlock only held for read, so in theory T3 can proceed.
    But in practice, the implementation of the ReentrantReadWriteLock$NonfairSync cheats
    by "glancing" at the head of the queue on every arrival. If the thread at the head
    of the queue is trying to get the write lock, the new arrival is queued, even if its
    a reader that could be allowed to proceed, like T3. As a result, readers can queue
    behind readers, leading to the possibility of deadlocks that don't look like deadlocks
    to the JVM deadlock detector.

     */

    static void sleepMillis(int millis) {
        java.util.concurrent.locks.LockSupport.parkNanos(1000L * 1000L * millis);
    }

    static void pr(String msg) {
        System.err.println("Thread " + Thread.currentThread().getName() + ": " + msg);
    }

    static class T1 extends Thread {
        @Override
        public void run() {
            Thread.currentThread().setName("T1");
            pr("start, taking read lock...");
            rwLock.readLock().lock();
            pr("have read lock, sleeping for 3 seconds...");
            sleepMillis(3000);
            pr("continuing after 3 seconds, queuing for the monitor...");
            synchronized (monitorLock) {
                pr("uh oh: got the monitor (not expected).");
            }
        }
    }

    static class T2 extends Thread {
        @Override
        public void run() {
            Thread.currentThread().setName("T2");
            pr("start, sleeping for 1 second...");
            sleepMillis(1000);
            pr("queuing for write lock...");
            rwLock.writeLock().lock();
            pr("uh oh: got the write lock (not expected).");
        }
    }

    static class T3 extends Thread {
        @Override
        public void run() {
            Thread.currentThread().setName("T3");
            pr("start: waiting 2 seconds for T1 to get the read lock and T2 to queue for the write lock...");
            sleepMillis(2000);
            pr("awake after 2 seconds, taking the monitor...");
            synchronized(monitorLock) {
                pr("have monitor, queuing for read lock...");
                rwLock.readLock().lock();
            }
            pr("uh oh: got the read lock and released the monitor (not expected).");
        }
    }

    @Test
    public void testDeadlockWithBothReadersAndWriters() throws Exception {
        Thread first = new T1();
        Thread second = new T2();
        Thread third = new T3();
        first.start();
        second.start();
        third.start();
        first.join();
        second.join();
        third.join();
    }

---------- END SOURCE ----------


Comments
Of course this type of enhancement can be taken up by the OpenJDK community if they choose. This might be an interesting student project.
24-05-2016

Based on David's comment, we're closing this as WNF.
23-05-2016

The deadlock detection in the VM is provided to assist with common deadlock scenarios and is not claimed to be capable of detecting every possible form of deadlock that may exist. In that regard this is not a bug but a request for an enhancement. I think this would be a particularly difficult situation to detect.
20-05-2016

Issue is reporucible on 8u92 == -sh-4.1$ /opt/java/jdk1.8.0_92/bin/javac TestDeadlock.java -sh-4.1$ /opt/java/jdk1.8.0_92/bin/java TestDeadlock Thread T3: start: waiting 2 seconds for T1 to get the read lock and T2 to queue for the write lock... Thread T2: start, sleeping for 1 second... Thread T1: start, taking read lock... Thread T1: have read lock, sleeping for 3 seconds... Thread T2: queuing for write lock... Thread T3: awake after 2 seconds, taking the monitor... Thread T3: have monitor, queuing for read lock... Thread T1: continuing after 3 seconds, queuing for the monitor... 2016-05-20 06:00:21 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode): "T3" #11 prio=5 os_prio=0 tid=0x00007fb1300eb800 nid=0x4c8c waiting on condition [0x00007fb11a366000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000770c621f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at TestDeadlock$T3.run(TestDeadlock.java:48) - locked <0x0000000770c62668> (a java.util.concurrent.locks.ReentrantLock) "T2" #10 prio=5 os_prio=0 tid=0x00007fb1300ea000 nid=0x4c8b waiting on condition [0x00007fb11a467000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000770c621f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) at TestDeadlock$T2.run(TestDeadlock.java:35) "T1" #9 prio=5 os_prio=0 tid=0x00007fb1300e8000 nid=0x4c8a waiting for monitor entry [0x00007fb11a568000] java.lang.Thread.State: BLOCKED (on object monitor) at TestDeadlock$T1.run(TestDeadlock.java:24) - waiting to lock <0x0000000770c62668> (a java.util.concurrent.locks.ReentrantLock) "Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007fb1300c8000 nid=0x4c88 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fb1300ba800 nid=0x4c87 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fb1300b9000 nid=0x4c86 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fb1300b6000 nid=0x4c85 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fb1300b4800 nid=0x4c84 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fb130082000 nid=0x4c83 in Object.wait() [0x00007fb11ac6f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000770c08ee0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) - locked <0x0000000770c08ee0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fb13007d000 nid=0x4c82 in Object.wait() [0x00007fb11ad70000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000770c06b50> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191) - locked <0x0000000770c06b50> (a java.lang.ref.Reference$Lock) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153) "main" #1 prio=5 os_prio=0 tid=0x00007fb130007800 nid=0x4c7c in Object.wait() [0x00007fb135d21000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000770c627f8> (a TestDeadlock$T1) at java.lang.Thread.join(Thread.java:1245) - locked <0x0000000770c627f8> (a TestDeadlock$T1) at java.lang.Thread.join(Thread.java:1319) at TestDeadlock.testDeadlockWithBothReadersAndWriters(TestDeadlock.java:62) at TestDeadlock.main(TestDeadlock.java:71) "VM Thread" os_prio=0 tid=0x00007fb130075800 nid=0x4c81 runnable "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fb13001c800 nid=0x4c7d runnable "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fb13001e800 nid=0x4c7e runnable "GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fb130020800 nid=0x4c7f runnable "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fb130022000 nid=0x4c80 runnable "VM Periodic Task Thread" os_prio=0 tid=0x00007fb1300cb000 nid=0x4c89 waiting on condition JNI global references: 6 Heap PSYoungGen total 71168K, used 6144K [0x0000000770c00000, 0x0000000775b00000, 0x00000007c0000000) eden space 61440K, 10% used [0x0000000770c00000,0x0000000771200060,0x0000000774800000) from space 9728K, 0% used [0x0000000775180000,0x0000000775180000,0x0000000775b00000) to space 9728K, 0% used [0x0000000774800000,0x0000000774800000,0x0000000775180000) ParOldGen total 162816K, used 0K [0x00000006d2400000, 0x00000006dc300000, 0x0000000770c00000) object space 162816K, 0% used [0x00000006d2400000,0x00000006d2400000,0x00000006dc300000) Metaspace used 2558K, capacity 4496K, committed 4864K, reserved 1056768K class space used 280K, capacity 388K, committed 512K, reserved 1048576K ==
20-05-2016