Case 1: Deadlock on resume by debugger
======================================
The JDWP agent deadlocks the VM if
* a thread T is blocked in blockOnDebuggerSuspend because it called
j.l.Thread.resume() on a thread "resumee" that is currently suspended by the
debugger, and
* the debugger then tries to resume one or all threads.
T owns handlerLock while it waits for the debugger to resume the resumee, and
the debugger needs handlerLock to perform that resume.
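The lock pattern described above can be modeled outside the agent. The following is only a sketch in plain Java, assuming two java.util.concurrent locks as stand-ins for the agent's handlerLock and threadLock raw monitors. The class name, method name, and timeout are hypothetical and exist only for this illustration. The timed tryLock and the final interrupt are there so the sketch terminates instead of hanging, which the real agent cannot do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ResumeDeadlockSketch {
    // Stand-ins (assumption) for the agent's handlerLock and threadLock raw monitors.
    static final ReentrantLock handlerLock = new ReentrantLock();
    static final ReentrantLock threadLock = new ReentrantLock();
    static final Condition resumed = threadLock.newCondition();
    static boolean suspendedByDebugger = true; // guarded by threadLock

    /** Returns true if the modeled debugger resume managed to take handlerLock. */
    static boolean debuggerResumeSucceeds(long timeoutMillis) throws InterruptedException {
        CountDownLatch tIsBlocked = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            handlerLock.lock();                  // models event_callback taking handlerLock
            try {
                threadLock.lock();               // models blockOnDebuggerSuspend
                try {
                    tIsBlocked.countDown();
                    while (suspendedByDebugger) {
                        resumed.await();         // releases threadLock only; handlerLock stays held
                    }
                } catch (InterruptedException e) {
                    // Only the sketch unwinds this way, so that it can terminate.
                } finally {
                    threadLock.unlock();
                }
            } finally {
                handlerLock.unlock();
            }
        }, "T");
        t.setDaemon(true);
        t.start();
        tIsBlocked.await();

        // Models the debugger's resume path, which needs handlerLock first.
        boolean gotHandlerLock = handlerLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (gotHandlerLock) {
            try {
                threadLock.lock();
                try {
                    suspendedByDebugger = false;
                    resumed.signalAll();
                } finally {
                    threadLock.unlock();
                }
            } finally {
                handlerLock.unlock();
            }
        }
        t.interrupt();                           // break the modeled deadlock so the sketch exits
        t.join();
        return gotHandlerLock;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "debugger resume succeeded: false"
        System.out.println("debugger resume succeeded: " + debuggerResumeSucceeds(200));
    }
}
```

The resume path never gets handlerLock because T gives up only threadLock while waiting. In the agent both sides block without a timeout, which matches the two stacks below.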
Stacks on Deadlock
------------------
### Stack of Thread T
#0 futex_wait_cancelable
#1 __pthread_cond_wait_common
#2 __pthread_cond_wait
#3 os::PlatformEvent::park
#4 JvmtiRawMonitor::simple_wait
#5 JvmtiRawMonitor::raw_wait
#6 JvmtiEnv::RawMonitorWait
#7 debugMonitorWait
#8 blockOnDebuggerSuspend
#9 handleAppResumeBreakpoint
#10 event_callback
#11 cbBreakpoint
#12 JvmtiExport::post_raw_breakpoint
#13 InterpreterRuntime::_breakpoint
### JDWP Agent Stack
#0 futex_wait_cancelable
#1 __pthread_cond_wait_common
#2 __pthread_cond_wait
#3 os::PlatformEvent::park
#4 JvmtiRawMonitor::simple_enter
#5 JvmtiRawMonitor::raw_enter
#6 JvmtiEnv::RawMonitorEnter
#7 debugMonitorEnter
#8 eventHandler_lock
#9 threadControl_resumeThread
#10 resume
#11 debugLoop_run
#12 connectionInitiated
#13 attachThread
#14 JvmtiAgentThread::call_start_function
#15 JavaThread::thread_main_inner
#16 Thread::call_run
#17 thread_native_entry
#18 start_thread
#19 clone
See attachment for jtreg reproducer.
Case 2: Deadlock on JDWP Dispose command
========================================
We see sporadic timeouts when running
test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003 because
the debuggee main thread and the JDWP agent thread deadlock with the following
stacks:
### Debuggee Main Thread "M"
#0 futex_wait_cancelable
#1 __pthread_cond_wait_common
#2 __pthread_cond_wait
#3 os::PlatformEvent::park
#4 JvmtiRawMonitor::simple_wait
#5 JvmtiRawMonitor::raw_wait
#6 JvmtiEnv::RawMonitorWait
#7 debugMonitorWait
#8 blockOnDebuggerSuspend
#9 handleAppResumeBreakpoint
#10 event_callback
#11 cbBreakpoint
#12 JvmtiExport::post_raw_breakpoint
#13 InterpreterRuntime::_breakpoint
### JDWP Agent Thread "A"
#0 futex_wait_cancelable
#1 __pthread_cond_wait_common
#2 __pthread_cond_wait
#3 os::PlatformEvent::park
#4 JvmtiRawMonitor::simple_enter
#5 JvmtiRawMonitor::raw_enter
#6 JvmtiEnv::RawMonitorEnter
#7 debugMonitorEnter
#8 eventHandler_free
#9 threadControl_onDisconnect
#10 debugLoop_run
#11 connectionInitiated
#12 attachThread
#13 JvmtiAgentThread::call_start_function
#14 JavaThread::thread_main_inner
#15 Thread::call_run
#16 thread_native_entry
#17 start_thread
#18 clone
#### How to reproduce
The following patch makes the deadlock likely. Apply it and run dispose003.
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
@@ -180,6 +180,9 @@ debugLoop_run(void)
shouldListen = !lastCommand(cmd);
}
}
+ /* Sleep to trigger deadlock in test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003 */
+ fprintf(stderr, "debugLoop: sleep\n");
+ sleep(1);
threadControl_onDisconnect();
standardHandlers_onDisconnect();
#### Analysis
M hit the internal breakpoint in j.l.Thread.resume()[1]. The resumee
"testedThread" (named "thread2" in log output[2]) is currently suspended, so
M waits on threadLock until the resumee is no longer suspended, while still
owning handlerLock (acquired in event_callback)[3].
A should call threadControl_reset to resume all threads, including
"testedThread", so that M can continue. Before it gets there, however, A blocks
in eventHandler_free trying to enter handlerLock, which is owned by M.
Note that the vm.dispose() call by the debugger returns immediately. Resuming
all suspended threads is done asynchronously[4].
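The ordering problem in this analysis can be illustrated with a small model. This is a sketch under assumptions, not agent code and not a proposed fix: the Java locks stand in for handlerLock and threadLock, and the class name, method name, and the flag holdHandlerLockWhileWaiting are hypothetical. With the flag set, M waits the way the analysis describes and the modeled disconnect path never gets past eventHandler_free. With the flag cleared, M gives up handlerLock before waiting, and the same path reaches threadControl_reset.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class DisposeOrderSketch {
    /** Returns true if the modeled disconnect path got past eventHandler_free. */
    static boolean disconnectCompletes(boolean holdHandlerLockWhileWaiting, long timeoutMillis)
            throws InterruptedException {
        ReentrantLock handlerLock = new ReentrantLock();   // stand-in for handlerLock
        ReentrantLock threadLock = new ReentrantLock();    // stand-in for threadLock
        Condition notSuspended = threadLock.newCondition();
        boolean[] resumeeSuspended = { true };             // guarded by threadLock
        CountDownLatch mIsWaiting = new CountDownLatch(1);

        Thread m = new Thread(() -> {
            handlerLock.lock();                            // models event_callback
            if (!holdHandlerLockWhileWaiting) {
                handlerLock.unlock();                      // hypothetical variant for contrast
            }
            threadLock.lock();                             // models blockOnDebuggerSuspend
            try {
                mIsWaiting.countDown();
                while (resumeeSuspended[0]) {
                    notSuspended.await();                  // releases threadLock only
                }
            } catch (InterruptedException e) {
                // Only the sketch unwinds this way, so that it can terminate.
            } finally {
                threadLock.unlock();
                if (holdHandlerLockWhileWaiting) {
                    handlerLock.unlock();
                }
            }
        }, "M");
        m.setDaemon(true);
        m.start();
        mIsWaiting.await();

        // Models agent thread A: eventHandler_free first, then threadControl_reset.
        boolean freed = handlerLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (freed) {
            handlerLock.unlock();                          // eventHandler_free done
            threadLock.lock();                             // threadControl_reset
            try {
                resumeeSuspended[0] = false;
                notSuspended.signalAll();
            } finally {
                threadLock.unlock();
            }
        }
        m.interrupt();                                     // unwind M if it is still waiting
        m.join();
        return freed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("M keeps handlerLock:    " + disconnectCompletes(true, 200));   // false
        System.out.println("M releases handlerLock: " + disconnectCompletes(false, 200));  // true
    }
}
```

The only difference between the two runs is whether M owns handlerLock while it waits on threadLock, which is the condition the analysis identifies as the cause.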
[1] M calls j.l.Thread.resume() and hits the internal breakpoint set by the JDWP agent
https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java#L139
[2] "testedThread" is named "thread2" in log output.
https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java#L137
[3] M calls `blockOnDebuggerSuspend()` when hitting the internal
breakpoint in j.l.Thread.resume(). There it waits while the resumee is
suspended by the debugger.
https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c#L749
[4] vm.dispose() call by debugger returns immediately. Threads are resumed asynchronously.
https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003.java#L228