This bug started off innocently enough. There were no test failures or other observed issues, but I noticed the following comment and assert in the debug agent.
void
threadControl_setPendingInterrupt(jthread thread)
{
ThreadNode *node;
/*
* vmTestbase/nsk/jdb/kill/kill001/kill001.java seems to be the only code
* that triggers this function. Is uses ThreadReference.Stop.
*
* Since ThreadReference.Stop is not supported for vthreads, we should never
* get here with a vthread.
*/
JDI_ASSERT(!isVThread(thread));
...
The comment and assert are clearly bit rotted, since we now support ThreadReference.Stop for virtual threads, and it is tested and works (kill001 passes). So the question is why is this assert not hit. I made it to always assert, and it does assert if the target thread is a platform thread. The call chain is:
threadControl_setPendingInterrupt
handleInterrupt
debugMonitorWait
enqueueCommand
eventHelper_reportEvents
reportEvents
filterAndHandleEvent
event_callback
cbException
JvmtiExport::post_exception_throw
InterpreterRuntime::exception_handler_for_exception
This is all part of the debug agent's support for handling an interrupt when it is doing a JVMTI RawMonitorWait(). If it returns JVMTI_ERROR_INTERRUPT, it calls threadControl_setPendingInterrupt() to set the ThreadNode->pendingInterrupt flag and then re-enters the RawMonitorWait(). Then when the debug agent is about to exit the JVMTI event handler (see threadControl_onEventHandlerExit()), it checks the pendingInterrupt flag, and if set it calls JVMTI InterruptThread() and clears the flag.
This all seems to be working fine for platform threads (I confirmed this all happens), but for virtual threads we never get into threadControl_setPendingInterrupt(). Thus the interrupt is ignored (and no test catches this).
The issues is in handleInterrupt(), which you see in the above stack trace, and threadControl_currentThread(). In handleInterrupt() we have:
jthread thread = threadControl_currentThread();
if ((thread != NULL) && (!threadControl_isDebugThread(thread))) {
threadControl_setPendingInterrupt(thread);
}
The problem here is that for virtual threads threadControl_currentThread() is returning NULL, so we never call threadControl_setPendingInterrupt(). Here's the problematic code in threadControl_currentThread():
node = findThread(&runningThreads, NULL);
thread = (node == NULL) ? NULL : node->thread;
It's only searching the runningThreads list. Virtual threads are in the runningVThreads list. We could search both lists, but since we don't necessarily track all vthreads in runningVThreads, we still might not find the virtual thread. I can see no reason why we don't just instead call JVMTI CurrentThread(), so that is what the code was fixed to do. Also no need for the threadLock in this cases since the thread lists are not being looked at.
With the threadControl_currentThread() fix in place, I then started hitting deadlocks. One thread was grabbing the handlerLock and then blocked trying to grab the threadLock (that is the proper locking order), while a 2nd thread had grabbed the threadLock and was blocking trying to grab the handlerLock (that's the wrong locking order). The problem was doPendingTasks(), which calls JVMTI InterruptThread() if there is a pending interrupt (as discussed earlier). It turns out for virtual threads InterruptThread() does a java upcall to Thread.interrupt(). The 2nd thread has just finished handling an event, grabbed the threadLock, and then called InterruptThread() (via doPendingTasks()). The 2nd thread also had single stepping enabled. This resulted in a singlestep event while executing Thread.interrupt(). The debug agent event handler tried to grab the handlerLock and deadlocked as a result because it already had the threadLock (which should have been grabbed 2nd) and the other thread already has the handlerLock (and wants to grab the threadLock)
As a first fix I simply had the doPendingTasks() related code always grab the handlerLock before the threadLock, even if the handlerLock wasn't really neeed. This worked but I didn't like the idea that any locks were being held during the InterruptThread() upcall to Thread.interrupt(), so I reworked the code to not require that the locks be held.
One more problem is with the ThreadReference.Interrupt support. It also uses JVMTI InterruptThread(), and results in an java upcall to Thread.interrupt(). See threadControl_interrupt(). Although no test detected any issues with this, it also is liable to result in a deadlock since it grabs the threadLock and not the handlerLock. I fixed this simply by removing all locking. The locking was so we could set ThreadNode->pendingInterrupt if the thread was currently in an event handler (so we can defer the InterruptThread() call). There is no reason to do this since as described above the event handler is already setup to handle the interrupt. The only possible impact of the interrupt on the event handler is a premature return from RawMonitorWait(), which is dealt with as described above.