JDK-4461173 : Linux:intermittent hang due to mutex being granted to suspended thread
Type:Bug
Component:hotspot
Sub-Component:runtime
Affected Version:1.4.0
Priority:P1
Status:Closed
Resolution:Fixed
OS:linux,solaris_8
CPU:generic,x86,sparc
Submitted:2001-05-21
Updated:2012-10-08
Resolved:2001-11-03
The Merlin JDK build hangs intermittently on Linux.
See the attachment for the stack trace.
Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: merlin-rc1
FIXED IN: merlin-rc1
INTEGRATED IN: merlin-rc1
14-06-2004
EVALUATION
From the stack trace, the VM is in its shutdown phase. The VM thread (thread 3)
is stuck waiting for ThreadCritical and never finishes the VM_Exit operation.
However, no thread appears to be inside ThreadCritical. The
compiler thread (thread 9) is also waiting for ThreadCritical, but it has
been suspended, apparently by the VM_Exit operation.
LinuxThreads knows nothing about the suspension/resumption mechanism used
in the VM. If more than one thread is waiting for the same mutex when
the mutex is unlocked, LinuxThreads grants the mutex to the longest-waiting
thread. LinuxThreads may therefore grant the mutex to a thread that the VM
has already suspended.
In this case, it looks like the ThreadCritical mutex was granted to the
compiler thread, which had already been suspended, and the VM hung.
hui.huang@Eng 2001-05-21
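The hand-off described in this evaluation can be illustrated with a small deterministic model. This is only a sketch, not LinuxThreads or HotSpot code: `FifoMutex`, the `suspended` set, and the thread name "holder" (some unspecified earlier owner of the lock) are hypothetical names introduced for illustration, and the mutex is assumed to pass ownership strictly to the longest waiter.

```python
from collections import deque


class FifoMutex:
    """Toy mutex that hands ownership directly to the longest-waiting thread."""

    def __init__(self):
        self.owner = None
        self.waiters = deque()

    def lock(self, thread):
        """Return True if acquired at once; otherwise queue the thread."""
        if self.owner is None:
            self.owner = thread
            return True
        self.waiters.append(thread)
        return False

    def unlock(self, thread):
        if self.owner != thread:
            raise RuntimeError("unlock by non-owner")
        # Direct hand-off: the mutex has no idea whether the waiter can run.
        self.owner = self.waiters.popleft() if self.waiters else None


def grant_to_suspended_thread():
    thread_critical = FifoMutex()
    suspended = set()              # threads the VM has suspended (model only)

    thread_critical.lock("holder")     # hypothetical earlier owner
    thread_critical.lock("compiler")   # compiler thread queues first
    thread_critical.lock("vm")         # VM thread queues behind it
    suspended.add("compiler")          # the compiler thread gets suspended
    thread_critical.unlock("holder")   # ownership goes to the oldest waiter

    owner = thread_critical.owner
    return owner, owner in suspended, list(thread_critical.waiters)
```

In this model the mutex ends up owned by the suspended compiler thread while the VM thread is still queued, which matches the picture in the stack trace: two threads waiting and no runnable thread inside ThreadCritical.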
With the help of a modified VM and a test case that hangs quickly, I have
investigated this hang further.
The VM_Exit operation grabs the ThreadCritical lock before it
suspends a thread and releases the lock after the thread is suspended. This
is done for each thread that needs to be suspended during VM shutdown.
What happens in this hang is that the compiler thread tries to grab
ThreadCritical right after the VM thread has entered it. The VM thread then
sends the suspension signal to suspend the compiler thread. Once that is done,
the VM thread leaves ThreadCritical, and LinuxThreads grants the mutex to
the by-then-suspended compiler thread. When the VM thread needs to suspend
another thread, it must enter ThreadCritical again. But because the compiler
thread is now sleeping indefinitely inside ThreadCritical, the VM thread
waits indefinitely as well, and the VM hangs.
The fundamental problem behind this bug is that LinuxThreads knows nothing
about Java suspension/resumption and may grant a mutex to a suspended thread.
This problem was one of the major issues behind jdb not working on Linux
(4369489) and is discussed in 4413752.
The simplest workaround for this problem is to make the VM thread hold
ThreadCritical for the entire period of suspending all other threads,
that is, not to leave ThreadCritical after each thread has been suspended.
hui.huang@Eng 2001-05-21
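The sequence above, and the effect of the proposed workaround, can be sketched with a deterministic toy model. It is not HotSpot code: `FifoMutex`, `suspend_all`, the `hold_across_loop` flag, and the thread name "worker" are hypothetical, and the model assumes the contending thread queues for the lock while the VM thread is suspending the first target.

```python
from collections import deque


class FifoMutex:
    """Toy mutex that hands ownership directly to the longest-waiting thread."""

    def __init__(self):
        self.owner = None
        self.waiters = deque()

    def lock(self, thread):
        if self.owner is None:
            self.owner = thread
            return True
        self.waiters.append(thread)
        return False

    def unlock(self, thread):
        if self.owner != thread:
            raise RuntimeError("unlock by non-owner")
        self.owner = self.waiters.popleft() if self.waiters else None


def suspend_all(targets, contender, hold_across_loop):
    """Return the threads the VM thread manages to suspend in this model."""
    mutex = FifoMutex()
    suspended = []
    contended = False
    if hold_across_loop:
        mutex.lock("vm")                 # workaround: enter once
    for target in targets:
        if not hold_across_loop and not mutex.lock("vm"):
            break                        # stuck behind a suspended owner
        if not contended:
            mutex.lock(contender)        # contender queues behind the VM thread
            contended = True
        suspended.append(target)         # suspension signal delivered
        if not hold_across_loop:
            mutex.unlock("vm")           # may hand the lock to a suspended thread
    if hold_across_loop:
        mutex.unlock("vm")               # leave once, after all suspensions
    return suspended
```

With per-thread locking, `suspend_all(["compiler", "worker"], "compiler", False)` stops after the first target, because the lock was handed to the suspended compiler thread. With `hold_across_loop=True` the loop completes. Note that in this model the final unlock can still hand the mutex to a suspended waiter, so the workaround only helps as long as the VM thread does not need ThreadCritical again before the waiter is resumed.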
---------------------------------------------
Changed the synopsis to reflect the nature of this hang. It is not limited
to shutdown suspension; it can also happen during safepoint suspension and
profiler suspension. See the duplicate bugs for other test cases.
###@###.### 2001-10-17
---------------------------------------------
The main issue is that a thread gets "signal suspended" inside SR_handler by
HotSpot, and LinuxThreads then grants a mutex to the suspended thread. It
is fixed by not holding the thread inside SR_handler if the thread is
waiting for a mutex.
###@###.### 2001-11-02
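The idea behind the fix can be sketched as a toy model. This is not the actual HotSpot change: `ModelThread`, its fields, and this `sr_handler` function are hypothetical stand-ins. In particular, the report does not say what happens to the suspension request afterwards, so the `suspend_pending` flag is purely an assumption of this sketch.

```python
class ModelThread:
    """Minimal thread state for the model (all fields are hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.waiting_for_mutex = False   # set while blocked on a mutex
        self.parked_in_handler = False   # True if held inside the handler
        self.suspend_pending = False     # assumption: request remembered for later


def sr_handler(thread):
    """Toy stand-in for the suspend signal handler."""
    if thread.waiting_for_mutex:
        # Do not hold the thread here: it may be granted the mutex at any
        # moment and must stay able to run and release it.
        thread.suspend_pending = True
        return
    thread.parked_in_handler = True      # ordinary case: suspend in the handler


# Usage: a thread queued on a mutex is not parked; a running thread is.
blocked = ModelThread("compiler")
blocked.waiting_for_mutex = True
sr_handler(blocked)

running = ModelThread("worker")
sr_handler(running)
```

After these calls, `blocked` is left free to take and release the mutex, while `running` is parked as before.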