JDK-6271298 : java.util.concurrent unpark() may reference stale native thread structures
Type:Bug
Component:hotspot
Sub-Component:runtime
Affected Version:5.0
Priority:P2
Status:Closed
Resolution:Fixed
OS:generic
CPU:generic
Submitted:2005-05-16
Updated:2019-04-29
Resolved:2005-09-08
Remarks:
* See the hs-sq-for-bug-report-3.tar attachment for a test case, the hs_err file, etc.
* Now that we understand the problem, we could likely construct a targeted test case that fails more reliably and more frequently.
* The bug is intermittent and results from an inopportune race. Briefly, it appears that an unpark operation against a moribund thread can result in stale references to memory that previously contained a native thread structure. Note that the lifecycles of a Java Thread object and its corresponding native thread structure differ: the Java object can remain reachable, and be passed to unpark(), after the native structure has been destroyed.
* Thanks to Doug Lea for reporting the failure and providing the test-case.
* We believe this is a day-one bug in the native JSR166 java.util.concurrent park-unpark support routines.
Possible solutions (excerpt from email dialog - Dave replying to Doug):
I suppose the options available to us are:
1. Use the double-resolve idiom under the threads_lock; see the Thread.interrupt() implementation for an example. This works, is easy to code, and should be safe, but I'm not in love with putting that much stress on the thread lookup code (I think it uses a simple-minded linear lookup). Also, unpark()ing the target thread can sometimes result in immediate preemption of the unparker. Since the unparker holds the threads_lock, this could easily induce convoying. On Solaris I could transiently set the please-dont-preempt-me flag until we got the lock dropped, but that doesn't help Linux or Windows.
2. Only destroy the native thread structure when the thread object's finalizer runs. As long as Java-level references exist, the native thread structure exists. As an extreme variation, we could even force the native thread (LWP) to persist, too. Amusing but not viable. Delayed finalization could cause the JVM to accumulate many moribund threads, and there's no existing "back pressure" to force us to GC and finalize if we start running short on virtual address space (held by moribund threads).
3. Add reference counts to the native thread structure. The result would be similar to how handles are managed in Unix and Windows. A ref(throbj) function would grab the threads_lock, perform the lookup, and, if successful, grab a thread-specific lock, increment the reference count, drop both locks, and return the pointer to the native thread structure. An unref(t) function would grab the thread-specific lock and decrement the count; if the count reached 0, it would grab the threads_lock, remove "t" from the lists, and perform final destruction, etc. This is clunky and lock-laden.
4. Something like RCU, where we defer destruction. (I haven't thought this out, and it's likely that JVM-specific constraints would make this unworkable.)
5. Somewhat relatedly, defer destroying native thread structures until the next safepoint. Moribund threads go onto a special senescent list, which is processed at stop-the-world time. It's safe to look up a thread and hold a reference to it as long as the reference-holding thread doesn't go safe (that is, doesn't transition to a safepoint-safe state). It'd probably be safe to let the native thread exit in a timely fashion, but we'd keep the thread's native structure around until stop-the-world time.
6. Make the JSR166 park-unpark primitives use the new objectMonitor "Event" mechanism (which also exposes park-unpark). Add a jlong field to the Thread class to keep a native pointer to the Event.
7. Same as 6, but elevate the Event abstraction to a first-class Java type. (I still think we'd need to hide a jlong native pointer in the new Event type. Otherwise, how do we get from the Java Event to the native Event? Thoughts?)
I'd recommend we go with (1), for both 1.6 and 1.5uX, and later modify the code to use (6) or (7). If we want to take this approach, we should probably contact Martin and Pete immediately and have the jlong "NativeParkEventPointer" instance field added to the Thread class ASAP to avoid any later "flag day" drama.
-Dave
dave.dice@sun.com 2005-05-16 23:23:28 GMT
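A minimal C++ model of option 1, assuming "double-resolve" means resolving the Java Thread object to its native pointer and then re-validating that pointer against the live-thread list, both while holding the threads lock. Every name here (Parker, NativeThread, JavaThreadObject, threads_lock, attach, detach_and_destroy, unpark) is hypothetical and is not HotSpot's actual code. What it shows is that an exiting thread must take the same lock to unlink itself, so it cannot free its structure while an unparker is in the middle of a call.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical sketch of option 1; names are illustrative, not HotSpot's.

// Binary park/unpark permit for one native thread.
struct Parker {
  std::mutex m;
  std::condition_variable cv;
  bool permit = false;

  void unpark() {
    { std::lock_guard<std::mutex> g(m); permit = true; }
    cv.notify_one();
  }
  void park() {  // blocks until a permit is available, then consumes it
    std::unique_lock<std::mutex> g(m);
    cv.wait(g, [this] { return permit; });
    permit = false;
  }
};

struct NativeThread {
  Parker parker;
};

// Stand-in for java.lang.Thread: the object can outlive the native
// structure, so native is reset to nullptr when the thread exits.
struct JavaThreadObject {
  NativeThread* native = nullptr;
};

std::mutex threads_lock;                  // plays the role of the threads_lock
std::vector<NativeThread*> threads_list;  // live native threads

NativeThread* attach(JavaThreadObject& obj) {
  NativeThread* t = new NativeThread();
  std::lock_guard<std::mutex> g(threads_lock);
  threads_list.push_back(t);
  obj.native = t;
  return t;
}

// Called on thread exit: unlink under threads_lock, then free.
void detach_and_destroy(JavaThreadObject& obj) {
  NativeThread* t;
  {
    std::lock_guard<std::mutex> g(threads_lock);
    t = obj.native;
    assert(t != nullptr && "thread detached twice");
    obj.native = nullptr;
    threads_list.erase(std::remove(threads_list.begin(), threads_list.end(), t),
                       threads_list.end());
  }
  // Any unparker that resolved t did so under threads_lock and finished
  // its unpark before releasing the lock, so nobody can still be using t.
  delete t;
}

// Returns false if the target thread has already exited.
bool unpark(JavaThreadObject& obj) {
  std::lock_guard<std::mutex> g(threads_lock);
  NativeThread* t = obj.native;  // first resolve: Java object -> native pointer
  if (t == nullptr) return false;
  // Second resolve: re-validate against the live list with a linear scan.
  if (std::find(threads_list.begin(), threads_list.end(), t) == threads_list.end())
    return false;
  // Issued while threads_lock is held: if the unparker is preempted here,
  // every other lookup queues behind it (the convoying concern).
  t->parker.unpark();
  return true;
}
```

Because unpark() holds threads_lock across Parker::unpark(), the lock is held exactly during the window in which preemption of the unparker would hurt most.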
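Option 3 can be sketched in the same kind of hypothetical model (NativeThread, JavaThreadObject, ref, unref, and exit_thread are illustrative names, not real JVM functions). One deliberate deviation from the description in option 3: this sketch unlinks the structure from the Java object when the thread exits, before the thread drops its own reference, rather than removing it from the lists when the count reaches zero. That keeps the lock order one-way (threads_lock, then the per-thread lock) and guarantees that once the count reaches zero no new reference can appear.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical sketch of option 3: a reference-counted native thread
// structure that is reachable from the Java object only while alive.

struct NativeThread {
  std::mutex ref_lock;              // thread-specific lock guarding refcount
  int refcount = 1;                 // starts at 1: the running thread's own reference
  std::atomic<bool> permit{false};  // stand-in for the real park state
};

// Stand-in for java.lang.Thread; native becomes nullptr once the thread exits.
struct JavaThreadObject {
  NativeThread* native = nullptr;
};

std::mutex threads_lock;

NativeThread* attach(JavaThreadObject& obj) {
  NativeThread* t = new NativeThread();
  std::lock_guard<std::mutex> g(threads_lock);
  obj.native = t;
  return t;
}

// ref(): look up under threads_lock, then bump the count under the
// per-thread lock. Lock order is always threads_lock -> ref_lock.
NativeThread* ref(JavaThreadObject& obj) {
  std::lock_guard<std::mutex> g(threads_lock);
  NativeThread* t = obj.native;
  if (t == nullptr) return nullptr;
  std::lock_guard<std::mutex> tg(t->ref_lock);
  ++t->refcount;
  return t;
}

// unref(): drop one reference; whoever drops the last one frees t.
void unref(NativeThread* t) {
  bool last;
  {
    std::lock_guard<std::mutex> tg(t->ref_lock);
    assert(t->refcount > 0);
    last = (--t->refcount == 0);
  }
  if (last) delete t;
}

// Thread exit: make t unreachable first, then drop the thread's own
// reference. Once unlinked, ref() cannot find t, so a zero count is final.
void exit_thread(JavaThreadObject& obj) {
  NativeThread* t;
  {
    std::lock_guard<std::mutex> g(threads_lock);
    t = obj.native;
    obj.native = nullptr;
  }
  if (t != nullptr) unref(t);
}

// Returns false if the target has already exited.
bool unpark(JavaThreadObject& obj) {
  NativeThread* t = ref(obj);
  if (t == nullptr) return false;
  t->permit.store(true);  // safe: our reference keeps t alive
  unref(t);
  return true;
}
```

The unparker still touches threads_lock, but only for the lookup; the unpark itself runs under a reference rather than under the global lock.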
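A rough sketch of option 5, again with hypothetical names (lookup, exit_thread, senescent, reclaim_at_safepoint). It assumes a VM-wide rule, which this model cannot enforce, that a pointer returned by lookup() is dropped before its holder goes safe; reclaim_at_safepoint() stands in for work the VM thread would do while all mutators are stopped.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch of option 5: deferred destruction via a senescent list.

struct NativeThread {
  std::atomic<bool> permit{false};  // stand-in for the real park state
};

// Stand-in for java.lang.Thread; native becomes nullptr once the thread exits.
struct JavaThreadObject {
  NativeThread* native = nullptr;
};

std::mutex threads_lock;
std::vector<NativeThread*> senescent;  // exited threads awaiting destruction

NativeThread* attach(JavaThreadObject& obj) {
  NativeThread* t = new NativeThread();
  std::lock_guard<std::mutex> g(threads_lock);
  obj.native = t;
  return t;
}

// Lookup for an unparker. The returned pointer stays valid until the
// caller next goes safe (an assumed convention, not checked here).
NativeThread* lookup(JavaThreadObject& obj) {
  std::lock_guard<std::mutex> g(threads_lock);
  return obj.native;
}

// Thread exit: make the structure unreachable but do not free it yet.
void exit_thread(JavaThreadObject& obj) {
  std::lock_guard<std::mutex> g(threads_lock);
  assert(obj.native != nullptr && "thread exited twice");
  senescent.push_back(obj.native);
  obj.native = nullptr;
}

// Run with all mutators stopped at a safepoint, so no thread can hold a
// pointer obtained from lookup(). Returns the number of structures freed.
std::size_t reclaim_at_safepoint() {
  std::lock_guard<std::mutex> g(threads_lock);
  std::size_t n = senescent.size();
  for (NativeThread* t : senescent) delete t;
  senescent.clear();
  return n;
}
```

The race from the report (lookup, then the target exits, then the unpark lands) becomes harmless here because the memory survives until the next stop-the-world pause.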
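A sketch of option 6, under stated assumptions. The field name nativeParkEventPointer comes from the follow-up comment; the Event class, its free list, and the property that Events are never freed (only recycled) are assumptions of this sketch, not a description of the objectMonitor Event mechanism. That never-freed property is what makes a stale pointer benign: an unparker racing with thread exit touches a live Event and at worst causes a spurious wakeup, which LockSupport.park callers must already tolerate.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical park/unpark Event. Assumed type-stable: once allocated it is
// never deleted, only recycled, so any pointer to it stays dereferenceable.
class Event {
 public:
  void unpark() {
    { std::lock_guard<std::mutex> g(m_); permit_ = true; }
    cv_.notify_one();
  }
  void park() {  // blocks until a permit is available, then consumes it
    std::unique_lock<std::mutex> g(m_);
    cv_.wait(g, [this] { return permit_; });
    permit_ = false;
  }
  bool has_permit() {
    std::lock_guard<std::mutex> g(m_);
    return permit_;
  }

  // Events are recycled through a free list and never deleted.
  static Event* allocate() {
    std::lock_guard<std::mutex> g(free_lock_);
    if (free_list_.empty()) return new Event();
    Event* e = free_list_.back();
    free_list_.pop_back();
    return e;
  }
  static void release(Event* e) {
    std::lock_guard<std::mutex> g(free_lock_);
    free_list_.push_back(e);
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool permit_ = false;
  static std::mutex free_lock_;
  static std::vector<Event*> free_list_;
};

std::mutex Event::free_lock_;
std::vector<Event*> Event::free_list_;

// Stand-in for java.lang.Thread carrying the proposed jlong field.
struct JavaThreadObject {
  std::atomic<std::int64_t> nativeParkEventPointer{0};
};

Event* event_of(std::int64_t p) {
  return reinterpret_cast<Event*>(static_cast<std::intptr_t>(p));
}

void thread_start(JavaThreadObject& obj) {
  Event* e = Event::allocate();
  obj.nativeParkEventPointer.store(
      static_cast<std::int64_t>(reinterpret_cast<std::intptr_t>(e)));
}

void thread_exit(JavaThreadObject& obj) {
  std::int64_t p = obj.nativeParkEventPointer.exchange(0);
  if (p != 0) Event::release(event_of(p));
}

// No threads_lock: an unparker that loaded p just before thread_exit still
// reaches a live Event, at worst a spurious wakeup for its next owner.
void unpark(JavaThreadObject& obj) {
  std::int64_t p = obj.nativeParkEventPointer.load();
  if (p != 0) event_of(p)->unpark();
}
```

The trade-off is that Event memory is never returned to the allocator, so the pool grows to the peak number of concurrently live threads.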
I've added a smaller test-case, CR6271298.java, which reproduces the problem in short order on MP systems.
We should go with "nativeParkEventPointer" to be consistent with the other Thread field names.
pete.soper@sun.com 2005-05-17 13:10:57 GMT
dave.dice@sun.com 2005-05-17 22:56:59 GMT
======================================================
Removed 'no-jdc' because it is not a valid keyword.
'nojdc' is the only valid keyword for blocking a bug from being made public, and this bug already has it.
16-04-2019
JUSTIFICATION
Priority changed from [] to [2-High]
segv in native code
dave.dice@sun.com 2005-05-16 23:23:27 GMT
16-04-2019
SUGGESTED FIX
See comments
###@###.### 2005-05-16 23:23:28 GMT
16-05-2005
EVALUATION
See comments
###@###.### 2005-05-16 23:23:28 GMT