Bug ID: JDK-6791815 Fix for 6471657 can cause deadlock on non-Solaris platforms when initializing direct buffer support

Details
Type: Bug
Submit Date: 2009-01-09
Status: Closed
Updated Date: 2012-10-08
Project Name: JDK
Resolved Date: 2009-02-11
Component: hotspot
OS: generic
Sub-Component: runtime
CPU: generic
Priority: P3
Resolution: Fixed
Affected Versions: hs10,6u14
Fixed Versions: hs15 (b01)

Description
The fix for 6471657 introduced a ThreadInVMfromNative transition prior to calling os::yield_all from a thread that needed to wait for another thread to complete initialization of the direct buffer support. This was needed because on Solaris yield_all calls os::sleep, and os::sleep (due to a then-recent change) requires that the calling thread be threadInVM.

However, on Linux yield_all() simply calls sched_yield(), and there is no need to be threadInVM. Further, the state change leads to a deadlock scenario. The thread that is busy-waiting for initialization to complete is marked as being threadInVM, so if the thread doing the initialization triggers a safepoint, the VMThread will wait for this busy-waiting thread to reach the safepoint. But that will never happen: the busy-waiting thread only leaves its loop once initialization completes, and initialization cannot complete while the initializing thread is held up by the pending safepoint. So we get a deadlock.

Similarly, on Windows yield_all() is a Win32 Sleep call, so the same deadlock could occur.
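The deadlock described above can be illustrated with a toy model of the safepoint rule. This is only a sketch under simplifying assumptions: the ThreadState enum and the functions safepoint_can_complete, waiter_blocks_safepoint and check_description_scenario are hypothetical names invented for illustration, not HotSpot code, and the real protocol has more states than the two shown.

```cpp
#include <cassert>
#include <vector>

// Toy thread states; the names are illustrative, not HotSpot's actual enum.
enum class ThreadState { InNative, InVM };

// Simplified rule: a safepoint can complete only when no other thread is
// still marked as running VM code. Threads in native code are treated as
// already safe, so they never hold the safepoint up.
bool safepoint_can_complete(const std::vector<ThreadState>& other_threads) {
  for (ThreadState s : other_threads) {
    if (s == ThreadState::InVM) {
      return false;
    }
  }
  return true;
}

// The scenario from the description: one waiter spins until initialization
// ends, and the initializing thread has requested a safepoint.
bool waiter_blocks_safepoint(ThreadState waiter_state) {
  return !safepoint_can_complete({waiter_state});
}

// Spinning while marked InVM blocks the safepoint; spinning while in
// native state does not.
void check_description_scenario() {
  assert(waiter_blocks_safepoint(ThreadState::InVM));
  assert(!waiter_blocks_safepoint(ThreadState::InNative));
}
```

In this model the waiter that spins while marked InVM is the only thing preventing the safepoint from completing, and it keeps spinning because the initializer is itself waiting on that safepoint.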

There doesn't seem to be an easy solution for positioning the state transition, as it is only needed when yield_all is called from this code on Solaris, but this code is platform independent. You can't unconditionally place the transition inside yield_all because all the other locations that call it already have the thread state set correctly, and a conditional insertion of a thread-state transition requires that the internal logic be duplicated because the transition objects operate at block scope.

Another solution is to replace the current cmpxchg + busy-wait logic with a traditional mutex/condvar solution using wait/signal. A lock-free fast-path will also be needed for performance.
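A minimal sketch of that alternative, written with standard C++ primitives rather than HotSpot's own mutex and monitor classes. Everything here is an assumption for illustration: the OneTimeInit class, the InitState enum and the run_racing_callers helper are hypothetical names, and the sketch assumes the initialization function does not throw.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the direct buffer support initialization state.
enum class InitState { NotStarted, InProgress, Succeeded, Failed };

class OneTimeInit {
 public:
  // Runs init_fn in exactly one caller; every caller returns the outcome.
  template <typename InitFn>
  bool ensure_initialized(InitFn init_fn) {
    // Lock-free fast path: once initialization has finished, no lock is taken.
    InitState s = state_.load(std::memory_order_acquire);
    if (s == InitState::Succeeded) return true;
    if (s == InitState::Failed) return false;

    std::unique_lock<std::mutex> lock(mutex_);
    if (state_.load() == InitState::NotStarted) {
      state_.store(InitState::InProgress);
      lock.unlock();              // do not hold the lock while initializing
      const bool ok = init_fn();  // assumed not to throw
      lock.lock();
      state_.store(ok ? InitState::Succeeded : InitState::Failed,
                   std::memory_order_release);
      done_.notify_all();         // wake every waiting caller
      return ok;
    }
    // Slow path: block instead of busy-waiting until the initializer is done.
    done_.wait(lock, [this] {
      const InitState cur = state_.load();
      return cur == InitState::Succeeded || cur == InitState::Failed;
    });
    return state_.load() == InitState::Succeeded;
  }

 private:
  std::atomic<InitState> state_{InitState::NotStarted};
  std::mutex mutex_;
  std::condition_variable done_;
};

// Usage sketch: n threads race to initialize; returns how often init ran.
int run_racing_callers(int n) {
  OneTimeInit init;
  std::atomic<int> init_calls{0};
  std::atomic<int> successes{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&] {
      if (init.ensure_initialized([&] { ++init_calls; return true; })) {
        ++successes;
      }
    });
  }
  for (std::thread& w : workers) w.join();
  assert(successes.load() == n);  // every caller observed success
  return init_calls.load();
}
```

Waiters here sleep on the condition variable rather than spinning, so the question of which thread state to spin in does not arise in the sketch. Whether a blocking wait inside the VM interacts correctly with safepoints depends on the VM's own lock implementation, which this example does not model.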

                                    

Comments
WORK AROUND

The problem arises when there is a race between two or more threads to initialize the direct buffer support. The workaround is to ensure that a direct buffer is created earlier by a single thread, hence avoiding the initialization race.
                                     
2009-01-09
EVALUATION

See description.
                                     
2009-01-09
PUBLIC COMMENTS

There is an easy solution: moving the transition inside the loop:

  } else {
-    ThreadInVMfromNative tivn(thread); // set state as yield_all can call os:sleep
    while (!directBufferSupportInitializeEnded && !directBufferSupportInitializeFailed) {
+     ThreadInVMfromNative tivn(thread); // set state as yield_all can call os:sleep
      os::yield_all();
    }
  }

Thanks to Tom R.
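The effect of moving the transition inside the loop can be sketched with a scoped guard whose destructor runs at the end of every iteration. This is a toy model only: ToyThread, ScopedInVM and wait_until_done are hypothetical names, and the guard merely flips a state flag, whereas the real ThreadInVMfromNative does more work that is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy thread states; the names are illustrative, not HotSpot's actual enum.
enum class ThreadState { InNative, InVM };

// Hypothetical stand-in for a VM thread object.
struct ToyThread {
  std::atomic<ThreadState> state{ThreadState::InNative};
  std::atomic<int> returns_to_native{0};  // counts guard destructions
};

// Scoped transition in the spirit of ThreadInVMfromNative: enter the VM
// state on construction, return to the native state on destruction.
class ScopedInVM {
 public:
  explicit ScopedInVM(ToyThread& t) : thread_(t) {
    thread_.state.store(ThreadState::InVM);
  }
  ~ScopedInVM() {
    thread_.state.store(ThreadState::InNative);
    thread_.returns_to_native.fetch_add(1);
  }
  ScopedInVM(const ScopedInVM&) = delete;
  ScopedInVM& operator=(const ScopedInVM&) = delete;

 private:
  ToyThread& thread_;
};

// Busy-wait with the guard declared inside the loop body.
void wait_until_done(ToyThread& self, const std::atomic<bool>& done) {
  while (!done.load()) {
    ScopedInVM guard(self);     // in-VM only for the duration of the yield
    std::this_thread::yield();  // stand-in for os::yield_all()
  }                             // guard is destroyed here on every pass
  assert(self.state.load() == ThreadState::InNative);
}
```

With the guard outside the loop, the waiter would stay marked InVM for the entire wait. With it inside, the state returns to InNative after each yield, so the waiter is not permanently marked as running VM code.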
                                     
2009-01-09
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/4db4e58c16bd
                                     
2009-01-14
PUBLIC COMMENTS

As per 6852404, the deadlock scenario exists on Solaris as well. If a safepoint is initiated while a thread is doing the initialization, the threads waiting for initialization to complete would never reach a safepoint.
                                     
2009-06-19


