JDK-8033696 : "assert(thread != NULL) failed: just checking" due to Thread::current() and JNI pthread interaction
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 7u45
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: x86_64
  • Submitted: 2014-02-05
  • Updated: 2022-01-24
  • Resolved: 2014-04-03
Fixed in releases:
  JDK 7: 7u72
  JDK 8: 8u20
  JDK 9: 9 b10
Description
JNI code is using pthread_key_create with a destructor to detach the thread from the JVM when the thread is exiting.

This approach works well on Solaris, and with a 32-bit JVM on Linux, but with a 64-bit JVM on Linux the threads hang when detaching.

On 64-bit Linux the JVM also uses pthread_key_create to store the Thread::current() value in thread-local storage.

Since the thread-local storage containing the thread pointer is erased (by pthread) before the JNI destructor runs, DetachCurrentThread ends up running on a thread whose current thread is NULL.

With a product build this breaks locks/monitors, and the threads hang. With a debug build an assert in Thread::current() is hit instead.

Everything works if DetachCurrentThread is called from the main logic instead.
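
For illustration, here is a minimal sketch of that detach-on-exit pattern, compiled as C++ like the attached Callback_Native.cpp; the names (g_vm, g_detach_key, detach_on_exit, register_detach_on_exit) are made up and not taken from the attached reproducer:

    #include <jni.h>
    #include <pthread.h>

    static JavaVM*       g_vm;          // cached JavaVM pointer (set in JNI_OnLoad)
    static pthread_key_t g_detach_key;  // key whose destructor detaches the thread

    // pthread TLS destructor: runs while the native thread is exiting and is
    // meant to detach it from the JVM at that point.
    static void detach_on_exit(void*) {
        g_vm->DetachCurrentThread();
    }

    // Called once per native thread after AttachCurrentThread has succeeded;
    // any non-NULL value arms the destructor for this thread.
    static void register_detach_on_exit(JNIEnv* env) {
        pthread_setspecific(g_detach_key, env);
    }

    JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM* vm, void*) {
        g_vm = vm;
        pthread_key_create(&g_detach_key, detach_on_exit);
        return JNI_VERSION_1_6;
    }

The failure described above arises because, on 64-bit Linux, the JVM stored Thread::current() in a pthread TLS slot of its own, so by the time a destructor like detach_on_exit runs that slot may already have been cleared by pthread.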


Comments
Re-examining this situation I cannot see how the detach call could ever hang or trigger an assertion even if the TLS was cleared, as that would just make the thread appear to already be detached, and so the call trivially succeeds.

A second observation is that since we moved to compiler-based thread-locals (JDK-8132510) the fix put in place here was potentially broken again. With compiler-based thread-locals that is how we check whether the current thread is already attached, so that we can detach it. If the compiler thread-local implementation were to clear the variable, and that happened before the pthread_key_create destructor is run, then we would get NULL for Thread::current() and DetachCurrentThread would trivially return JNI_OK, as the current thread does not appear to be attached. We don't have a test for this, so we don't know whether things are broken or not. No one has complained since the compiler-based thread-local change in 9, so perhaps we are lucky and the compiler-based thread-local is either not cleared, or only cleared after the library-based thread-local. Or we simply never notice that a thread didn't really detach.

Update: after further research it seems that gcc, at least, doesn't proactively do anything to thread-locals other than at some point reclaiming their memory (which should be after the thread really has finished executing). So the compiler-based thread-local should retain its value for the current thread until explicitly cleared.
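
For context, a rough illustration of the two thread-local schemes contrasted in this comment (illustrative only, not the actual HotSpot sources):

    #include <pthread.h>

    class Thread;  // stand-in for HotSpot's Thread class

    // Library-based TLS: the current-thread slot is a pthread key, which the
    // pthread implementation may clear while the thread is exiting, before
    // user-registered destructors get to run.
    static pthread_key_t current_thread_key;
    static Thread* current_via_pthread() {
        return static_cast<Thread*>(pthread_getspecific(current_thread_key));
    }

    // Compiler-based TLS (the JDK-8132510 style): the slot is ordinary
    // compiler-managed thread-local storage; per the observation above, gcc
    // does not proactively clear it and only reclaims the memory once the
    // thread has finished executing.
    static __thread Thread* current_thread = nullptr;
    static Thread* current_via_compiler() {
        return current_thread;
    }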
24-01-2022

I'm not even sure now that this TLS destructor trick even works as originally intended ... it seems to be caused by an implementation detail of pthreads on Linux/BSD (which explicitly deallocates/clears TLS variables during thread termination - something Solaris does not do), and it seems to rely on the key for the current thread being processed first - which is true for our launcher, but not necessarily true in general.

Update: Thinking more, it is okay. If the detaching destructor runs first it will clear the thread-current TLS value, so we won't see that value set later in the loop and so our destructor won't run.
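
A small standalone demo of the ordering point being discussed (not part of the bug's reproducer; the key names are made up). Each key's slot is set to NULL before its destructor runs, so whether one destructor can still observe another key's value depends on which destructor the implementation happens to run first:

    #include <pthread.h>
    #include <cstdio>

    static pthread_key_t key_a;  // imagine: the VM's thread-pointer key
    static pthread_key_t key_b;  // imagine: the JNI library's detach key

    static void dtor_a(void* value) {
        // pthreads has already stored NULL into key_a before calling this.
        std::printf("dtor_a: got %p, key_b now reads %p\n", value, pthread_getspecific(key_b));
    }

    static void dtor_b(void* value) {
        std::printf("dtor_b: got %p, key_a now reads %p\n", value, pthread_getspecific(key_a));
    }

    static void* worker(void*) {
        pthread_setspecific(key_a, reinterpret_cast<void*>(0xA));
        pthread_setspecific(key_b, reinterpret_cast<void*>(0xB));
        return nullptr;  // TLS destructors run as the thread exits
    }

    int main() {
        pthread_key_create(&key_a, dtor_a);
        pthread_key_create(&key_b, dtor_b);
        pthread_t t;
        pthread_create(&t, nullptr, worker, nullptr);
        pthread_join(t, nullptr);
        return 0;
    }

Build with g++ -pthread; whichever destructor runs second reads NULL for the other key.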
16-05-2016

SQE is OK to take this to PSU14_04
09-07-2014

Fix Summary Template
- Fix for Release: 7 PSU
- Risk Analysis: Very small fix
- Testing (done/to-be-done): Manual reproducer attached to bug
26-06-2014

URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494 User: lana Date: 2014-04-23 16:11:22 +0000
23-04-2014

URL: http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/469835cd5494 User: kevinw Date: 2014-04-03 10:38:28 +0000
03-04-2014

Added noreg-hard since the test relies on native code.
10-02-2014

I looked a bit more at how restoring the thread pointer with a destructor affected execution. There are three different scenarios:

1) DetachCurrentThread is called from the main logic of native code: DetachCurrentThread calls Thread::~Thread, which sets the TLS thread pointer to NULL. Pthread never calls the destructor (it only calls it for non-NULL values). This is also what is done when a Java thread is spawned by the VM itself. This works out the same way as if we didn't have a thread pointer destructor.

2) DetachCurrentThread is called from a pthread TLS destructor: The thread pointer destructor runs first and restores the thread pointer value. DetachCurrentThread is called from the native code destructor. Thread::~Thread is called from DetachCurrentThread and sets the TLS thread pointer to NULL. The thread pointer destructor is not called again, since its value is now NULL. This is neat since no new state needs to be tracked.

3) DetachCurrentThread is never called: The thread pointer destructor will loop PTHREAD_DESTRUCTOR_ITERATIONS times (or possibly forever), unless some extra state is tracked. The Thread object will be leaked, as before. In theory (looking at the POSIX spec) we could leak additional TLS space, but at least the Linux implementation still cleans up the TLS.

My opinion is that it isn't worth adding more code to avoid leaking TLS space or looping the thread pointer destructor, since the only case where that can happen is if DetachCurrentThread is never called (which is already bad).
10-02-2014

Attaching script to compile and run the example on Mac (needs to be run from outside the callback folder, and also needs http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar)
05-02-2014

I get the same issue on Mac:

Constructing callback
Successfully attached native thread 0x12646a000
Successfully registered for detach
Java callback: native thread: 4937129984, java thread: Thread-0, 2 active threads
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/thread.hpp:682
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/Users/gerard/Desktop/work/jdks/jdk9/hotspot/src/share/vm/runtime/thread.hpp:682), pid=62859, tid=23299
# assert(thread != NULL) failed: just checking
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-debug-gerard_2014_01_30_10_14-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b62-debug mixed mode bsd-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Volumes/work/bugs/8033696/hs_err_pid62859.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
Current thread is 23299
Dumping core ...
05-02-2014

Is the NetBeans crash on Ubuntu64bit (JDK-8025568) related?
05-02-2014

I found a way to change the JVM to work around this problem: by creating a destructor for the thread pointer TLS we can restore the value after pthread has set it to NULL. Then, when the native code destructor runs, the thread pointer is still intact.

Restoring a value in a pthread TLS slot is explicitly supported according to the man page for pthread_key_create, and the destructor will be called again for the restored value. One would have to keep some extra state to make sure the destructor is only called twice, since a pthread implementation is allowed to call the destructor an unbounded number of times as long as the value keeps being restored. On my system pthread calls the destructor a maximum of four times, so the attached JVM patch was sufficient as a proof of concept.
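
A rough sketch of the restore-on-clear idea described here (illustrative only, not the actual HotSpot patch; thread_key and restore_thread_pointer are made-up names):

    #include <pthread.h>

    class Thread;                     // stand-in for HotSpot's Thread

    static pthread_key_t thread_key;  // hypothetical key holding Thread::current()

    // Destructor for the thread-pointer key.  pthread clears the slot before
    // calling this, so writing the old value back keeps Thread::current()
    // usable for destructors that run later in the same pass (such as a JNI
    // detach hook).  The implementation may call this destructor again for
    // the restored value; once DetachCurrentThread has stored NULL into the
    // slot it is not called any further.
    static void restore_thread_pointer(void* value) {
        pthread_setspecific(thread_key, value);
    }

    static void thread_key_init() {
        pthread_key_create(&thread_key, restore_thread_pointer);
    }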
05-02-2014

Attached reproducer.

Compile native:
# 64bit
JAVA_HOME=/java/linux-x64/jdk1.7.0_45 gcc -shared -fpic -o libNative.so -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -lstdc++ Callback_Native.cpp
# 32bit
JAVA_HOME=/java/linux-i586/jdk1.7.0_45 gcc -v -m32 -shared -fpic -o libNative.so -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -lstdc++ Callback_Native.cpp

Compile java (from callback/src/main/java):
JAVA_HOME=/java/linux-x64/jdk-1.7.0_45
$JAVA_HOME/bin/javac com/test/callback/CallbackTest.java
$JAVA_HOME/bin/javac com/test/callback/App.java

To run (from callback/src/main/java):
NATIVE=../../../native $JAVA_HOME/bin/java -Djava.library.path=$NATIVE com.test.callback.App
05-02-2014