JDK-6852404 : Race condition in JNI Direct Buffer access and creation routines
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 6u14
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2009-06-18
  • Updated: 2011-12-19
  • Resolved: 2009-06-19
Description
While testing multithreaded OpenGL programs in Java via JOGL, ###@###.### discovered a race condition in the implementation of the JNI direct buffer access and creation routines (NewDirectByteBuffer, GetDirectBufferAddress, GetDirectBufferCapacity). When these routines were written (by the bug submitter, over 8 years ago), the case where multiple threads enter the common routine initializeDirectBufferSupport() in jni.cpp at the same time was not handled carefully enough. The following stack traces from gdb show a deadlock that irrevocably hangs the JVM:

Thread 17:
#15 0x00007ffff7033e9c in lookupOne(JNIEnv_*, char const*, Thread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#16 0x00007ffff7032647 in lookupDirectBufferClasses () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#17 0x00007ffff7034058 in initializeDirectBufferSupport(JNIEnv_*, JavaThread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#18 0x00007ffff70327d2 in jni_NewDirectByteBuffer () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#19 0x00007fffa8c2b493 in Java_com_sun_opengl_impl_x11_glx_GLX_glXChooseFBConfigCopied1__JILjava_lang_Object_2ILjava_lang_Object_2I (env=0x402959b8, _unused=0x7fffab7b1408, dpy=140735944911616, screen=0, attribList=0x7fffab7b1430, attribList_byte_offset=0, nitems=0x7fffab7b1420, nitems_byte_offset=0) at /net/jordan/usr/local/projects/SUN/JOGL/jogl/build-x86_64/jogl/gensrc/native/jogl/X11/GLX_JNI.c:125
#20 0x00007ffff308ff50 in ?? ()
#21 0x00007fffab7b1420 in ?? ()
#22 0x0000000000000000 in ?? ()

Thread 16:
#0  0x00007ffff7bced59 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff71ea727 in os::PlatformEvent::park() () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#2  0x00007ffff71c3429 in Monitor::ILock(Thread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#3  0x00007ffff71c3b60 in Monitor::lock_without_safepoint_check() () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#4  0x00007ffff72547aa in SafepointSynchronize::block(JavaThread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#5  0x00007ffff72d145b in JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#6  0x00007ffff7031800 in jni_ReleasePrimitiveArrayCritical () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#7  0x00007fffa8c2b42d in Java_com_sun_opengl_impl_x11_glx_GLX_glXChooseFBConfigCopied1__JILjava_lang_Object_2ILjava_lang_Object_2I (env=0x4026b1b8, _unused=0x7fffab8b2488, dpy=140735945158560, screen=0, attribList=0x7fffab8b24b0, attribList_byte_offset=0, 

Thread 15:
#0  0x00007ffff7722da7 in sched_yield () from /lib64/libc.so.6
#1  0x00007ffff71e8b09 in os::yield_all(int) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#2  0x00007ffff7033f77 in initializeDirectBufferSupport(JNIEnv_*, JavaThread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#3  0x00007ffff70327d2 in jni_NewDirectByteBuffer () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#4  0x00007fffa8c2b493 in Java_com_sun_opengl_impl_x11_glx_GLX_glXChooseFBConfigCopied1__JILjava_lang_Object_2ILjava_lang_Object_2I (env=0x7fffac1891b8, _unused=0x7fffab9d9508, dpy=1077294608, screen=0, attribList=0x7fffab9d9530, attribList_byte_offset=0, nitems=0x7fffab9d9520, nitems_byte_offset=0) at /net/jordan/usr/local/projects/SUN/JOGL/jogl/build-x86_64/jogl/gensrc/native/jogl/X11/GLX_JNI.c:125

Thread 2:
#0  0x00007ffff7722da7 in sched_yield () from /lib64/libc.so.6
#1  0x00007ffff71e8b09 in os::yield_all(int) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#2  0x00007ffff7033f77 in initializeDirectBufferSupport(JNIEnv_*, JavaThread*) () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#3  0x00007ffff70327d2 in jni_NewDirectByteBuffer () from /opt-linux-x86_64/jre1.6.0_14/lib/amd64/server/libjvm.so
#4  0x00007fffa8c2b493 in Java_com_sun_opengl_impl_x11_glx_GLX_glXChooseFBConfigCopied1__JILjava_lang_Object_2ILjava_lang_Object_2I (env=0x401131b8, _unused=0x7ffff69fc568, dpy=1077290816, screen=0, attribList=0x7ffff69fc590, attribList_byte_offset=0, nitems=0x7ffff69fc580, nitems_byte_offset=0) at /net/jordan/usr/local/projects/SUN/JOGL/jogl/build-x86_64/jogl/gensrc/native/jogl/X11/GLX_JNI.c:125

Three of the four threads are racing to be the first to complete the direct buffer-related initialization in the JNI implementation, initializeDirectBufferSupport. One (thread 17) has entered the initialization routine and is stopped, while looking up one of the classes, because a safepoint has been requested. The other two (threads 15 and 2) are waiting for this first thread to complete the initialization. The bug is that those two threads wait by using ThreadInVMfromNative and calling os::yield_all() in a loop. Because the first thread is stopped for the safepoint request, it cannot make progress and exit the initialization routine. Because the other two threads are in _thread_in_vm state, busy looping in yield_all(), the VM never actually reaches the requested safepoint, and therefore deadlocks.
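
The waiting pattern described above can be modeled outside the VM. The following is a minimal sketch in standard C++, not the HotSpot source: the flag names, lookup_classes() and run_racing_threads() are hypothetical stand-ins, and std::this_thread::yield() takes the place of os::yield_all(). Standard C++ has no safepoints, so this model terminates; it only shows the shape of the code in which the losing threads spin until the winning thread finishes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the flags that guard the one-time initialization.
std::atomic<int> init_started{0};
std::atomic<int> init_ended{0};
std::atomic<int> init_runs{0};  // how many times the initialization body ran

// Stand-in for the class lookups. In the report, this is where the winning
// thread was stopped by the safepoint request.
void lookup_classes() { init_runs.fetch_add(1); }

// The first thread to flip init_started performs the initialization.
// Every other thread busy-waits, yielding the CPU, until it has finished.
bool initialize_support() {
    int expected = 0;
    if (init_started.compare_exchange_strong(expected, 1)) {
        lookup_classes();
        init_ended.store(1);
    } else {
        // In the VM, the equivalent loop ran in _thread_in_vm state, so the
        // spinning threads never stopped for the pending safepoint.
        while (init_ended.load() == 0) {
            std::this_thread::yield();
        }
    }
    return true;
}

// Sends n threads through initialize_support() and returns how many times
// the initialization body actually ran.
int run_racing_threads(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([] { initialize_support(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return init_runs.load();
}
```

In this model the spin loop is merely wasteful. Inside the VM it is fatal once the winning thread is held at a safepoint that the spinning threads prevent from being reached.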

Unfortunately we do not have a self-contained test case for this. It happens infrequently, roughly once in every 10 application startups, when multiple threads are making JNI calls simultaneously during startup.

The problem has been observed with JDK 1.6.0_14 on Linux/x86_64:
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

Comments
EVALUATION Actually already discovered and fixed.
19-06-2009

PUBLIC COMMENTS Actually I'll have to share some of the blame here, as it was my fix for 6471657 that added the ThreadInVMfromNative transition; that was needed after a change to os::sleep, which is called from yield_all. I vaguely recall discussing the initialization race but concluding that it was benign and out of scope for the fix at the time. We should "bite the bullet" and use the proper wait/notify mechanics.
18-06-2009
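
One possible shape of the wait/notify mechanics mentioned in the comment above is sketched below. This is an illustration in standard C++, not the fix that was applied to HotSpot: OneTimeInit, ensure() and run_waiting_threads() are hypothetical names, and std::mutex with std::condition_variable stands in for whatever monitor the VM would use. In the VM, the wait would additionally have to be one that lets waiting threads stop for a safepoint.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

enum class InitState { NotStarted, InProgress, Done, Failed };

// Hypothetical one-time initializer: the first caller runs the body,
// later callers block on a condition variable instead of spinning.
struct OneTimeInit {
    std::mutex lock;
    std::condition_variable cv;
    InitState state = InitState::NotStarted;
    int runs = 0;  // how many times the initialization body ran

    // init_body returns true on success. ensure() returns true once the
    // initialization has completed successfully.
    template <typename F>
    bool ensure(F init_body) {
        std::unique_lock<std::mutex> guard(lock);
        if (state == InitState::NotStarted) {
            state = InitState::InProgress;
            guard.unlock();  // do not hold the lock during the slow work
            bool ok = init_body();
            guard.lock();
            ++runs;
            state = ok ? InitState::Done : InitState::Failed;
            cv.notify_all();  // wake every thread waiting for the result
        } else {
            // Sleep until the initializing thread publishes a final state.
            cv.wait(guard, [this] {
                return state == InitState::Done || state == InitState::Failed;
            });
        }
        return state == InitState::Done;
    }
};

// Sends n threads through ensure() and returns how many saw success.
int run_waiting_threads(int n) {
    assert(n > 0);
    OneTimeInit init;
    std::atomic<int> successes{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            if (init.ensure([] { return true; })) {
                successes.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    assert(init.runs == 1);  // the body ran exactly once
    return successes.load();
}
```

Calling run_waiting_threads(4) runs the initialization body once and reports success to all four threads. The waiting threads sleep rather than loop, which is the property the comment asks for.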