JDK-4831461 : JNI GetEnv() race with ThreadLocalStorage::set_thread() guarantee
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.3.1_13,1.4.1,1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_8
  • CPU: sparc
  • Submitted: 2003-03-12
  • Updated: 2012-10-08
  • Resolved: 2003-06-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
1.4.2_02 02Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
###@###.### 2003-03-12

Stress testing of the AsyncGetCallTrace() interface has exposed an
intermittent guarantee failure:

# guarantee(get_thread() == thread, "must be the same thread, quickly")
# Error ID : src/share/vm/runtime/threadLocalStorage.cp, 66

The AsyncGetCallTrace() specific portion of the failure is tracked by
the following bug id:

4822846 4/4 AsyncGetCallTrace: guarantee(get_thread() == thread, "must be the
            same thread...

This same guarantee failure can be triggered by calling JNI GetEnv()
and that failure mode is tracked by this bug.

When the WatcherThread exits, the ThreadLocalStorage reference for
exiting WatcherThread is set to NULL. However, a JNI GetEnv() call
causes the cached thread info to be refreshed and we fail the guarantee.
I don't have a java_g cmd stack trace for this crash.

The failure reproduces with a less informative message with the java cmd:

# Error ID: 5448524541442C4F43414C33544F524147450E4350500042

The ErrorID maps to: threadLocalStorage.cpp, 66

Here is a snippet of the stack trace for the java cmd crash:

  [6] report_fatal(0xfefdbc65, 0x42, 0xfefdbcb4, 0xfeff4000, 0xfa17fef0, 0x0), a
t 0xfee34a7c
  [7] ThreadLocalStorage::set_thread(0x0, 0xffffffc4, 0xff013178, 0xffff8000, 0x
7f, 0x0), at 0xfec522a4
  [8] _start(0xe3010, 0xfebf0c00, 0x0, 0x0, 0x0, 0x0), at 0xfecc5e4c

The complete stack trace for the java crash is attached as threads.log.486.


###@###.### 2003-03-12 (update 1)

I've attached a complete stack trace for a java crash with JDK1.4.1
and the attached test program (PepTest run) as threads.log.18. Here
is part of the stack trace for the main thread which was exiting:

  [6] report_fatal(0xfefffd13, 0x3a, 0xfefffd5c, 0x80, 0x2bbc0, 0x0), at 0xfee5a
f64
  [7] ThreadLocalStorage::set_thread(0x0, 0x1, 0x285614, 0x5800, 0xff010000, 0x0
), at 0xfec52340
  [8] Thread::delete_thread_and_TLS_current(0x2c280, 0x0, 0x0, 0x0, 0xbb178, 0xf
fbee954), at 0xfed8aa74
  [9] jni_DetachCurrentThread(0xff023598, 0x2ccbc, 0xbb178, 0x2ccac, 0x2c30c, 0x
f2c450b8), at 0xfed89eb4


###@###.### 2003-04-04

Some additional variations on the race have shown up during Java2Demo
stress testing. complete stack traces are attached 

JNI GetEnv() calls src/os/solaris/vm/thread_solaris.inline.hpp:
ThreadLocalStorage::thread(). If the thread is not in the cache, it
calls ThreadLocalStorage::get_thread_via_cache_slowly() which calls
ThreadLocalStorage::get_thread_slow(). The thread value we get back
fails the guarantee() call.

Here is a snippet of the stack trace:

  [7] report_fatal(0xfefde057, 0x2a, 0xfefde0a6, 0xfe990000, 0x1bffb8, 0xfef73d0
8), at 0xfee35d4c
  [8] ThreadLocalStorage::get_thread_via_cache_slowly(0xfebf1800, 0x67, 0xfebf18
00, 0x1, 0x0, 0x0), at 0xfef73d34
  [9] jni_GetEnv(0xff009694, 0xf13ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4dc
  [10] profhandler(0x1d, 0xf13ffde0, 0xf13ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [11] __sighndlr(0x1d, 0xf13ffde0, 0xf13ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff37
4c8c
  [12] call_user_handler(0xfebf1800, 0x31, 0xff387b20, 0xf13ffb28, 0xf13ffde0, 0
x1d), at 0xff36fadc
  [13] sigacthandler(0xfebf1800, 0xf13ffde0, 0xf13ffb28, 0xff386000, 0xf13ffde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21030912 (SIG-21030912) ------
  [14] ThreadLocalStorage::set_thread(0x0, 0x1, 0x546ba8, 0x5359c0, 0x31fc68, 0x
fecd6294), at 0xfec52418
  [15] JavaThread::run(0xad26e0, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.65. For the above
guarantee failure, the message usually fails to print. I have seen
SIGBUS, SIGILL and SIGSEGV during the print attempt.

In this example, JNI GetEnv() gets SIGILL. Here is a snippet of the
stack trace:

  ---- called from signal handler with signal -21029888 (SIG-21029888) ------
  [6] 0x443430(0x443438, 0x1, 0xfebf1c00, 0x1, 0x0, 0x0), at 0x44342f
  [7] jni_GetEnv(0xff009694, 0xf11ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4fc
  [8] profhandler(0x1d, 0xf11ffde0, 0xf11ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [9] __sighndlr(0x1d, 0xf11ffde0, 0xf11ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
c8c
  [10] call_user_handler(0xfebf1c00, 0x1a, 0xff387b20, 0xf11ffb28, 0xf11ffde0, 0
x1d), at 0xff36fadc
  [11] sigacthandler(0xfebf1c00, 0xf11ffde0, 0xf11ffb28, 0xff386000, 0xf11ffde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21029888 (SIG-21029888) ------
  [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x3f6730, 0xa6498, 0x31fc68, 0xf
ecd6294), at 0xfec5242c
  [13] JavaThread::run(0x443438, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.66.

In this example, JNI GetEnv() gets SIGSEGV. Here is a snippet of the
stack trace:

  ---- called from signal handler with signal -21028864 (SIG-21028864) ------
  [6] 0x0(0x731618, 0x1, 0xfebf2000, 0x1, 0x0, 0x0), at 0xffffffffffffffff
  [7] jni_GetEnv(0xff009694, 0xf0d7f960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4fc
  [8] profhandler(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [9] __sighndlr(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
c8c
  [10] call_user_handler(0xfebf2000, 0x25, 0xff387b20, 0xf0d7fb28, 0xf0d7fde0, 0
x1d), at 0xff36fadc
  [11] sigacthandler(0xfebf2000, 0xf0d7fde0, 0xf0d7fb28, 0xff386000, 0xf0d7fde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21028864 (SIG-21028864) ------
  [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x5bb908, 0x2dfbc8, 0x31fc68, 0x
fecd6294), at 0xfec52418
  [13] JavaThread::run(0x731618, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.98.

In this example, we receive the SIGPROF signal while we are executing
the "delete thread" code. Here is a snippet of the stack trace:

  [5] sigacthandler(), at 0xdfb8e1e0
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [6] jni_GetEnv(0xdfa7f4dc, 0xd245fbb4, 0x10002), at 0xdf862752
  [7] profhandler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdd790b5c
  [8] __sighndlr(0x1d, 0xd245fc54, 0xd245fcd4, 0xdd790ab0), at 0xdfb92963
  [9] call_user_handler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdfb8e0b6
  [10] take_deferred_signal(0x1d), at 0xdfb8e181
  [11] exit_critical(), at 0xdfb89f14
  [12] preempt(), at 0xdfb8f391
  [13] _ti_mutex_unlock(0xdfb5a338), at 0xdfb900b6
  [14] free(0x84b87a0), at 0xdfaf6c80
  [15] os::free(0x84b87a0), at 0xdf7e33c8
  [16] CHeapObj::operator delete(0x84b87a0), at 0xdf80ea19
  [17] __SLIP.DELETER__D(0x84b87a0, 0x1), at 0xdf84e47d
  [18] Thread::delete_thread_and_TLS_current(0x84b87a0), at 0xdf84e42d
  [19] JavaThread::thread_main_inner(0x84b87a0), at 0xdf7f9d3a

The complete stack trace is attached as thread.log.137.

The crashes have shown up in VolanoTest running on 12*900MHZ machines.
Vtest server crashed after about 8 hours. complete stack trace is
attached as vtest.log.18602 .
stack trace :
>>>>>  [13] t_delete(0x1, 0x30, 0x28, 0x270390, 0x352ff8, 0x0), at
>>>>> 0xff2c388c
>>>>>  [14] realfree(0x352fd0, 0xff3426f8, 0xff3426f8, 0x3, 0x3, 0x0), at
>>>>> 0xff2c3520
>>>>>  [15] _free_unlocked(0x3394e8, 0x0, 0x0, 0xff33c000, 0x0, 0x0), at
>>>>> 0xff2c3d0c
>>>>>  [16] free(0x3394e8, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff2c3c64
>>>>>  [17] ThreadSafepointState::destroy(0x39db70, 0x45, 0xff17a000,
>>>>> 0xfee75f58,
>>>>> 0x321c38, 0x1), at 0xfee773c0
>>>>>  [18] __SLIP.DELETER__D(0x39db70, 0x1, 0x1106e0, 0x34d820, 0x30f928,
>>>>> 0xfee6a5c8), at 0xfee77448
>>>>>  [19] JavaThread::run(0x39db70, 0x942c2, 0x40, 0x0, 0x40, 0x0), at
>>>>> 0xfee6a684
>>>>>  [20] _start(0x39db70, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfee66b04


###@###.### 2003-05-09

Comments
CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: 1.4.2_02 tiger FIXED IN: 1.4.2_02 tiger INTEGRATED IN: 1.4.2_02 tiger tiger-b08
14-06-2004

PUBLIC COMMENTS .
10-06-2004

EVALUATION ###@###.### 2003-03-12 The guarantee() calls in ThreadLocalStorage::set_thread() are too strict. When a JavaThread is exiting set_thread() is called with NULL. A "get thread" operation refreshes the cache so during thread exit it is possible for the cached thread to be set to NULL, refreshed by another thread via a JNI GetEnv() call, and then the guarantee() is checked. ###@###.### 2003-04-04 The fix for the following bug: 4335645 4/5 PrintHeapAllocation, PrintMalloc segv on thread exit needs to be revised in order to solve all versions of the race.
04-04-2003

SUGGESTED FIX ###@###.### 2003-03-12 Here is the context diff for the proposed fix: ------- src/share/vm/runtime/threadLocalStorage.cpp ------- *** /tmp/sccs.s1aWWX Wed Mar 12 16:44:09 2003 --- threadLocalStorage.cpp Wed Mar 12 11:06:21 2003 *************** *** 62,70 **** } os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread); // The following ensure that any optimization tricks we have tried ! // did not backfire on us: ! guarantee(get_thread() == thread, "must be the same thread, quickly"); ! guarantee(get_thread_slow() == thread, "must be the same thread, slowly"); } void ThreadLocalStorage::init() { --- 62,75 ---- } os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread); // The following ensure that any optimization tricks we have tried ! // did not backfire on us. If thread == NULL, then the thread is ! // trying to exit. Since a "get" operation refreshes the cache, ! // during thread exit we can have a race between the "get" and ! // the "set to NULL". ! if (thread != NULL) { ! guarantee(get_thread() == thread, "must be the same thread, quickly"); ! guarantee(get_thread_slow() == thread, "must be the same thread, slowly"); ! } } void ThreadLocalStorage::init() { ###@###.### 2003-04-04 Fixing the latest examples of the race will allow these guarantee() calls to be restored so the above code diff will shortly be obsolete.
04-04-2003