United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-4831461 JNI GetEnv() race with ThreadLocalStorage::set_thread() guarantee
JDK-4831461 : JNI GetEnv() race with ThreadLocalStorage::set_thread() guarantee

Details
Type:
Bug
Submit Date:
2003-03-12
Status:
Closed
Updated Date:
2012-10-08
Project Name:
JDK
Resolved Date:
2003-06-05
Component:
hotspot
OS:
solaris_8
Sub-Component:
runtime
CPU:
sparc
Priority:
P4
Resolution:
Fixed
Affected Versions:
1.3.1_13,1.4.1,1.4.2
Fixed Versions:
1.4.2_02 (02)

Related Reports
Backport:
Duplicate:
Relates:
Relates:

Sub Tasks

Description
###@###.### 2003-03-12

Stress testing of the AsyncGetCallTrace() interface has exposed an
intermittent guarantee failure:

# guarantee(get_thread() == thread, "must be the same thread, quickly")
# Error ID : src/share/vm/runtime/threadLocalStorage.cp, 66

The AsyncGetCallTrace() specific portion of the failure is tracked by
the following bug id:

4822846 4/4 AsyncGetCallTrace: guarantee(get_thread() == thread, "must be the
            same thread...

This same guarantee failure can be triggered by calling JNI GetEnv()
and that failure mode is tracked by this bug.

When the WatcherThread exits, the ThreadLocalStorage reference for
exiting WatcherThread is set to NULL. However, a JNI GetEnv() call
causes the cached thread info to be refreshed and we fail the guarantee.
I don't have a java_g cmd stack trace for this crash.

The failure reproduces with a less informative message with the java cmd:

# Error ID: 5448524541442C4F43414C33544F524147450E4350500042

The ErrorID maps to: threadLocalStorage.cpp, 66

Here is a snippet of the stack trace for the java cmd crash:

  [6] report_fatal(0xfefdbc65, 0x42, 0xfefdbcb4, 0xfeff4000, 0xfa17fef0, 0x0), a
t 0xfee34a7c
  [7] ThreadLocalStorage::set_thread(0x0, 0xffffffc4, 0xff013178, 0xffff8000, 0x
7f, 0x0), at 0xfec522a4
  [8] _start(0xe3010, 0xfebf0c00, 0x0, 0x0, 0x0, 0x0), at 0xfecc5e4c

The complete stack trace for the java crash is attached as threads.log.486.


###@###.### 2003-03-12 (update 1)

I've attached a complete stack trace for a java crash with JDK1.4.1
and the attached test program (PepTest run) as threads.log.18. Here
is part of the stack trace for the main thread which was exiting:

  [6] report_fatal(0xfefffd13, 0x3a, 0xfefffd5c, 0x80, 0x2bbc0, 0x0), at 0xfee5a
f64
  [7] ThreadLocalStorage::set_thread(0x0, 0x1, 0x285614, 0x5800, 0xff010000, 0x0
), at 0xfec52340
  [8] Thread::delete_thread_and_TLS_current(0x2c280, 0x0, 0x0, 0x0, 0xbb178, 0xf
fbee954), at 0xfed8aa74
  [9] jni_DetachCurrentThread(0xff023598, 0x2ccbc, 0xbb178, 0x2ccac, 0x2c30c, 0x
f2c450b8), at 0xfed89eb4


###@###.### 2003-04-04

Some additional variations on the race have shown up during Java2Demo
stress testing. complete stack traces are attached 

JNI GetEnv() calls src/os/solaris/vm/thread_solaris.inline.hpp:
ThreadLocalStorage::thread(). If the thread is not in the cache, it
calls ThreadLocalStorage::get_thread_via_cache_slowly() which calls
ThreadLocalStorage::get_thread_slow(). The thread value we get back
fails the guarantee() call.

Here is a snippet of the stack trace:

  [7] report_fatal(0xfefde057, 0x2a, 0xfefde0a6, 0xfe990000, 0x1bffb8, 0xfef73d0
8), at 0xfee35d4c
  [8] ThreadLocalStorage::get_thread_via_cache_slowly(0xfebf1800, 0x67, 0xfebf18
00, 0x1, 0x0, 0x0), at 0xfef73d34
  [9] jni_GetEnv(0xff009694, 0xf13ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4dc
  [10] profhandler(0x1d, 0xf13ffde0, 0xf13ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [11] __sighndlr(0x1d, 0xf13ffde0, 0xf13ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff37
4c8c
  [12] call_user_handler(0xfebf1800, 0x31, 0xff387b20, 0xf13ffb28, 0xf13ffde0, 0
x1d), at 0xff36fadc
  [13] sigacthandler(0xfebf1800, 0xf13ffde0, 0xf13ffb28, 0xff386000, 0xf13ffde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21030912 (SIG-21030912) ------
  [14] ThreadLocalStorage::set_thread(0x0, 0x1, 0x546ba8, 0x5359c0, 0x31fc68, 0x
fecd6294), at 0xfec52418
  [15] JavaThread::run(0xad26e0, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.65. For the above
guarantee failure, the message usually fails to print. I have seen
SIGBUS, SIGILL and SIGSEGV during the print attempt.

In this example, JNI GetEnv() gets SIGILL. Here is a snippet of the
stack trace:

  ---- called from signal handler with signal -21029888 (SIG-21029888) ------
  [6] 0x443430(0x443438, 0x1, 0xfebf1c00, 0x1, 0x0, 0x0), at 0x44342f
  [7] jni_GetEnv(0xff009694, 0xf11ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4fc
  [8] profhandler(0x1d, 0xf11ffde0, 0xf11ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [9] __sighndlr(0x1d, 0xf11ffde0, 0xf11ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
c8c
  [10] call_user_handler(0xfebf1c00, 0x1a, 0xff387b20, 0xf11ffb28, 0xf11ffde0, 0
x1d), at 0xff36fadc
  [11] sigacthandler(0xfebf1c00, 0xf11ffde0, 0xf11ffb28, 0xff386000, 0xf11ffde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21029888 (SIG-21029888) ------
  [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x3f6730, 0xa6498, 0x31fc68, 0xf
ecd6294), at 0xfec5242c
  [13] JavaThread::run(0x443438, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.66.

In this example, JNI GetEnv() gets SIGSEGV. Here is a snippet of the
stack trace:

  ---- called from signal handler with signal -21028864 (SIG-21028864) ------
  [6] 0x0(0x731618, 0x1, 0xfebf2000, 0x1, 0x0, 0x0), at 0xffffffffffffffff
  [7] jni_GetEnv(0xff009694, 0xf0d7f960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
d7e4fc
  [8] profhandler(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
  [9] __sighndlr(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
c8c
  [10] call_user_handler(0xfebf2000, 0x25, 0xff387b20, 0xf0d7fb28, 0xf0d7fde0, 0
x1d), at 0xff36fadc
  [11] sigacthandler(0xfebf2000, 0xf0d7fde0, 0xf0d7fb28, 0xff386000, 0xf0d7fde0,
 0x1d), at 0xff36fca8
  ---- called from signal handler with signal -21028864 (SIG-21028864) ------
  [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x5bb908, 0x2dfbc8, 0x31fc68, 0x
fecd6294), at 0xfec52418
  [13] JavaThread::run(0x731618, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
at 0xfecd6358

The complete stack trace is attached as threads.log.98.

In this example, we receive the SIGPROF signal while we are executing
the "delete thread" code. Here is a snippet of the stack trace:

  [5] sigacthandler(), at 0xdfb8e1e0
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [6] jni_GetEnv(0xdfa7f4dc, 0xd245fbb4, 0x10002), at 0xdf862752
  [7] profhandler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdd790b5c
  [8] __sighndlr(0x1d, 0xd245fc54, 0xd245fcd4, 0xdd790ab0), at 0xdfb92963
  [9] call_user_handler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdfb8e0b6
  [10] take_deferred_signal(0x1d), at 0xdfb8e181
  [11] exit_critical(), at 0xdfb89f14
  [12] preempt(), at 0xdfb8f391
  [13] _ti_mutex_unlock(0xdfb5a338), at 0xdfb900b6
  [14] free(0x84b87a0), at 0xdfaf6c80
  [15] os::free(0x84b87a0), at 0xdf7e33c8
  [16] CHeapObj::operator delete(0x84b87a0), at 0xdf80ea19
  [17] __SLIP.DELETER__D(0x84b87a0, 0x1), at 0xdf84e47d
  [18] Thread::delete_thread_and_TLS_current(0x84b87a0), at 0xdf84e42d
  [19] JavaThread::thread_main_inner(0x84b87a0), at 0xdf7f9d3a

The complete stack trace is attached as thread.log.137.

The crashes have shown up in VolanoTest running on 12*900MHZ machines.
Vtest server crashed after about 8 hours. complete stack trace is
attached as vtest.log.18602 .
stack trace :
>>>>>  [13] t_delete(0x1, 0x30, 0x28, 0x270390, 0x352ff8, 0x0), at
>>>>> 0xff2c388c
>>>>>  [14] realfree(0x352fd0, 0xff3426f8, 0xff3426f8, 0x3, 0x3, 0x0), at
>>>>> 0xff2c3520
>>>>>  [15] _free_unlocked(0x3394e8, 0x0, 0x0, 0xff33c000, 0x0, 0x0), at
>>>>> 0xff2c3d0c
>>>>>  [16] free(0x3394e8, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff2c3c64
>>>>>  [17] ThreadSafepointState::destroy(0x39db70, 0x45, 0xff17a000,
>>>>> 0xfee75f58,
>>>>> 0x321c38, 0x1), at 0xfee773c0
>>>>>  [18] __SLIP.DELETER__D(0x39db70, 0x1, 0x1106e0, 0x34d820, 0x30f928,
>>>>> 0xfee6a5c8), at 0xfee77448
>>>>>  [19] JavaThread::run(0x39db70, 0x942c2, 0x40, 0x0, 0x40, 0x0), at
>>>>> 0xfee6a684
>>>>>  [20] _start(0x39db70, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfee66b04


###@###.### 2003-05-09

                                    

Comments
SUGGESTED FIX

###@###.### 2003-03-12

Here is the context diff for the proposed fix:

------- src/share/vm/runtime/threadLocalStorage.cpp -------
*** /tmp/sccs.s1aWWX    Wed Mar 12 16:44:09 2003
--- threadLocalStorage.cpp      Wed Mar 12 11:06:21 2003
***************
*** 62,70 ****
    }
    os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread);
    // The following ensure that any optimization tricks we have tried
!   // did not backfire on us:
!   guarantee(get_thread()      == thread, "must be the same thread, quickly");
!   guarantee(get_thread_slow() == thread, "must be the same thread, slowly");
  }
  
  void ThreadLocalStorage::init() {
--- 62,75 ----
    }
    os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread);
    // The following ensure that any optimization tricks we have tried
!   // did not backfire on us. If thread == NULL, then the thread is
!   // trying to exit. Since a "get" operation refreshes the cache,
!   // during thread exit we can have a race between the "get" and
!   // the "set to NULL".
!   if (thread != NULL) {
!     guarantee(get_thread()      == thread, "must be the same thread, quickly");
!     guarantee(get_thread_slow() == thread, "must be the same thread, slowly");
!   }
  }
  
  void ThreadLocalStorage::init() {


###@###.### 2003-04-04

Fixing the latest examples of the race will allow these guarantee()
calls to be restored so the above code diff will shortly be obsolete.
                                     
2003-04-04
EVALUATION

###@###.### 2003-03-12

The guarantee() calls in ThreadLocalStorage::set_thread() are too
strict. When a JavaThread is exiting set_thread() is called with NULL.
A "get thread" operation refreshes the cache so during thread exit
it is possible for the cached thread to be set to NULL, refreshed
by another thread via a JNI GetEnv() call, and then the guarantee()
is checked.


###@###.### 2003-04-04

The fix for the following bug:

4335645 4/5 PrintHeapAllocation, PrintMalloc segv on thread exit

needs to be revised in order to solve all versions of the race.
                                     
2003-04-04
PUBLIC COMMENTS

.
                                     
2004-06-10
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_02
tiger

FIXED IN:
1.4.2_02
tiger

INTEGRATED IN:
1.4.2_02
tiger
tiger-b08


                                     
2004-06-14



Hardware and Software, Engineered to Work Together