JDK-8173658 : JvmtiExport::post_class_unload() is broken for non-JavaThread initiators
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 8,9,10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2017-01-30
  • Updated: 2021-02-03
  • Resolved: 2019-12-02
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13 JDK 14
11.0.10-oracleFixed 13.0.6Fixed 14 b26Fixed
Related Reports
CSR :  
Duplicate :  
Relates :  
Relates :  
Description
This problem was originally spotted back on 2010.08.12 when
the following bug was filed:

    JDK-6976636 JVM/TI test ex03t001 fails assertion

In particular it was observed that post_class_unload() didn't
work right when the thread that initiated the class unload event
was not a JavaThread. Back when Ramki wrote an email
about the issue, it was a CMS thread:

> The error is that post_class_unload() tries to do things with the
> thread that initiated the GC that caused the unload, under the assumption
> that it's a JavaThread. Here it's the CMS thread that initiated the
> remark pause that caused the unload, and all hell breaks loose.
> I don't know if this has been an error in the system all along that was never noticed
> (in the noise of the "known" column), or a new problem. Have there
> been recent changes in the way post_class_unload() works? Dan?
> Is the manipulation of the calling thread's state necessary? (Can
> the test be refined to do nothing if it's not a Java thread (which it
> isn't here)?)

While testing a fix for a different bug, I ran a failure of this test
that looks like this:

#  Internal Error (/work/shared/bug_hunt/8167108/SMR_prototype/hotspot/src/share/vm/runtime/jniHandles.cpp:57), pid=2497, tid=57
#  assert(Universe::heap()->is_in_reserved(obj)) failed: sanity check

with a stack trace that looks like this:

Current thread (0x0000000000ae1000):  VMThread "VM Thread" [stack: 0x00007fffb36e9000,0x00007fffb37e9000] [id=57]

Stack: [0x00007fffb36e9000,0x00007fffb37e9000],  sp=0x00007fffb37e7950,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x2387fda]  void VMError::report_and_die(int,const char*,const char*,__va_list_element*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0x5da
V  [libjvm.so+0x238796b]  void VMError::report_and_die(Thread*,const char*,int,const char*,const char*,__va_list_element*)+0x7b
V  [libjvm.so+0x17fcd3d]  void report_vm_error(const char*,int,const char*,const char*,...)+0xed
V  [libjvm.so+0x1cbdd5d]  _jobject*JNIHandles::make_local(Thread*,oopDesc*)+0x9d
V  [libjvm.so+0x1e04dc8]  void JvmtiExport::post_class_unload(Klass*)+0x2f8
V  [libjvm.so+0x1b6494c]  void InstanceKlass::notify_unload_class(InstanceKlass*)+0x2c
V  [libjvm.so+0x16bf88d]  void ClassLoaderData::classes_do(void(*)(InstanceKlass*))+0x6d
V  [libjvm.so+0x16c03f5]  void ClassLoaderData::unload()+0x35
V  [libjvm.so+0x16c312e]  bool ClassLoaderDataGraph::do_unloading(BoolObjectClosure*,bool)+0x13e
V  [libjvm.so+0x2299ea1]  bool SystemDictionary::do_unloading(BoolObjectClosure*,bool)+0x21
V  [libjvm.so+0x1a19e86]  void G1ConcurrentMark::weakRefsWork(bool)+0x676
V  [libjvm.so+0x1a1875f]  void G1ConcurrentMark::checkpointRootsFinal(bool)+0x1ef
V  [libjvm.so+0x17df6e6]  void CMCheckpointRootsFinalClosure::do_void()+0x26
V  [libjvm.so+0x23ee386]  void VM_CGC_Operation::doit()+0xd6
V  [libjvm.so+0x23eab1e]  void VM_Operation::evaluate()+0x9e
V  [libjvm.so+0x23e781f]  void VMThread::evaluate_operation(VM_Operation*)+0xcf
V  [libjvm.so+0x23e8013]  void VMThread::loop()+0x543
V  [libjvm.so+0x23e74ab]  void VMThread::run()+0x15b
V  [libjvm.so+0x20aa64a]  thread_native_entry+0x27a
C  [libc.so.1+0x125221]  _thrp_setup+0xa5
C  [libc.so.1+0x1254c0]  _lwp_start+0x0


Here's the code in question:

src/share/vm//prims/jvmtiExport.cpp:

void JvmtiExport::post_class_unload(Klass* klass) {
  if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) {
    return;
  }
  Thread *thread = Thread::current();
  HandleMark hm(thread);
  KlassHandle kh(thread, klass);

  EVT_TRIG_TRACE(EXT_EVENT_CLASS_UNLOAD, ("[?] Trg Class Unload triggered" ));
  if (JvmtiEventController::is_enabled((jvmtiEvent)EXT_EVENT_CLASS_UNLOAD)) {
    assert(thread->is_VM_thread(), "wrong thread");

    // get JavaThread for whom we are proxy
    JavaThread *real_thread =
        (JavaThread *)((VMThread *)thread)->vm_operation()->calling_thread();

    JvmtiEnvIterator it;
    for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {
      if (env->phase() == JVMTI_PHASE_PRIMORDIAL) {
        continue;
      }
      if (env->is_enabled((jvmtiEvent)EXT_EVENT_CLASS_UNLOAD)) {
        EVT_TRACE(EXT_EVENT_CLASS_UNLOAD, ("[?] Evt Class Unload sent %s",
                  kh()==NULL? "NULL" : kh()->external_name() ));

        // do everything manually, since this is a proxy - needs special care
        JNIEnv* jni_env = real_thread->jni_environment();
        jthread jt = (jthread)JNIHandles::make_local(real_thread, real_thread->threadObj());
        jclass jk = (jclass)JNIHandles::make_local(real_thread, kh()->java_mirror());

        // Before we call the JVMTI agent, we have to set the state in the
        // thread for which we are proxying.
        JavaThreadState prev_state = real_thread->thread_state();
        assert(((Thread *)real_thread)->is_ConcurrentGC_thread() ||
               (real_thread->is_Java_thread() && prev_state == _thread_blocked),
               "should be ConcurrentGCThread or JavaThread at safepoint");
        real_thread->set_thread_state(_thread_in_native);

        jvmtiExtensionEvent callback = env->ext_callbacks()->ClassUnload;
        if (callback != NULL) {
          (*callback)(env->jvmti_external(), jni_env, jt, jk);
        }

        assert(real_thread->thread_state() == _thread_in_native,
               "JavaThread should be in native");
        real_thread->set_thread_state(prev_state);

        JNIHandles::destroy_local(jk);
        JNIHandles::destroy_local(jt);
      }
    }
  }
}


According to dbx this is where I am:

(dbx) frame 10
Current function is JvmtiExport::post_class_unload
 1302           jthread jt = (jthread)JNIHandles::make_local(real_thread, real_thread->threadObj());

(dbx) print real_thread
real_thread = 0x5df800

The "real_thread" variable is of type JavaThread, but in this
particular case, the thread that initiated the VM operation
is not a JavaThread:

THREAD t@50

t@50(l@50) stopped in ___lwp_cond_wait at 0x7fffffd3e82a
0x00007fffffd3e82a: ___lwp_cond_wait+0x000a:    jae      ___lwp_cond_wait+0x18  [ 0x7fffffd3e838, .+0xe ]
current thread: t@50
  [1] ___lwp_cond_wait(0x5e0b50, 0x5e0b38, 0x0, 0x0, 0x7fffffd0fed8, 0x5e0b00), at 0x7fffffd3e82a
  [2] _lwp_cond_wait(), at 0x7fffffd0feec
=>[3] os::Solaris::cond_wait(cv = 0x5e0b50, mx = 0x5e0b38), line 219 in "os_solaris.hpp"
  [4] os::PlatformEvent::park(this = 0x5e0b00), line 5192 in "os_solaris.cpp"
  [5] ParkCommon(ev = 0x5e0b00, timo = 0), line 411 in "mutex.cpp"
  [6] Monitor::IWait(this = 0x43d0c0, Self = 0x5df800, timo = 0), line 793 in "mutex.cpp"
  [7] Monitor::wait(this = 0x43d0c0, no_safepoint_check = true, timeout = 0, as_suspend_equivalent = false), line 1116 in "mutex.cpp"
  [8] VMThread::execute(op = 0x7fffdffbaa28), line 610 in "vmThread.cpp"
  [9] ConcurrentMarkThread::run_service(this = 0x5df800), line 177 in "concurrentMarkThread.cpp"
  [10] ConcurrentGCThread::run(this = 0x5df800), line 82 in "concurrentGCThread.cpp"
  [11] thread_native_entry(thread_addr = 0x5df800), line 762 in "os_solaris.cpp"
  [12] _thrp_setup(), at 0x7fffffd35221
  [13] _lwp_start(), at 0x7fffffd354c0
Current function is PosixSemaphore::wait
 1315       ret = sem_wait(&_semaphore);


So back to our line of code:

jthread jt = (jthread)JNIHandles::make_local(real_thread, real_thread->threadObj());

We casted a ConcurrentGCThread * into a JavaThread*
and called "threadObj()" on it. We took that random value
and passed it to JNIHandles::make_local() which happened
to detect that "oop" we passed wasn't really an oop.
Comments
Fix request (13u) Requesting backport to 13u for parity with 11u. The patch applies cleanly, CSR for the fix is approved for 13u: JDK-8260003. Tested with tier1 and jvmti tests.
21-01-2021

Hi Goetz, what about the CSR? The original change has one...
20-11-2020

Fix request (11u) I would like to downport this for parity with 11.0.10-oracle. I had to resolve and adapt the change. http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-November/004087.html
06-11-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/4774b50671ed User: coleenp Date: 2019-12-02 14:02:35 +0000
02-12-2019

The ClassUnload event design is fundamentally broken. Class unloading is not a synchronous action - no thread is the "initiator" of a class unload and it makes no sense to pass a thread as an argument to the event. The thread being passed is whatever random thread happened to initiate the safepoint VM operation that then led to class unloading at the safepoint! Further passing a jclass for the class that has been unloaded is just completely broken - it has been unloaded therefore you cannot have a jclass referencing it!
13-11-2019

Targeted to tbd. The list of bus to fix in 14 is too big. It was supposed to be reduced after walk through.
29-10-2019

Adding Serguei and Dmitry since we three were involved with JDK-6976636.
31-01-2017

Ouch! The assertion was relaxed under JDK-6976636 but the code was never adapted to deal with the fact a ConcurrentGCThread is not a JavaThread!
31-01-2017

I now understand why the change I'm testing made this bug reproducible: (dbx) print *real_thread *real_thread = { _next = 0x5e0e70 _on_thread_list = false _threadObj = 0xf1f1f1f100000023 _java_call_counter = -235864064 <snip> The current change I'm playing with add the "_on_thread_list" field which moved the _threadObj field down. So with the casting of a JavaThread* over a ConcurrentGCThread, in the old code, the _threadObj would have seen a zero (false) instead of the non-zero value that it now observes. src/share/vm/runtime/jniHandles.cpp: jobject JNIHandles::make_local(Thread* thread, oop obj) { if (obj == NULL) { return NULL; // ignore null handles } else { assert(Universe::heap()->is_in_reserved(obj), "sanity check"); return thread->active_handles()->allocate_handle(obj); } } with a NULL threadObj value, JNIHandles::make_local() would not have asserted and we would have run on to the next dangerous thing...
30-01-2017

Here is more analysis of the code in question: src/share/vm//prims/jvmtiExport.cpp: 1275__void JvmtiExport::post_class_unload(Klass* klass) { 1276____if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) { 1277______return; 1278____} 1279____Thread *thread = Thread::current(); 1280____HandleMark hm(thread); 1281____KlassHandle kh(thread, klass); 1282 1283____EVT_TRIG_TRACE(EXT_EVENT_CLASS_UNLOAD, ("[?] Trg Class Unload triggered" )); 1284____if (JvmtiEventController::is_enabled((jvmtiEvent)EXT_EVENT_CLASS_UNLOAD)) { 1285______assert(thread->is_VM_thread(), "wrong thread"); 1286 1287______// get JavaThread for whom we are proxy 1288______JavaThread *real_thread = 1289__________(JavaThread *)((VMThread *)thread)->vm_operation()->calling_thread(); The thread that requested the VMOperation has been casted into a JavaThread. In my crash, it was a ConcurrentGCThread. 1290 1291______JvmtiEnvIterator it; 1292______for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { 1293________if (env->phase() == JVMTI_PHASE_PRIMORDIAL) { 1294__________continue; 1295________} 1296________if (env->is_enabled((jvmtiEvent)EXT_EVENT_CLASS_UNLOAD)) { 1297__________EVT_TRACE(EXT_EVENT_CLASS_UNLOAD, ("[?] Evt Class Unload sent %s", 1298____________________kh()==NULL? "NULL" : kh()->external_name() )); 1299 1300__________// do everything manually, since this is a proxy - needs special care 1301__________JNIEnv* jni_env = real_thread->jni_environment(); A ConcurrentGCThread doesn't have a jni_environment() so the jni_env value is random. 1302__________jthread jt = (jthread)JNIHandles::make_local(real_thread, real_thread->threadObj()); The above in the line where I happened to crash. A ConcurrentGCThread doesn't a threadObj() so I passed a random value as an "oop" to the JNIHandles::make_local() call which happened to blow up. 1303__________jclass jk = (jclass)JNIHandles::make_local(real_thread, kh()->java_mirror()); 1304 1305__________// Before we call the JVMTI agent, we have to set the state in the 1306__________// thread for which we are proxying. 1307__________JavaThreadState prev_state = real_thread->thread_state(); A ConcurrentGCThread does not have a JavaThreadState so the above value is random. 1308__________assert(((Thread *)real_thread)->is_ConcurrentGC_thread() || 1309_________________(real_thread->is_Java_thread() && prev_state == _thread_blocked), 1310_________________"should be ConcurrentGCThread or JavaThread at safepoint"); The above is the assert() modified by the fix for JDK-6976636. Because of the ((Thread *)real_thread)->is_ConcurrentGC_thread() check, we no longer fail this assert, but we do plenty of other things above and below that are not good. 1311__________real_thread->set_thread_state(_thread_in_native); A ConcurrentGCThread does not have a JavaThreadState so what did we just over write? 1312 1313__________jvmtiExtensionEvent callback = env->ext_callbacks()->ClassUnload; 1314__________if (callback != NULL) { 1315____________(*callback)(env->jvmti_external(), jni_env, jt, jk); We got a valid JvmtiEnv from our list walk, but we're passing some random value as the "jni_env". 1316__________} 1317 1318__________assert(real_thread->thread_state() == _thread_in_native, 1319_________________"JavaThread should be in native"); Well we set the "_thread_in_native" value on line 1311 above, but since a ConcurrentGCThread doesn't have that field, there's no guarantee that real code that uses that memory location hasn't updated it... 1320__________real_thread->set_thread_state(prev_state); OK, so we're restoring the JavaThreadState value that we saved on line 1307. However, a ConcurrentGCThread doesn't have that field so there's no guarantee that we didn't just overwrite some value that the real code that uses that memory location just set... 1321 1322__________JNIHandles::destroy_local(jk); 1323__________JNIHandles::destroy_local(jt); 1324________} 1325______} 1326____} 1327__}
30-01-2017