JDK-8319935 : Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 21,22
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2023-11-10
  • Updated: 2023-11-14
Related Reports
Relates :  
Relates :  
Description
We ran into issues when there are more than one JvmtiThreadState is created for a JavaThread. The problem can lead to memory corruption, which then can manifest into different crashes in completely unrelated area at a later point and make the problem difficult to be diagnosed.

Following is duplicated from https://bugs.openjdk.org/browse/JDK-8312174?focusedId=14625080&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14625080:

The crashes are always due to a bad JavaThread pointer in the current stable ThreadsList::_threads array. Depending on when the corrupted pointer is visited by the VM, it exhibits various different crashes during SafepointSynchronize::synchronize_threads, or ThreadsSMRSupport::free_list, etc. The crashes are latent symptoms caused by random (but not completely random) memory corruption originated from JvmtiEventControllerPrivate::recompute_thread_enabled due to a stale JavaThead (is already exited) referenced in a JvmtiThreadState. For a particular test that I debugged with, the memory corruption always occurs at the 193-th element (a 64-bit word) in the ThreadsList::_threads. All other JavaThread pointers in the array are intact.

In one of the failed instances that I investigated, the memory address being trashed was 0x00007f9dbc0018b8. The bogus value contained in the memory location was 0x00000000e00f4df0. The debugging information showed there was a JavaThread allocated at 0x00007f9dbc0012b0 earlier during the test execution. The thread later exited and was destroyed. There were two different JvmtiThreadStates were created as associated with this JavaThread. Since the JavaThread was bound to two different JvmtiThreadStates and only one was destroyed during the thread exit. The remaining JvmtiThreadState then contains an invalid pointer (to 0x00007f9dbc0012b0) after the thread exited. At a later point, an array of JavaThread* ( ThreadsList::_threads) was allocated at 0x00007f9dbc0012b0. 0x00007f9dbc0018b8 was a memory location within the array. It appeared that later during JvmtiEventControllerPrivate::recompute_thread_enabled operation, it caused memory trashing at 0x00007f9dbc0018b8 when it encounters the JvmtiThreadState that contains the stale JavaThread* 0x00007f9dbc0012b0.

Duplicated from https://bugs.openjdk.org/browse/JDK-8312174?focusedId=14625567&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14625567:

The two JvmtiThreadStates for one JavaThread issue that we have observed particularly happens for attached native thread. The first creation of a JvmtiThreadState occurs during JvmtiEventControllerPrivate::thread_started when a thread is being attached and it's creating the Java instance of the Thread. The created JvmtiThreadState has a null _thread_oop_h.

  * frame #0: 0x00007f88791948aa libjvm.so`JvmtiThreadState::JvmtiThreadState(JavaThread*, oopDesc*) [inlined] OopHandle::OopHandle(this=0x00005602b76eb810) at oopHandle.hpp:46:17
    frame #1: 0x00007f88791948aa libjvm.so`JvmtiThreadState::JvmtiThreadState(this=0x00005602b76eb800, thread=0x00005602b7769c10, thread_oop=0x0000000000000000) at jvmtiThreadState.cpp:56:19
    frame #2: 0x00007f8879176d83 libjvm.so`JvmtiEventCollector::setup_jvmti_thread_state() [inlined] JvmtiThreadState::state_for_while_locked(thread=<unavailable>, thread_oop=<unavailable>) at jvmtiThreadState.inline.hpp:98:19
    frame #3: 0x00007f8879176cec libjvm.so`JvmtiEventCollector::setup_jvmti_thread_state() at jvmtiThreadState.inline.hpp:111:13
    frame #4: 0x00007f8879176cc4 libjvm.so`JvmtiEventCollector::setup_jvmti_thread_state(this=0x00007ffcbf7bee98) at jvmtiExport.cpp:2953:29
    frame #5: 0x00007f88791773ee libjvm.so`JvmtiSampledObjectAllocEventCollector::start(this=0x00007ffcbf7bee98) at jvmtiExport.cpp:3146:5
    frame #6: 0x00007f887928a549 libjvm.so`MemAllocator::Allocation::notify_allocation_jvmti_sampler() [inlined] JvmtiSampledObjectAllocEventCollector::JvmtiSampledObjectAllocEventCollector(this=0x00007ffcbf7bee98, should_start=true) at jvmtiExport.hpp:560:5
    frame #7: 0x00007f887928a52b libjvm.so`MemAllocator::Allocation::notify_allocation_jvmti_sampler(this=0x00007ffcbf7bef40) at memAllocator.cpp:185:43
    frame #8: 0x00007f887928a94a libjvm.so`MemAllocator::Allocation::notify_allocation(this=<unavailable>, thread=<unavailable>) at memAllocator.cpp:235:3 [artificial]
    frame #9: 0x00007f887928aec9 libjvm.so`MemAllocator::allocate() const [inlined] MemAllocator::Allocation::~Allocation(this=0x00007ffcbf7bef40) at memAllocator.cpp:85:7
    frame #10: 0x00007f887928aeb3 libjvm.so`MemAllocator::allocate(this=0x00007ffcbf7bef90) const at memAllocator.cpp:375:3
    frame #11: 0x00007f8878f2f221 libjvm.so`InstanceKlass::allocate_instance_handle(JavaThread*) [inlined] CollectedHeap::obj_allocate(this=<unavailable>, klass=<unavailable>, size=<unavailable>, __the_thread__=0x00005602b7769c10) at collectedHeap.inline.hpp:36:20
    frame #12: 0x00007f8878f2f1fd libjvm.so`InstanceKlass::allocate_instance_handle(JavaThread*) [inlined] InstanceKlass::allocate_instance(this=<unavailable>, __the_thread__=0x00005602b7769c10) at instanceKlass.cpp:1442:38
    frame #13: 0x00007f8878f2f1ee libjvm.so`InstanceKlass::allocate_instance_handle(this=<unavailable>, __the_thread__=<unavailable>) at instanceKlass.cpp:1462:33
    frame #14: 0x00007f8878f6bff9 libjvm.so`JavaThread::allocate_threadObj(this=0x00005602b7769c10, thread_group=Handle @ r13, thread_name=<unavailable>, daemon=false, __the_thread__=<unavailable>) at javaThread.cpp:222:35
    frame #15: 0x00007f8879000020 libjvm.so`attach_current_thread(vm=<unavailable>, penv=0x00007ffcbf7bf1c0, _args=<unavailable>, daemon=<unavailable>) at jni.cpp:3819:13
...

The second creation of JvmtiThreadState for the same JavaThread (for the attaching native thread) occurs during JvmtiExport::post_thread_start when attach:

  * frame #0: 0x00007f88791948aa libjvm.so`JvmtiThreadState::JvmtiThreadState(JavaThread*, oopDesc*) [inlined] OopHandle::OopHandle(this=0x00005602b76eb910) at oopHandle.hpp:46:17
    frame #1: 0x00007f88791948aa libjvm.so`JvmtiThreadState::JvmtiThreadState(this=0x00005602b76eb900, thread=0x00005602b7769c10, thread_oop=0x0000000116000000) at jvmtiThreadState.cpp:56:19
    frame #2: 0x00007f8879166936 libjvm.so`JvmtiEventControllerPrivate::thread_started(JavaThread*) [inlined] JvmtiThreadState::state_for_while_locked(thread=0x00005602b7769c10, thread_oop=<unavailable>) at jvmtiThreadState.inline.hpp:98:19
    frame #3: 0x00007f88791668b3 libjvm.so`JvmtiEventControllerPrivate::thread_started(thread=0x00005602b7769c10) at jvmtiEventController.cpp:744:31
    frame #4: 0x00007f887916c1f4 libjvm.so`JvmtiExport::post_thread_start(thread=0x00005602b7769c10) at jvmtiExport.cpp:1476:3
    frame #5: 0x00007f88790000af libjvm.so`attach_current_thread(vm=<unavailable>, penv=0x00007ffcbf7bf1c0, _args=<unavailable>, daemon=<unavailable>) at jni.cpp:3849:5
...

Before the https://git.openjdk.org/jdk/commit/fda142ff6cfefa12ec1ea4d4eb48b3c1b285bc04 change, JvmtiEventControllerPrivate::thread_started calls JvmtiThreadState::state_for_while_locked directly, which actually finds the already created JvmtiThreadState from the JavaThread. However the 'if (state == nullptr || state->get_thread_oop() != thread_oop)' check fails since the thread_oop from the state is null and is not the same as the actual thread_oop. Since the check fails, it decides to create a new JvmtiThreadState for the JavaThread.

https://git.openjdk.org/jdk/commit/fda142ff6cfefa12ec1ea4d4eb48b3c1b285bc04 changed to call JvmtiThreadState::state_for(thread) in JvmtiEventControllerPrivate::thread_started. JvmtiThreadState::state_for(thread) gets the JvmtiThreadState from the JavaThread and just returns. Hence there's no additional state being created here.

https://git.openjdk.org/jdk/commit/fda142ff6cfefa12ec1ea4d4eb48b3c1b285bc04 resolved the issue by luck, I think we should put additional reinforcement. Hence filing this bug.
Comments
Add backed the 'loom' label since this bug was introduced by the Loom integration (JDK-8284161).
13-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16642 Date: 2023-11-13 21:52:22 +0000
13-11-2023

Besides the above three, there is Javathread::_jvmti_vthread, which can be bound to a JvmtiThreadState. Fortunately, I think the case that we are dealing here does not need to involve virtual thread. It's specifically related to attached native threads. That's simpler to handle. Looking closely of JvmtiThreadState related code, I see JvmtiThreadState::_thread_oop_h never changes after a JvmtiThreadState is initialized. Places where JvmtiThreadState::_thread_oop_h used are for virtual thread handling. To ensure that we don't create duplicate JvmtiThreadState for JavaThread related to attached native, we need to teach JvmtiThreadState::state_for_while_locked about the null state->get_thread_oop() is possible (if the JvmtiThreadState is created while the Java thread oop for the attached native thread is allocated) and returns the thread->jvmti_thread_state() in that case. Here is my proposed change. I think that would make the direct JvmtiThreadState::state_for_while_locked calls safer. +++ b/src/hotspot/share/prims/jvmtiThreadState.inline.hpp @@ -78,10 +78,6 @@ inline JvmtiThreadState* JvmtiThreadState::state_for_while_locked(JavaThread *th NoSafepointVerifier nsv; // oop is safe to use. - if (thread_oop == nullptr) { // Then thread should not be null (see assert above). - thread_oop = thread->jvmti_vthread() != nullptr ? thread->jvmti_vthread() : thread->threadObj(); - } - // In a case of unmounted virtual thread the thread can be null. JvmtiThreadState *state = thread == nullptr ? nullptr : thread->jvmti_thread_state(); @@ -89,7 +85,15 @@ inline JvmtiThreadState* JvmtiThreadState::state_for_while_locked(JavaThread *th // Don't add a JvmtiThreadState to a thread that is exiting. return nullptr; } - if (state == nullptr || state->get_thread_oop() != thread_oop) { + + if (thread_oop == nullptr) { // Then thread should not be null (see assert above). + thread_oop = thread->jvmti_vthread() != nullptr ? thread->jvmti_vthread() : thread->threadObj(); + } + // The state->get_thread_oop() may be null if the state is created during + // the allocation of the thread oop when a native thread is attaching. Make + // sure we don't create a new state for the JavaThread. + if (state == nullptr ||(state->get_thread_oop() != nullptr && + state->get_thread_oop() != thread_oop)) { // Check if java_lang_Thread already has a link to the JvmtiThreadState. if (thread_oop != nullptr) { // thread_oop can be null during early VMStart. state = java_lang_Thread::jvmti_thread_state(thread_oop); @@ -98,6 +102,7 @@ inline JvmtiThreadState* JvmtiThreadState::state_for_while_locked(JavaThread *th state = new JvmtiThreadState(thread, thread_oop); } } + assert(state != nullptr, "sanity check"); return state; }
13-11-2023

Also, things get more complicated when virtual threads are involved. I'm not 100% sure if we still could maintain the 1:1 mapping between JvmtiThreadState and JavaThread when virtual threads are used. I wrote the example from https://openjdk.org/jeps/425 (10000 virtual threads running Thread.sleep()), then the JvmtiThreadState::_head linked list will contain multiple JvmtiThreadState instances pointing to the same JavaThread. It still runs fine. The relation among these 3 is quite confusing now: JvmtiThreadState java.lang.Thread HotSpot JavaThread They each contain two pointers pointing to the other two. Could anyone help explain the assumptions and mapping relation among these 3?
11-11-2023

Notes: 1. This problem likely only manifests when heap sampling (JVMTI SampledObjectAlloc) is enabled. The first instance of JvmtiThreadState from JavaThread::allocate_threadObj() is in preparation for sampling the allocation of java.lang.Thread object. The stack trace above confirms this. 2. The root cause is the condition `if (state == nullptr || state->get_thread_oop() != thread_oop)` from `JvmtiThreadState::state_for_while_locked()`. It used to be `if (state == nullptr) ` before JDK-8284161. This problem was introduced by JDK-8284161. 3. There might still be a problem after the accidental fix from JDK-8312174. The JvmtiThreadState::_thread_oop_h handle will remain nullptr for those threads whose java.lang.Thread object got sampled by SampledObjectAlloc. Ideally _thread_oop_h should point to a valid java.lang.Thread object for the JavaThread.
11-11-2023