JDK-8266593 : vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 17
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux_ubuntu
  • CPU: x86_64
  • Submitted: 2021-05-05
  • Updated: 2023-10-17
  • Resolved: 2021-11-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 18
17.0.10-oracleFixed 18 b25Fixed
Related Reports
Relates :  
Description
The following test:

vmTestbase/nsk/jvmti/PopFrame/popframe011/TestDescription.java

can fail with:

#  Internal Error (/work/shared/bug_hunt/thread_SMR_stress/jdk17_exp.git/open/src/hotspot/share/prims/jvmtiEnvBase.cpp:1533), pid=15364, tid=4442
#  assert(java_thread == _state->get_thread()) failed: Must be
#
# JRE version: Java(TM) SE Runtime Environment (17.0) (fastdebug build 17-internal+0-LTS-2021-04-30-1905344.dcubed...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-internal+0-LTS-2021-04-30-1905344.dcubed..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x11dcac8]  UpdateForPopTopFrameClosure::doit(Thread*, bool)+0xa8


Here's a snippet from the log file:

#section:main
----------messages:(4/267)----------
command: main -agentlib:popframe011 nsk.jvmti.PopFrame.popframe011 6600
reason: User specified action: run main/othervm/native/timeout=6900 -agentlib:popframe011 nsk.jvmti.PopFrame.popframe011 6600
Mode: othervm [/othervm specified]
elapsed time (seconds): 3363.522
----------configuration:(0/0)----------
----------System.out:(22/1610)----------
About to execute for 6600 seconds.
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/jvmtiEnvBase.cpp:1533
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/work/shared/bug_hunt/thread_SMR_stress/jdk17_exp.git/open/src/hotspot/share/prims/jvmtiEnvBase.cpp:1533), pid=15364, tid=4442
#  assert(java_thread == _state->get_thread()) failed: Must be
#
# JRE version: Java(TM) SE Runtime Environment (17.0) (fastdebug build 17-internal+0-LTS-2021-04-30-1905344.dcubed...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-internal+0-LTS-2021-04-30-1905344.dcubed..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x11dcac8]  UpdateForPopTopFrameClosure::doit(Thread*, bool)+0xa8
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /work/shared/bug_hunt/thread_SMR_stress/jdk17_exp.git/build/linux-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_popframe011_java/StressWrapper_popframe011/core.15364)
#
# An error report file with more information is saved as:
# /work/shared/bug_hunt/thread_SMR_stress/jdk17_exp.git/build/linux-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_popframe011_java/StressWrapper_popframe011/hs_err_pid15364.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
----------System.err:(0/0)----------
----------rerun:(33/5020)*----------

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00002aca5c0308f0):  JavaThread "Thread-9532887" [_thread_in_vm, id=4442, stack(0x00002acaa4606000,0x00002acaa4707000)]

Stack: [0x00002acaa4606000,0x00002acaa4707000],  sp=0x00002acaa4704630,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x11dcac8]  UpdateForPopTopFrameClosure::doit(Thread*, bool)+0xa8
V  [libjvm.so+0xd8ddf0]  HandshakeOperation::do_handshake(JavaThread*)+0xf0
V  [libjvm.so+0xd8df81]  HandshakeState::process_self_inner()+0x141
V  [libjvm.so+0xd8e2cd]  HandshakeState::process_by_self()+0x19d
V  [libjvm.so+0x16ff118]  SafepointMechanism::process_if_requested_slow(JavaThread*)+0x28
V  [libjvm.so+0x14f6c18]  Mutex::lock_contended(Thread*)+0x548
V  [libjvm.so+0x14f6de4]  Mutex::lock(Thread*)+0xf4
V  [libjvm.so+0x18ba1c8]  Threads::remove(JavaThread*, bool)+0x38
V  [libjvm.so+0x18bd08e]  JavaThread::exit(bool, JavaThread::ExitType)+0x89e
V  [libjvm.so+0x18bd5b9]  JavaThread::post_run()+0x19
V  [libjvm.so+0x18bc65d]  Thread::call_run()+0x16d
V  [libjvm.so+0x159387e]  thread_native_entry(Thread*)+0x10e

Please note this this failure is happening with the latest version
of the popframe011 test that I'm developing via:

    JDK-8266130 Convert Thread-SMR stress tests from counter based to time based

My updates in JDK-8266130 only change the duration of the
test execution and don't change the popframe logic at all, i.e.,
run the same test more times and it will eventually crash.
Comments
Fix request [17u] I backport this for parity with 17.0.10-oracle. Small risk, servicability tool. Simple resolve due to context. Test passes. SAP nightly testing passed.
10-10-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1862 Date: 2023-10-09 17:47:26 +0000
09-10-2023

Changeset: 32839ba0 Author: Serguei Spitsyn <sspitsyn@openjdk.org> Date: 2021-11-22 10:47:47 +0000 URL: https://git.openjdk.java.net/jdk/commit/32839ba012f0a0a66e249cd8d12b94499d82ec0a
22-11-2021

The suggested fix is: diff --git a/src/hotspot/share/prims/jvmtiEnvBase.cpp b/src/hotspot/share/prims/jvmtiEnvBase.cpp index e6a9409b7ef..d4f5c5d1a9e 100644 --- a/src/hotspot/share/prims/jvmtiEnvBase.cpp +++ b/src/hotspot/share/prims/jvmtiEnvBase.cpp @@ -1393,6 +1393,9 @@ SetForceEarlyReturn::doit(Thread *target, bool self) { Thread* current_thread = Thread::current(); HandleMark hm(current_thread); + if (java_thread->is_exiting() || java_thread->threadObj() == NULL) { + return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ + } if (!self) { if (!java_thread->is_suspended()) { _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED; @@ -1523,6 +1526,10 @@ UpdateForPopTopFrameClosure::doit(Thread *target, bool self) { Thread* current_thread = Thread::current(); HandleMark hm(current_thread); JavaThread* java_thread = JavaThread::cast(target); + + if (java_thread->is_exiting() || java_thread->threadObj() == NULL) { + return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ + } assert(java_thread == _state->get_thread(), "Must be"); if (!self && !java_thread->is_suspended()) { @@ -1599,14 +1606,12 @@ UpdateForPopTopFrameClosure::doit(Thread *target, bool self) { // It's fine to update the thread state here because no JVMTI events // shall be posted for this PopFrame. - if (!java_thread->is_exiting() && java_thread->threadObj() != NULL) { - _state->update_for_pop_top_frame(); - java_thread->set_popframe_condition(JavaThread::popframe_pending_bit); - // Set pending step flag for this popframe and it is cleared when next - // step event is posted. - _state->set_pending_step_for_popframe(); - _result = JVMTI_ERROR_NONE; - } + _state->update_for_pop_top_frame(); + java_thread->set_popframe_condition(JavaThread::popframe_pending_bit); + // Set pending step flag for this popframe and it is cleared when next + // step event is posted. + _state->set_pending_step_for_popframe(); + _result = JVMTI_ERROR_NONE; } void @@ -1614,6 +1619,9 @@ SetFramePopClosure::doit(Thread *target, bool self) { ResourceMark rm; JavaThread* java_thread = JavaThread::cast(target); + if (java_thread->is_exiting() || java_thread->threadObj() == NULL) { + return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ + } assert(_state->get_thread() == java_thread, "Must be"); if (!self && !java_thread->is_suspended()) { @@ -1633,9 +1641,6 @@ SetFramePopClosure::doit(Thread *target, bool self) { } assert(vf->frame_pointer() != NULL, "frame pointer mustn't be NULL"); - if (java_thread->is_exiting() || java_thread->threadObj() == NULL) { - return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ - } int frame_number = _state->count_frames() - _depth; _state->env_thread_state((JvmtiEnvBase*)_env)->set_frame_pop(frame_number); _result = JVMTI_ERROR_NONE;
17-11-2021

Agreed with the analysis from Robbin. Reproduced this issue with my tracing and got this printed: DBG: JvmtiExport::cleanup_thread: 0x7f3e380303a0 DBG: UpdateForPopTopFrameClosure::doit: java_thread: 0x7f3e380303a0 _state->get_thread(): 0xbabababababababa So, we get this assert in UpdateForPopTopFrameClosure::doit after JvmtiExport::cleanup_thread for this java_thread.
16-11-2021

I think all targetted handshakes on JavaThread should be predicated on not-exiting (and by implication certainly not terminated). Only global handshake operations may need to catch a JavaThread whilst still in the process of being torn down and terminating. I'm not clear on the intent behind the threadObj() check though. There is a very small window during JNI attach that a thread can handshake but not yet have a threadObj() but we have the "is_attaching_via_jni" flag to filter those out of thread operations. We no longer null out the threadObj() in ensure_join() but leave that for the JavaThread destructor.
22-06-2021

In JavaThread::exit() ### if (JvmtiEnv::environments_might_exist()) { JvmtiExport::cleanup_thread(this); // <== frees JVM Ti state } ... // Remove from list of active threads list, and notify VM thread if we are the last non-daemon thread Threads::remove(this, daemon); // <== Executes handshake which reads JVM TI state ### Simplified: - When a JavaThread is "is_exiting()" we are about to tear down the Java side of the JavaThread*. - When a JavaThread is "is_exiting()" and java_lang_Thread::thread(thread->threadObj()) returns NULL the object is disconnected from native side. And vice versa: the native JavaThread* no longer have a representation in Java. (technically we can still reach our object) - When a JavaThread is "is_terminated()" the native JavaThread* is no longer usable. Handshakes are performed until terminated, since some is only working on the native side. But it also means that other Handshakes may need to add their owns checks. I think all derivatives of JvmtiHandshakeClosure should have the proper checks added: "if (jt->is_exiting() && java_lang_Thread::thread(thread->threadObj()) == NULL)" Which means the thread is after ensure_join() (Java-side is dead, e.g. isAlive() == false). Looks like I missed this, or alternative probably me that broke it, if I'm not mistaken. What you think?
11-06-2021

JDK-8266130 convert Thread-SMR stress tests from counter based to time based is integrated in the jdk/jdk repo so the latest version of the test is now available directly from the repo (instead of the PR).
10-06-2021

I've attached the log files for my jdk-17+20 sightings: $ unzip -l jdk-17+20_linux.8266593.zip Archive: jdk-17+20_linux.8266593.zip Length Date Time Name --------- ---------- ----- ---- 90561 2021-05-01 04:25 jdk-17+20_1/failures.linux-x86_64/hs_err_pid15364.log 15959 2021-05-01 04:25 jdk-17+20_1/failures.linux-x86_64/StressWrapper_popframe011.jtr.fastdebug 90171 2021-05-01 04:40 jdk-17+20_1/failures.linux-x86_64/hs_err_pid20671.log 15960 2021-05-01 04:40 jdk-17+20_1/failures.linux-x86_64/StressWrapper_popframe011.jtr.slowdebug 148321 2021-05-01 15:52 jdk-17+20_2/failures.linux-x86_64/hs_err_pid12083.log 16303 2021-05-01 15:52 jdk-17+20_2/failures.linux-x86_64/TestDescription.jtr.fastdebug 95813 2021-05-02 03:34 jdk-17+20_2/failures.linux-x86_64/hs_err_pid14057.log 15960 2021-05-02 03:34 jdk-17+20_2/failures.linux-x86_64/StressWrapper_popframe011.jtr.fastdebug 91440 2021-05-02 03:51 jdk-17+20_2/failures.linux-x86_64/hs_err_pid32239.log 15961 2021-05-02 03:51 jdk-17+20_2/failures.linux-x86_64/StressWrapper_popframe011.jtr.slowdebug --------- ------- 596449 10 files
05-05-2021