JDK-8217755 : SuspendThread might hang expecting lock in JavaThread::is_ext_suspend_completed
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 13
  • Priority: P3
  • Status: Resolved
  • Resolution: Duplicate
  • Submitted: 2019-01-24
  • Updated: 2020-05-11
  • Resolved: 2020-05-11
JDK 15: 15 Resolved
Related Reports
Duplicate :  
Description
Stress test hangs while trying to suspend a thread. The suspending thread is stuck in JavaThread::is_ext_suspend_completed:
Thread 74 (Thread 0x2af30b639700 (LWP 22508)):
#0  0x00002af2ae69a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002af2afe89a85 in os::PlatformEvent::park (this=0x2af344003800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/os/posix/os_posix.cpp:1950
#2  0x00002af2afe079c7 in ParkCommon (ev=0x2af344003800, timo=0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:399
#3  0x00002af2afe07bda in Monitor::ILock (this=0x2af2b403bde0, Self=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:461
#4  0x00002af2afe09087 in Monitor::lock_without_safepoint_check (this=0x2af2b403bde0, Self=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:932
#5  0x00002af2afe09123 in Monitor::lock_without_safepoint_check (this=0x2af2b403bde0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:938
#6  0x00002af2aff65faf in SafepointSynchronize::block (thread=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepoint.cpp:890
#7  0x00002af2af418928 in SafepointMechanism::block_if_requested_local_poll (thread=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepointMechanism.inline.hpp:63
#8  0x00002af2af418969 in SafepointMechanism::block_if_requested (thread=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepointMechanism.inline.hpp:73
#9  0x00002af2af418b84 in ThreadStateTransition::transition_and_fence (thread=0x2af344002000, from=_thread_blocked, to=_thread_in_vm) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:128
#10 0x00002af2af418bc2 in ThreadStateTransition::trans_and_fence (this=0x2af30b638780, from=_thread_blocked, to=_thread_in_vm) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:166
#11 0x00002af2af418c5e in ThreadBlockInVM::~ThreadBlockInVM (this=0x2af30b638780, __in_chrg=<optimized out>) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:284
#12 0x00002af2afe09de1 in Monitor::wait (this=0x2af2b458f9e0, no_safepoint_check=false, timeout=110, as_suspend_equivalent=false) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:1097
#13 0x00002af2b0072b89 in JavaThread::is_ext_suspend_completed (this=0x2af2b458e800, called_by_wait=false, delay=5, bits=0x2af30b63886c) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:722
#14 0x00002af2b0077e7b in JavaThread::java_suspend (this=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:2389
#15 0x00002af2afc48fc6 in JvmtiSuspendControl::suspend (java_thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/prims/jvmtiImpl.cpp:863
#16 0x00002af2afc23c70 in JvmtiEnv::SuspendThread (this=0x2af2b4031170, java_thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/prims/jvmtiEnv.cpp:955
#17 0x00002af2afbc454b in jvmti_SuspendThread (env=0x2af2b4031178, thread=0x2af35800bc08) at /scratch/lmesnik/ws/ks-jvmti/build/linux-x64/hotspot/variant-server/gensrc/jvmtifiles/jvmtiEnter.cpp:528
#18 0x00002af2b1885820 in agent_sampler (jvmti=0x2af2b4031178, env=0x2af344002390, p=0x0) at /scratch/lmesnik/ws/ks-jvmti/closed/test/hotspot/jtreg/applications/kitchensink/process/stress/modules/libJvmtiStressModule.c:282
#19 0x00002af2afc46481 in JvmtiAgentThread::call_start_function (this=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/prims/jvmtiImpl.cpp:85
#20 0x00002af2afc46416 in JvmtiAgentThread::start_function_wrapper (thread=0x2af344002000, __the_thread__=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/prims/jvmtiImpl.cpp:79
#21 0x00002af2b00762e3 in JavaThread::thread_main_inner (this=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:1870
#22 0x00002af2b007615a in JavaThread::run (this=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:1853
#23 0x00002af2b007221f in Thread::call_run (this=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:400
#24 0x00002af2afe72204 in thread_native_entry (thread=0x2af344002000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/os/linux/os_linux.cpp:712
#25 0x00002af2ae696e25 in start_thread () from /lib64/libpthread.so.0
#26 0x00002af2aeba734d in clone () from /lib64/libc.so.6
Comments
Closing this bug as a dup of: JDK-8210832 Remove sneaky locking in class Monitor .
11-05-2020

I was a little puzzled because the current implementation of Monitor::wait() uses ThreadBlockInVMWithDeadlockCheck instead of ThreadBlockInVM. As we can see, ThreadBlockInVM::~ThreadBlockInVM() is executing on Thread #74. This means Monitor::wait() has already re-acquired the SR_lock, hence the deadlock with the VMThread (Thread #102), which is executing VM_ThreadDump::doit() and is blocked on SR_lock.lock(). Thread #74 is waiting for the VMThread to continue its work.
Now I have to make sure this bug is still reproducible, or find recent failures with this failure mode. At the least, I need some instructions on how to run Kitchensink to reproduce this issue.
It looks like the ThreadBlockInVMWithDeadlockCheck helper was introduced to avoid such deadlocks by the fix for the following bug (as Dan already pointed out in a previous comment):
JDK-8210832: Remove sneaky locking in class Monitor
Summary: Removed sneaky locking and simplified vm monitors implementation
Reviewed-by: rehn, dcubed, pliden, dholmes, coleenp
Contributed-by: david.holmes@oracle.com, patricio.chilano.mateo@oracle.com
I'm going to close this bug as a dup of JDK-8210832 after verifying that Kitchensink does not fail anymore.
09-05-2020
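
The cycle described in the comment above can be checked mechanically with a wait-for graph. The following is a minimal C++ sketch, not HotSpot code: the node names, the reported_hang() table and the has_cycle_from() helper are hypothetical, and the edge from SR_lock to Thread #74 encodes the assumption from that comment that Monitor::wait() had already re-acquired the lock.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Wait-for graph: each blocked party points at the thing it is waiting for.
// Every node has at most one outgoing edge in this simplified model.
using WaitForGraph = std::map<std::string, std::string>;

// Follow the edges from `start`; return true if the walk reaches a cycle.
bool has_cycle_from(const WaitForGraph& g, const std::string& start) {
    std::set<std::string> seen;
    std::string cur = start;
    while (true) {
        if (!seen.insert(cur).second) return true;  // node visited twice
        auto it = g.find(cur);
        if (it == g.end()) return false;            // nothing blocks this node
        cur = it->second;
    }
}

// Hypothetical encoding of the hang as analysed in the comment.
WaitForGraph reported_hang() {
    return {
        {"Thread #74", "safepoint end"},     // blocked in SafepointSynchronize::block()
        {"safepoint end", "VMThread #102"},  // safepoint ends when VM_ThreadDump finishes
        {"VMThread #102", "SR_lock"},        // blocked acquiring SR_lock
        {"SR_lock", "Thread #74"},           // ASSUMPTION: re-acquired inside Monitor::wait()
        {"Thread #87", "java_resume()"},     // suspended, waiting to be resumed
    };
}
```

Starting from Thread #74 or from the VMThread, has_cycle_from() reports a cycle. Removing the assumed SR_lock ownership edge breaks it, which is why the question of who owns the SR_lock matters.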

I do not see any problems with SuspendThread itself yet.
1. SuspendThread is called in a loop by the libJvmtiStressModule.c::agent_sampler thread (Thread #74). The target thread to suspend is Thread #87 (see below). JavaThread::is_ext_suspend_completed is waiting in SR_lock()->wait(timeout). While in Monitor::wait, it is blocked in SafepointSynchronize::block():
    Monitor::wait (this=0x2af2b458f9e0, no_safepoint_check=false, timeout=110, as_suspend_equivalent=false)
2. Thread #87 is the JniStressModule thread (the one the agent_sampler thread is waiting for in JavaThread::is_ext_suspend_completed). It is executing the native method libJniStressModule.c::Java_applications_kitchensink_process_stress_modules_JniStressModule_getStaticFieldID. Thread #87 is blocked in JavaThread::java_suspend_self() as required. It is waiting on the SR_lock indefinitely:
    Monitor::wait (this=0x2af2b458f9e0, no_safepoint_check=true, timeout=0, as_suspend_equivalent=false)
Its thread_state() seems to be _thread_in_native_trans, as expected by JavaThread::is_ext_suspend_completed, and would be discovered by Thread #74 after it exits from the safepoint. It is waiting as follows:
    MonitorLocker ml(SR_lock(), Mutex::_no_safepoint_check_flag);
    . . .
    while (is_external_suspend()) {
      ret++;
      this->set_ext_suspended();
      // _ext_suspended flag is cleared by java_resume()
      while (is_ext_suspended()) {
        ml.wait();
      }
    }
3. There is another interesting thread involved: Thread #102, the VMThread. It is currently executing the VM_ThreadDump::doit() operation. It is in the process of doing ThreadSnapshot::initialize for Thread #74 and makes a call to JavaThread::is_being_ext_suspended for Thread #74.
This function is blocked attempting to grab the SR_lock:
    Monitor::ILock (this=0x2af2b458f9e0, Self=0x2af2b435c800)
at the line:
    MutexLocker ml(SR_lock(), Mutex::_no_safepoint_check_flag);
It is deadlocked as described in the comment below:
    // Lock without safepoint check - a degenerate variant of lock() for use by
    // JavaThreads when it is known to be safe to not check for a safepoint when
    // acquiring this lock. If the thread blocks acquiring the lock it is not
    // safepoint-safe and so will prevent a safepoint from being reached. If used
    // in the wrong way this can lead to a deadlock with the safepoint code.
    void Mutex::lock_without_safepoint_check(Thread * self) {
      check_no_safepoint_state(self);
      assert(_owner != self, "invariant");
      _lock.lock();
      assert_owner(NULL);
      set_owner(self);
    }
It is not clear which thread owns the SR_lock. At least, I do not see the SR_lock address (this=0x2af2b458f9e0) in any stack traces other than those of:
    Thread #74 (waiting on SR_lock)
    Thread #87 (waiting on SR_lock)
    Thread #102 (blocked on SR_lock)
09-05-2020

Thanks, David! I'll check it.
08-05-2020

One question to answer is whether SR_lock()->wait() blocks in SafepointSynchronize::block() before or after it releases the SR_lock. I'm assuming it is before, which is what can lead to the deadlock. If we check for a safepoint only after releasing the lock, when doing the actual low-level wait(), then a deadlock should not be possible.
08-05-2020
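
The ordering question above can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not HotSpot code: the Step enum, the World struct and both scripts are hypothetical, and the waiter is assumed to be scheduled first, which is the interleaving that produces the hang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Step { AcquireLock, ReleaseLock, WaitForSafepointEnd, EndSafepoint };

struct World {
    int lock_owner = -1;           // -1 means the lock is free
    bool safepoint_active = true;  // the VM thread has already begun a safepoint
};

// Try to execute one step for actor `id`; return false if the actor must block.
bool try_step(World& w, int id, Step s) {
    switch (s) {
    case Step::AcquireLock:
        if (w.lock_owner != -1) return false;
        w.lock_owner = id;
        return true;
    case Step::ReleaseLock:
        assert(w.lock_owner == id);
        w.lock_owner = -1;
        return true;
    case Step::WaitForSafepointEnd:
        return !w.safepoint_active;
    case Step::EndSafepoint:
        w.safepoint_active = false;
        return true;
    }
    return false;
}

// Round-robin both scripts (waiter first). Returns true if both finish,
// false if a full round passes with no progress, i.e. a deadlock.
bool runs_to_completion(const std::vector<Step>& waiter,
                        const std::vector<Step>& vm_thread) {
    World w;
    const std::vector<Step>* scripts[2] = {&waiter, &vm_thread};
    std::size_t pc[2] = {0, 0};
    for (;;) {
        bool progressed = false;
        for (int id = 0; id < 2; ++id) {
            if (pc[id] < scripts[id]->size() &&
                try_step(w, id, (*scripts[id])[pc[id]])) {
                ++pc[id];
                progressed = true;
            }
        }
        if (pc[0] == waiter.size() && pc[1] == vm_thread.size()) return true;
        if (!progressed) return false;
    }
}

// The VM operation needs the lock, and the safepoint ends only after it is done.
const std::vector<Step> kVmThreadScript = {
    Step::AcquireLock, Step::ReleaseLock, Step::EndSafepoint};

// Waiter takes the lock, then blocks for the safepoint while still holding it.
bool block_while_holding_lock() {
    return runs_to_completion(
        {Step::AcquireLock, Step::WaitForSafepointEnd, Step::ReleaseLock},
        kVmThreadScript);
}

// Waiter blocks for the safepoint first and only then takes the lock.
bool block_before_taking_lock() {
    return runs_to_completion(
        {Step::WaitForSafepointEnd, Step::AcquireLock, Step::ReleaseLock},
        kVmThreadScript);
}
```

In this model block_while_holding_lock() ends in a deadlock, while block_before_taking_lock() lets both parties finish, matching the reasoning that a safepoint check made without the lock held cannot deadlock against the VM thread.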

Correction to previous comment. Sneaky locking removal was pushed here: URL: http://hg.openjdk.java.net/jdk/jdk/rev/043ae846819f User: coleenp Date: 2019-02-05 20:17:15 +0000 so that's 2019.02.05 and not 2018.09.17.
24-01-2020

Sneaky locking removal was done by JDK-8210832 on 2018-09-17.
17-12-2019

I think Erik O recently pointed out a potential problem with different code taking the SR_lock with and without safepoint checks. I can't quite piece together the deadlock paths completely here, but it seems like the forthcoming "sneaking locking removal" changes will fix this.
25-01-2019

It looks like we wait on SR_lock with safepoint-checks enabled. When the thread wakes up and tries to take a safepoint, we try to lock Threads_lock. However, other code like JavaThread::java_resume() locks Threads_lock first and then SR_lock, so this looks like a deadlock and might even cause an assert in a debug build.
25-01-2019
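
The lock-order inversion described above is the kind of problem a simple order checker can flag. The following is a minimal sketch and not the HotSpot debug-build check: LockOrderChecker and both helper functions are hypothetical, and the two acquisition sequences are taken from the comment (java_resume() takes Threads_lock then SR_lock; the waking waiter holds SR_lock and then needs Threads_lock).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

class LockOrderChecker {
public:
    // Record that `name` is acquired while `held` locks are already owned.
    // Returns false if the opposite order was recorded earlier (an inversion).
    bool acquire(std::vector<std::string>& held, const std::string& name) {
        bool ok = true;
        for (const std::string& h : held) {
            // `name` before `h` was seen earlier, now `h` comes before `name`.
            if (order_.count(std::make_pair(name, h)) != 0) ok = false;
            order_.emplace(h, name);
        }
        held.push_back(name);
        return ok;
    }

private:
    // Set of (first, second) pairs: `second` was taken while holding `first`.
    std::set<std::pair<std::string, std::string>> order_;
};

// Returns true if the two code paths from the comment acquire locks in
// conflicting orders.
bool paths_conflict() {
    LockOrderChecker checker;

    std::vector<std::string> resume_path;  // models JavaThread::java_resume()
    checker.acquire(resume_path, "Threads_lock");
    checker.acquire(resume_path, "SR_lock");

    std::vector<std::string> wait_path;    // models waking up holding SR_lock
    checker.acquire(wait_path, "SR_lock");
    return !checker.acquire(wait_path, "Threads_lock");
}

// Control case: both paths use the same order, so no inversion is reported.
bool consistent_paths_conflict() {
    LockOrderChecker checker;
    std::vector<std::string> first, second;
    checker.acquire(first, "Threads_lock");
    checker.acquire(first, "SR_lock");
    checker.acquire(second, "Threads_lock");
    return !checker.acquire(second, "SR_lock");
}
```

paths_conflict() reports the inversion for the two sequences named in the comment, and consistent_paths_conflict() shows that the same locks taken in one agreed order pass the check.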

Targeted this to 13 for now.
24-01-2019

Also, it might be a deadlock with the JNI thread:
Thread 87 (Thread 0x2af30a92c700 (LWP 22494)):
#0  0x00002af2ae69a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002af2afe89a85 in os::PlatformEvent::park (this=0x2af2b458ff00) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/os/posix/os_posix.cpp:1950
#2  0x00002af2afe079c7 in ParkCommon (ev=0x2af2b458ff00, timo=0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:399
#3  0x00002af2afe08756 in Monitor::IWait (this=0x2af2b458f9e0, Self=0x2af2b458e800, timo=0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:768
#4  0x00002af2afe09bc2 in Monitor::wait (this=0x2af2b458f9e0, no_safepoint_check=true, timeout=0, as_suspend_equivalent=false) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:1091
#5  0x00002af2b00780b9 in JavaThread::java_suspend_self (this=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:2453
#6  0x00002af2b007798c in JavaThread::handle_special_runtime_exit_condition (this=0x2af2b458e800, check_asyncs=false) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:2300
#7  0x00002af2aff6606a in SafepointSynchronize::block (thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepoint.cpp:921
#8  0x00002af2af418928 in SafepointMechanism::block_if_requested_local_poll (thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepointMechanism.inline.hpp:63
#9  0x00002af2af418969 in SafepointMechanism::block_if_requested (thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/safepointMechanism.inline.hpp:73
#10 0x00002af2b007831c in JavaThread::check_safepoint_and_suspend_for_native_trans (thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/thread.cpp:2516
#11 0x00002af2af4a1cc4 in ThreadStateTransition::transition_from_native (thread=0x2af2b458e800, to=_thread_in_vm) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:154
#12 0x00002af2af4a1cfc in ThreadStateTransition::trans_from_native (this=0x2af30a92b500, to=_thread_in_vm) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:165
#13 0x00002af2af4a1d42 in ThreadInVMfromNative::ThreadInVMfromNative (this=0x2af30a92b500, thread=0x2af2b458e800) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/interfaceSupport.inline.hpp:247
#14 0x00002af2afabf8d8 in jni_GetStaticFieldID (env=0x2af2b458eb90, clazz=0x2af2b436fd70, name=0x2af30b63c5db "staticField", sig=0x2af30b63c5cb "I") at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/prims/jni.cpp:2245
#15 0x00002af30b63bb93 in Java_applications_kitchensink_process_stress_modules_JniStressModule_getStaticFieldID (env=0x2af2b458eb90, this=0x2af30a92b5d0) at /scratch/lmesnik/ws/ks-jvmti/closed/test/hotspot/jtreg/applications/kitchensink/process/stress/modules/libJniStressModule.c:209
#16 0x00002af2c7fb6920 in ?? ()
#17 0x000000063d263510 in ?? ()
#18 0x000000063d263510 in ?? ()
#19 0x00002af2b0dd9760 in vtable for ThreadInVMfromJava () from /scratch/lmesnik/ws/ks-jvmti/build/linux-x64/images/jdk/lib/server/libjvm.so
#20 0x00002af2b458e800 in ?? ()
#21 0x00000007f7d8e188 in ?? ()
#22 0x000000063d2af180 in ?? ()
#23 0x0000000000000000 in ?? ()
24-01-2019

The complete native stack trace is attached. Another thread of possible interest is:
Thread 75 (Thread 0x2af30b538700 (LWP 22506)):
#0  0x00002af2ae69a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002af2afe89a85 in os::PlatformEvent::park (this=0x2af2b45aed00) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/os/posix/os_posix.cpp:1950
#2  0x00002af2afe079c7 in ParkCommon (ev=0x2af2b45aed00, timo=0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:399
#3  0x00002af2afe08756 in Monitor::IWait (this=0x2af2b403c290, Self=0x2af2b45ad000, timo=0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:768
#4  0x00002af2afe09ca2 in Monitor::wait (this=0x2af2b403c290, no_safepoint_check=false, timeout=0, as_suspend_equivalent=false) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/mutex.cpp:1106
#5  0x00002af2b01404ca in VMThread::execute (op=0x2af30b5371e0) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/runtime/vmThread.cpp:709
#6  0x00002af2afd7367a in do_thread_dump (dump_result=0x2af30b537370, ids_ah=..., num_threads=86, max_depth=-1, with_locked_monitors=false, with_locked_synchronizers=false, __the_thread__=0x2af2b45ad000) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/services/management.cpp:1039
#7  0x00002af2afd73b93 in jmm_GetThreadInfo (env=0x2af2b45ad390, ids=0x2af30b537430, maxDepth=-1, infoArray=0x2af30b537440) at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/share/services/management.cpp:1112
#8  0x00002af2e3a97b23 in Java_sun_management_ThreadImpl_getThreadInfo1 (env=0x2af2b45ad390, cls=0x2af30b537460, ids=0x2af30b537430, maxDepth=-1, infoArray=0x2af30b537440) at /scratch/lmesnik/ws/ks-jvmti/open/src/java.management/share/native/libmanagement/ThreadImpl.c:57
#9  0x00002af2c85d5ad6 in ?? ()
#10 0x00000007f77f3bd0 in ?? ()
#11 0x000000070000004a in ?? ()
#12 0x00000007f77f3e90 in ?? ()
#13 0x00002af2afe7354d in os::javaTimeMillis () at /scratch/lmesnik/ws/ks-jvmti/open/src/hotspot/os/linux/os_linux.cpp:1219
#14 0x00002af2c8ef01c4 in ?? ()
#15 0x0000000600000056 in ?? ()
#16 0x000000063d264270 in ?? ()
#17 0x000000063d30bd90 in ?? ()
#18 0x000000064ea1d9a8 in ?? ()
#19 0x0000000100000000 in ?? ()
#20 0x00000007f77f3e90 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000000000000031 in ?? ()
#23 0x000000000000004c in ?? ()
#24 0xfeefe6cac7a4c895 in ?? ()
#25 0x000000063d30bd90 in ?? ()
#26 0x00000007f77f3bd0 in ?? ()
#27 0x000000063d264270 in ?? ()
#28 0x00000007f77f3560 in ?? ()
#29 0x0000000800392d88 in ?? ()
#30 0x00000007c7a4c895 in ?? ()
#31 0x000000000000003c in ?? ()
#32 0x000000063d2ce5c0 in ?? ()
#33 0x00000168788eb431 in ?? ()
#34 0x0000000000000000 in ?? ()
24-01-2019