JDK-8156500 : Move Reference pending list into VM to prevent deadlocks
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2016-05-08
  • Updated: 2021-01-10
  • Resolved: 2016-08-31
JDK 9
9 b137 Fixed
Sub Tasks
JDK-8156550 :  
JDK-8156551 :  
Description
There seems to be a GC issue that is provoked by the stress test
com/sun/jdi/OomDebugTest.java, added as a new unit test with the fix for:
   https://bugs.openjdk.java.net/browse/JDK-8153711

Below is a copy of the comment from the JDK-8153711 bug report:

Note that I'm still seeing fairly rare cases (1 out of 1000) where OomDebugTest would time out. The debugger waits for the reply from the debuggee, which is in the process of fulfilling a newInstance() request for a primitive array. However, that reply never seems to come, because for some reason the call to jni_NewByteArray never completes. This looks like an issue in GC to me and not related to this bug. Here is the jstack snippet of the hung OomDebugTestTarg JVM process:

[...]
----------------- 28829 -----------------
0x00007efeb8e9cb10 __pthread_cond_wait + 0xc0
0x00007efeb7f3419f Monitor::IWait(Thread*, long) + 0xef
0x00007efeb7f3541e Monitor::wait(bool, long, bool) + 0x22e
0x00007efeb7fff7aa ReferencePendingListLockerThread::receive_and_handle_messages() + 0x4a
0x00007efeb7fff879 ????????
0x00007efeb80dd227 JavaThread::thread_main_inner() + 0x1e7
0x00007efeb7f6f430 java_start(Thread*) + 0xf0
Locked ownable synchronizers:
    - None
[...]
----------------- 28837 -----------------
0x00007efeb8e9ceb9 __pthread_cond_timedwait + 0x129
0x00007efeb7f56bf3 ObjectMonitor::EnterI(Thread*) + 0x3c3
0x00007efeb7f572c0 ObjectMonitor::enter(Thread*) + 0x320
0x00007efeb7fff3ea ReferencePendingListLocker::lock() + 0xca
0x00007efeb813af3f VM_GC_Operation::doit_prologue() + 0x6f
0x00007efeb8149be1 VM_G1IncCollectionPause::doit_prologue() + 0x11
0x00007efeb814818e VMThread::execute(VM_Operation*) + 0x22e
0x00007efeb7bd4fb5 G1CollectedHeap::collect(GCCause::Cause) + 0x95
0x00007efeb7bdb28a G1CollectedHeap::attempt_allocation_humongous(unsigned long, unsigned int*, unsigned int*) + 0x2ea
0x00007efeb7bdb519 G1CollectedHeap::mem_allocate(unsigned long, bool*) + 0x219
0x00007efeb80f6501 TypeArrayKlass::allocate_common(int, bool, Thread*) + 0xa1
0x00007efeb7ce5713 jni_NewByteArray + 0xa3
0x00007efeb641ab56 newInstance + 0x496
0x00007efeb6428fd8 debugLoop_run + 0x268
0x00007efeb643b075 attachThread + 0x25
0x00007efeb7dbaa00 JvmtiAgentThread::start_function_wrapper(JavaThread*, Thread*) + 0xb0
0x00007efeb80dd227 JavaThread::thread_main_inner() + 0x1e7
0x00007efeb7f6f430 java_start(Thread*) + 0xf0
Locked ownable synchronizers:
    - None

Interestingly enough, it always seems to get stuck in test2(), which instantiates largish primitive byte arrays in the debuggee.


I also observed a similar deadlock with the OomDebugTest test:

Below is a jstack output fragment:

----------------- 9131 -----------------
0x00007f82c9063d84 __pthread_cond_wait + 0xc4
0x00007f82c84e83c7 Monitor::IWait(Thread*, long) + 0x127
0x00007f82c84ea3ed Monitor::wait(bool, long, bool) + 0x1fd
0x00007f82c877b114 _ZN10JavaThread17java_suspend_selfEv.part.195 + 0xb4
0x00007f82c82cd919 JvmtiRawMonitor::raw_wait(long, bool, Thread*) + 0x229
0x00007f82c82a2a82 JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*, long) + 0xa2
0x00007f82c69d9b49 debugMonitorWait + 0x29
0x00007f82c69cad6e enqueueCommand + 0x11e
0x00007f82c69c6a22 reportEvents.part.3 + 0xa2
0x00007f82c69c7046 event_callback + 0x446
0x00007f82c69c9d55 cbBreakpoint + 0xc5
0x00007f82c82bc78f JvmtiExport::post_raw_breakpoint(JavaThread*, Method*, unsigned char*) + 0x1ef
0x00007f82c80bf662 InterpreterRuntime::_breakpoint(JavaThread*, Method*, unsigned char*) + 0xa2
0x00007f82a89af712
Locked ownable synchronizers:
    - None
. . . . . . . . . . . . . . . . . .

----------------- 9169 -----------------
0x00007f82c9063d84 __pthread_cond_wait + 0xc4
0x00007f82c84e83c7 Monitor::IWait(Thread*, long) + 0x127
0x00007f82c84ea98f Monitor::wait(bool, long, bool) + 0x79f
0x00007f82c861d380 ReferencePendingListLockerThread::receive_and_handle_messages() + 0xb0
0x00007f82c861d8a9 ????????
0x00007f82c878c991 JavaThread::thread_main_inner() + 0x1d1
0x00007f82c878cb3a JavaThread::run() + 0x15a
0x00007f82c854a3b2 java_start(Thread*) + 0x142
Locked ownable synchronizers:
    - None
. . . . . . . . . . . . . . . . . .

----------------- 9202 -----------------
0x00007f82c90640fe __pthread_cond_timedwait + 0x13e
0x00007f82c85228f1 ObjectMonitor::EnterI(Thread*) + 0x581
0x00007f82c8523147 ObjectMonitor::enter(Thread*) + 0x307
0x00007f82c872fb11 ObjectSynchronizer::slow_enter(Handle, BasicLock*, Thread*) + 0xa1
0x00007f82c872fdcb ObjectSynchronizer::fast_enter(Handle, BasicLock*, bool, Thread*) + 0x8b
0x00007f82c861d9ef ReferencePendingListLocker::lock() + 0x13f
0x00007f82c87ffdef VM_GC_Operation::doit_prologue() + 0xbf
0x00007f82c8832221 VM_G1IncCollectionPause::doit_prologue() + 0x11
0x00007f82c882d49e VMThread::execute(VM_Operation*) + 0x2ee
0x00007f82c7f4a5ee G1CollectedHeap::collect(GCCause::Cause) + 0x1ce
0x00007f82c7f56c41 G1CollectedHeap::attempt_allocation_humongous(unsigned long, unsigned int*, unsigned int*) + 0x3e1
0x00007f82c7f56f1d G1CollectedHeap::mem_allocate(unsigned long, bool*) + 0x26d
0x00007f82c7abe0d7 CollectedHeap::array_allocate(KlassHandle, int, int, Thread*) + 0x2d7
0x00007f82c87b0864 TypeArrayKlass::allocate_common(int, bool, Thread*) + 0x1d4
0x00007f82c8179c3d jni_NewByteArray + 0x16d
0x00007f82c69b5d20 newInstance + 0x430
0x00007f82c69c3fd8 debugLoop_run + 0x298
0x00007f82c69d661e attachThread + 0x2e
0x00007f82c82c6093 JvmtiAgentThread::call_start_function() + 0x153
0x00007f82c878c991 JavaThread::thread_main_inner() + 0x1d1
0x00007f82c878cb3a JavaThread::run() + 0x15a
0x00007f82c854a3b2 java_start(Thread*) + 0x142
Locked ownable synchronizers:
    - None 


Comments
I noticed a small usability regression apparently introduced by this change. When the ReferenceHandler thread is idle (i.e. almost always) its stack trace misleadingly suggests it is RUNNABLE:

"Reference Handler" daemon prio=10 Id=2 RUNNABLE
    at java.base@14-ea/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
    at java.base@14-ea/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
    at java.base@14-ea/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)

It would be nice if the user-visible java thread state and lock info was the same as if it was blocked inside Object.wait, giving stack traces something like:

"Reference Handler" daemon prio=10 Id=2 WAITING on VM heap lock

which would make "dump all threads" output less confusing.
07-09-2019
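
One way to check the state this comment describes is to read the thread state from inside the JVM. The following is a minimal sketch, not part of the original report: the class name ThreadStateProbe and the method stateOf are hypothetical, and only the thread name "Reference Handler" comes from the stack trace above. The state reported will depend on the JDK version in use.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the JDK or of this bug's fix.
class ThreadStateProbe {

    // Looks up a live thread by name and returns its user-visible state, if found.
    static Optional<Thread.State> stateOf(String threadName) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(threadName)) {
                return Optional.of(t.getState());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "Reference Handler" is the thread name shown in the stack trace above.
        String state = stateOf("Reference Handler")
                .map(Thread.State::name)
                .orElse("not found");
        System.out.println("Reference Handler state: " + state);
    }
}
```

On a JDK with the behavior described in the comment, this would be expected to print RUNNABLE for an idle Reference Handler, although that is an inference from the comment rather than something verified here.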

I was able to reproduce this a couple of times with ~100 iterations. I'm using the webrev.02 patch that was posted for review: http://mail.openjdk.java.net/pipermail/serviceability-dev/2016-April/019410.html (specifically, http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/). That looks like the latest version. Patch installed on a clone from today's (6/1/2016) jdk9/hs forest. I think the priority ought to remain at P2; I don't think the patch is the cause of the problem, merely a known means for exhibiting it.
08-06-2016

The problem arises because we have one process manipulating another via JVMTI. A suspend request is sent to the controlled process, and it just happens to be handled while the Reference Handler thread is holding the pending list lock. An execution request is then sent to the controlled process, which performs an allocation that triggers a GC. The GC attempts to lock the pending list lock and blocks, because the (suspended) Reference Handler thread is already holding that lock. The Reference Handler thread won't be surrendering the lock until the process is resumed. The process won't be resumed until the execution request completes. And the execution request won't complete until the associated GC can obtain the lock. So we've got a deadlock across the two processes.
07-06-2016
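
The lock dependency described in this comment can be modeled in plain Java. This is a rough sketch under stated assumptions, not HotSpot code: a ReentrantLock stands in for the pending list lock, a thread parked on a latch stands in for the suspended Reference Handler, and a timed tryLock stands in for the GC prologue so that the example terminates instead of hanging. All class, method, and variable names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model for illustration only; names do not correspond to HotSpot internals.
class PendingListDeadlockSketch {

    // Returns true if the simulated GC prologue obtained the lock within the timeout
    // while the simulated Reference Handler was "suspended" holding it.
    static boolean gcCanProceedWhileHolderSuspended(long timeoutMillis) {
        ReentrantLock pendingListLock = new ReentrantLock(); // stand-in for the pending list lock
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);       // stand-in for the debugger's resume

        Thread holder = new Thread(() -> {
            pendingListLock.lock();
            try {
                lockHeld.countDown();
                resume.await(); // "suspended" while still holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                pendingListLock.unlock();
            }
        }, "model-reference-handler");
        holder.setDaemon(true);
        holder.start();

        try {
            lockHeld.await();
            // The allocation-triggered GC needs the lock. In the real bug it blocks forever,
            // because the resume only happens after the execution request completes.
            boolean acquired = pendingListLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                pendingListLock.unlock();
            }
            resume.countDown(); // the model resumes only after the GC attempt has given up
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("GC acquired lock while holder suspended: "
                + gcCanProceedWhileHolderSuspended(200));
    }
}
```

In the model, the timed attempt fails because the resume is ordered after the GC attempt, mirroring the cycle in the comment: the resume waits on the execution request, the request waits on the GC, and the GC waits on the lock.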

I guess the priority of this bug can be lowered to P3, because the test com/sun/jdi/OomDebugTest.java and the fix for 8153711, which together make it possible to reproduce this issue, have not been pushed yet. They have not been pushed because a different issue was discovered.
11-05-2016

Please find the debugger update and the new test com/sun/jdi/OomDebugTest.java in the attachment 8153711.patch.
08-05-2016

I've assigned this issue to the hotspot/gc sub-category for initial evaluation. Feel free to move it to core-svc/debugger if the issue is on the debugger side.
08-05-2016