JDK-8013129 : Possible deadlock with Metaspace locks due to mixed usage of safepoint aware and non-safepoint aware locking
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: hs25
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2013-04-24
  • Updated: 2014-05-14
  • Resolved: 2013-04-29
  • Fix Version/s: 8 (Fixed), hs25 (Fixed)
Description
Allocation of metadata is protected by the per-cld Metaspace lock.
The allocation paths always take the Metaspace locks with _no_safepoint_check_flag.

De-allocation of metadata is similarly protected by the per-cld Metaspace lock.
However, de-allocations use a normal MutexLocker and therefore take a different code path in Mutex.

I believe this was safe until recently because we only deallocated metadata from the VM thread while safepointed, but with the MethodCounters change we can now end up deallocating metadata while the VM is running.
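
For reference, the two locking idioms involved look roughly like this (a sketch based on the frames in the stack trace in the comments below; the receiver expression lock() and the surrounding context are illustrative, not an exact copy of metaspace.cpp):

    // Allocation side (SpaceManager::allocate): acquire the per-CLD lock
    // without a safepoint check. A thread parked here stays _thread_in_vm
    // and is never counted as stopped by a pending safepoint.
    MutexLockerEx ml(lock(), Mutex::_no_safepoint_check_flag);

    // De-allocation side (before this fix): a plain MutexLocker, i.e. the
    // safepoint-aware acquire. It returns from ILock already owning the lock
    // and can then block for a pending safepoint in ~ThreadBlockInVM.
    MutexLocker ml(lock());

Presumably the resolution is to make the de-allocation path take the lock the same way the allocation path does (MutexLockerEx with _no_safepoint_check_flag), so that no thread can block for a safepoint while holding a Metaspace lock that another thread is queued on without a safepoint check.
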
Comments
In the attached stack trace, Thread 49 is attempting to acquire a lock with _no_safepoint_check_flag, Thread 47 has acquired the lock and is blocking in ~ThreadBlockInVM, and Thread 27 is initiating a safepoint.
14-05-2014

Since this issue came up again I had to dig up the cause for the deadlock again. Here's the situation I'm seeing:
T1 is initiating a safepoint (for whatever reason).
T2 is attempting to take the lock with _no_safepoint_check_flag and has been parked in Monitor::ILock (but hasn't transitioned from _thread_in_vm).
T3 has returned from ILock in Monitor::lock and acknowledges the safepoint request in ~ThreadBlockInVM.
T2 will never reach the safepoint since it's blocking while _vm_running, and T3 has blocked for the safepoint.
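
To make that interleaving concrete, here is a small self-contained C++ model (toy std::mutex/std::condition_variable code, not HotSpot internals; the thread names, sleep durations and the 2-second timeout are only there to force and then report the interleaving described above). T3 takes the lock and then blocks for the "safepoint" while still holding it, T2 parks on the lock without ever reporting itself stopped, so T1 can never see both mutators stopped:

    // Toy model of the T1/T2/T3 deadlock; compile with: g++ -std=c++11 -pthread model.cpp
    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    std::mutex              metaspace_lock;        // stands in for the per-CLD Metaspace lock
    std::mutex              sp_mutex;              // protects the toy safepoint state
    std::condition_variable sp_cv;
    bool safepoint_requested = false;
    int  threads_stopped     = 0;                  // mutators currently blocked for the safepoint

    // Rough equivalent of ~ThreadBlockInVM: report ourselves stopped and
    // wait until the safepoint is over.
    void safepoint_check() {
      std::unique_lock<std::mutex> lk(sp_mutex);
      if (!safepoint_requested) return;
      ++threads_stopped;
      sp_cv.notify_all();
      sp_cv.wait(lk, [] { return !safepoint_requested; });
      --threads_stopped;
    }

    int main() {
      using namespace std::chrono;
      // T3: the safepoint-aware path. Grab the lock, then block for the
      // safepoint in the ~ThreadBlockInVM equivalent while still holding it.
      std::thread t3([] {
        std::lock_guard<std::mutex> ml(metaspace_lock);
        std::this_thread::sleep_for(milliseconds(100));   // let T1 request the safepoint first
        safepoint_check();                                 // blocks here, lock still held
      });
      // T2: the _no_safepoint_check_flag path. Simply park on the lock and
      // never report to the safepoint protocol.
      std::thread t2([] {
        std::this_thread::sleep_for(milliseconds(200));
        std::lock_guard<std::mutex> ml(metaspace_lock);    // parks behind T3 indefinitely
      });
      // T1 (this thread): request the safepoint and wait for both mutators.
      std::this_thread::sleep_for(milliseconds(50));
      std::unique_lock<std::mutex> lk(sp_mutex);
      safepoint_requested = true;
      bool reached = sp_cv.wait_for(lk, seconds(2), [] { return threads_stopped == 2; });
      std::puts(reached ? "safepoint reached"
                        : "deadlock: T2 parked without a safepoint check, "
                          "T3 holds the lock but is blocked for the safepoint");
      safepoint_requested = false;                         // undo the request so the toy can unwind
      sp_cv.notify_all();
      lk.unlock();
      t3.join();
      t2.join();
      return 0;
    }
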
14-05-2014

It's hard to write a reliable regression test for this fix. I've attempted to do so but was unsuccessful. The failure was seen as a "Timeout" in some of the tests in vm.gc.testlist, so noreg-sqe may be appropriate, but I'm not sure.
21-05-2013

I've seen this as timeouts in some threaded GC tests, mainly on Windows 64-bit. I've reproduced it on Linux 64-bit too, with +SafepointTimeout and some code to provoke more allocation/deallocation of MethodCounters, but it took around an hour to get it to reproduce. The hanging thread has the following stack trace:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ff486ac9d64 in os::PlatformEvent::park (this=this@entry=0x7ff4801b5a00) at /localhome/hg/hsx-gc/src/os/linux/vm/os_linux.cpp:5076
#2  0x00007ff486a693e1 in ParkCommon (ev=<optimized out>, timo=<optimized out>) at /localhome/hg/hsx-gc/src/share/vm/runtime/mutex.cpp:421
#3  Monitor::ILock (this=0x7ff48003e508, Self=0x7ff4801b3800) at /localhome/hg/hsx-gc/src/share/vm/runtime/mutex.cpp:488
#4  0x00007ff486a6bf04 in lock_without_safepoint_check (Self=0x7ff4801b3800, this=0x7ff48003e508) at /localhome/hg/hsx-gc/src/share/vm/runtime/mutex.cpp:956
#5  Monitor::lock_without_safepoint_check (this=0x7ff48003e508) at /localhome/hg/hsx-gc/src/share/vm/runtime/mutex.cpp:962
#6  0x00007ff486a2a8cf in MutexLockerEx (mutex=0x7ff48003e508, this=<synthetic pointer>, no_safepoint_check=<optimized out>) at /localhome/hg/hsx-gc/src/share/vm/runtime/mutexLocker.hpp:218
#7  SpaceManager::allocate (this=0x7ff4800618f8, word_size=word_size@entry=4) at /localhome/hg/hsx-gc/src/share/vm/memory/metaspace.cpp:2314
#8  0x00007ff486a2ab9e in allocate (mdtype=Metaspace::NonClassType, word_size=4, this=<optimized out>) at /localhome/hg/hsx-gc/src/share/vm/memory/metaspace.cpp:2888
#9  Metaspace::allocate (loader_data=loader_data@entry=0x7ff48003e478, word_size=word_size@entry=4, read_only=read_only@entry=false, mdtype=mdtype@entry=Metaspace::NonClassType, __the_thread__=__the_thread__@entry=0x7ff4801b3800) at /localhome/hg/hsx-gc/src/share/vm/memory/metaspace.cpp:3006
#10 0x00007ff486280738 in MetaspaceObj::operator new (size=size@entry=32, loader_data=loader_data@entry=0x7ff48003e478, word_size=word_size@entry=4, read_only=read_only@entry=false, __the_thread__=__the_thread__@entry=0x7ff4801b3800) at /localhome/hg/hsx-gc/src/share/vm/memory/allocation.cpp:61
#11 0x00007ff486a40ed0 in MethodCounters::allocate (loader_data=loader_data@entry=0x7ff48003e478, __the_thread__=__the_thread__@entry=0x7ff4801b3800) at /localhome/hg/hsx-gc/src/share/vm/oops/methodCounters.cpp:29
#12 0x00007ff486a35d1b in Method::build_method_counters (m=m@entry=0x7ff4843717f0, __the_thread__=__the_thread__@entry=0x7ff4801b3800) at /localhome/hg/hsx-gc/src/share/vm/oops/method.cpp:388
#13 0x00007ff486797aa2 in InterpreterRuntime::build_method_counters (thread=0x7ff4801b3800, m=0x7ff4843717f0) at /localhome/hg/hsx-gc/src/share/vm/interpreter/interpreterRuntime.cpp:907

I believe that the problem occurs when a thread is releasing the lock through the normal path and another thread is waiting on the lock with _no_safepoint_check_flag, but it's hard to follow the Mutex code.
24-04-2013