Bug ID: JDK-8210832 Remove sneaky locking in class Monitor

JDK 13
13 b07Fixed

A backport is likely to be rejected by the maintainers because 11u is in maintenance mode and this issue is an enhancement rather than a bugfix.
29-11-2022
[~rsunderbabu] since JDK-8235826 is fixed, would you still like this to be backported?
29-11-2022
URL: http://hg.openjdk.java.net/jdk/jdk/rev/043ae846819f User: coleenp Date: 2019-02-05 20:17:15 +0000
05-02-2019
Old teamwork comment in safepoint.cpp: // XXX NOTE: Because the CMS thread does not resort to sneaky locking // and because (we believe) the following code sometimes needs to // obtain certain global locks needed for deopt that are held by // Java threads that have blocked at safepoints, there is a (very) small // probability that the CMS thread might deadlock in this loop. // We will revisit this code in such an event. (The nice thing about // debugging such an event is that we have a stable deadlock to look // at, not transient errors that happen long after their cause.) // In that case, one fix would be to let the CMS thread also resort // to sneaky locking (which we could do by maintaining not a boolean // attribute for _safepointing_thread above, but a thread identity // attribute, used in an analogous fashion). We'll revisit this // issue post-Tiger by evaluating the alternative solutions to // the problem. comment added with this change: Ad D 1.269.1.3 04/04/20 10:26:53 ysr 641 640 ^Ac (5030646) allow the cms thread to also do deopt's. See documentation ^Ac note at lines 255-268 on the potential problems there and how they might be ^Ac fixed if they do turn out to be real problems. The current choice was based ^Ac on minimizing the measure of a failure event (the choice being between having ^Ac the cm thread never do deopt or always do deopt and sometimes deadlock). Comment deleted with this change: ^Ad D 1.273 04/06/15 12:59:28 coleenp 644 643 ^Ac 4974572 Remove pre-polling compiled code safepoint patching. Old teamwork comments in mutex.cpp which limited sneaky locking to VMThread (not also CMS thread) ^Ad D 1.49 04/04/16 10:56:45 ysr 78 77 ^Ac (5030646) the vm thread (and only the vm thread) can sneak a lock ^Ac from under a java thread only at a safepoint that was initiated by it; ^Ac see code related to _suppress_signal. https://bugs.openjdk.java.net/browse/JDK-5030646 - this was the fix that limited this to the VMThread note that at the time, both theVMThread and the CMS thread could be the _safepointing_thread.
10-01-2019
Further instrumentation shows that only the following locks are regularly acquired by the VMThread at a safepoint: - VtableStubs_lock - InlineCacheBuffer_lock both of these are known to be sneak-proof by virtue of the CompiledIC_lock being acquired first by JavaThreads. - ExpandHeap_lock Somehow this escaped my other instrumentation so I can't tell what threads acquired it - but presumably only GC and VMThread, so not an issue for sneaking - OopMapCacheAlloc_lock Interestingly only locked by the VMThread. This is a use-once lock that guards initialization of the oopmap_cache - so in principle it might be acquired by any thread but in practice it may be more constrained. I only observed its use via GenCollectForAllocation.
19-10-2018
So to recap the necessary conditions for lock sneaking to actually occur: 1. A lock has to be contended by JavaThreads so that at least one JT takes the slow path. 2. The lock has to then be released so that the slow path thread can acquire it and start to proceed through the TBIVM destructor 3. At the same time a safepoint has to be initiated that causes the slow path thread to hit the safepoint check and block 4. The VM operation for the current safepoint has to need the lock that the slow path thread is acquiring. This shows how rare it would be for sneaking to occur, and very difficult to force. And as [~pchilanomate] has observed and reasoned, you can't get contention on a lock that is only acquired in a nested locking context - as that would require more than one thread to hold the outer lock. So this may well reduce the theoretically sneakable lock set further.
18-10-2018
My dynamic instrumentation did not pick up any VMThread uses of the following locks that you flagged as sneaking potentials: ClassLoaderDataGraph_lock Compile_lock InlineCacheBuffer_lock MonitorSupply mutex OffsetTableContigSpace par alloc lock VtableStubs_lock Have you recorded the VM_operations that lead to the use of each of these? My instrumentation didn't pick up the MonitorSupply mutex at all. It relies on all mutex/monitor being declared in mutexLocker.hpp in the _mutex_array and the MonitorSupply mutex does not do that.
17-10-2018
Doh! Of course - we have to be on the slow path to start with. Sorry.
17-10-2018
Because even if you call lock() with a JavaThread, the call will only check for safepoint if TryFast() fails(and part of TrySpin) and the thread ends up in the tbivm jacket wrapping ILock(Self). For that to fail somebody has to have that lock when that JavaThread tries to get it. If all the JavaThreads need to grab first CompiledIC_lock, then TryFast() for InlineCacheBuffer_lock or VtableStubs_lock cannot fail because of other JavaThread, because that would mean two JavaThreads have the CompiledIC_lock. It could fail if the VMThread or some NonJavaThread has acquired InlineCacheBuffer_lock or VtableStubs_lock, but I didn't see any instances of that outside a safepoint.
17-10-2018
I'm not sure why you say you need multiple contending JavaThreads for sneaking to occur? It only requires one JavaThread to be trying to acquire a lock at the same time that a safepoint is taken and the VM_operation needs the same lock.
17-10-2018
For the case of InlineCacheBuffer_lock and VtableStubs_lock, a JavaThread seems to always grab CompiledIC_lock first before trying to grab any of those locks (This might explain the comment in CompiledIC_lock definition in MutexLocker.cpp ?). This prevents multiple JavaThreads from contending for those locks at the same time and thus prevents sneaking, since the slower path that includes the tbivm jacket will not be taken. The CompiledIC_lock doesn't appear to be taken by the VMThread, instead the VMThread directly acquires InlineCacheBuffer_lock and VtableStubs_lock. In other words, although at first glance InlineCacheBuffer_lock and VtableStubs_lock seem to require sneaking by the VMThread, because they are preceded by an acquisition of the CompiledIC_lock, sneaking doesn't seem possible.
16-10-2018
I did the same experiment with stack traces for each, with similar results. I also added whether the lock was taking in a safepoint, eg: VtableStubs_lock_JavaThread VtableStubs_lock_NonJavaThread_S VtableStubs_lock_VMThread_S I think the need for sneaking is now observed but also the problem with other threads incorrectly getting snuck possibly causing races. With the GC threads and concurrent class unloading, I think we want a general mechanism where JavaThread, NonJavaThread and VMThread participate via a lock. Where the JavaThread additionally checks for safepoints while it is blocked, but not after acquiring the outer lock (ILock) as we have today. So I think Patricio's approach is very promising.
15-10-2018
The following problem lock usages were detected: CGC_lock owners: JVO DirtyCardQ_CBL_mon owners: JVO DirtyCardQ_FL_lock owners: JVO VtableStubs_lock owners: JVO InlineCacheBuffer_lock owners: JVO NonJavaThreadsList_lock owners: JVO SATB_Q_CBL_mon owners: JVO VMOperationQueue_lock owners: JVO VMOperationRequest_lock owners: JVO
09-10-2018
I've also added dynamic instrumentation to see which locks actually get locked by which type of threads. I have yet to collate the results from a broad set of runs but there's a common pattern which shows less actual sharing than theoretically possible. Here's a sample below. Only one lock shows actual usage by all three thread types (Java, VMThread, Other ie non-Java) and this is the JNIHandleBlockFreeList_lock - but that's only because every thread acquires this once a startup. Once the VMThread has accessed it during VM initialization it doesn't touch it again, so there are no sneaking issues. The VMThread normally acquires very few locks - the 4 related to VM operations and safepoints, plus the single-use Notify_lock during VM initialization. None of which should need sneaking. Summary of Mutex/Monitor owner types: J(avaThread), V(MThread), O(ther thread): tty_lock owners: J CGC_lock owners: JO STS_lock owners: VO VMWeakAlloc_lock owners: J VMWeakActive_lock owners: StringTableWeakAlloc_lock owners: J StringTableWeakActive_lock owners: FullGCCount_lock owners: SATB_Q_FL_lock owners: SATB_Q_CBL_mon owners: Shared_SATB_Q_lock owners: DirtyCardQ_FL_lock owners: DirtyCardQ_CBL_mon owners: JO Shared_DirtyCardQ_lock owners: FreeList_lock owners: OldSets_lock owners: RootRegionScan_lock owners: O StringDedupQueue_lock owners: StringDedupTable_lock owners: MarkStackFreeList_lock owners: MarkStackChunkList_lock owners: MonitoringSupport_lock owners: J ParGCRareEvent_lock owners: DerivedPointerTableGC_lock owners: CGCPhaseManager_lock owners: O CodeCache_lock owners: J RawMonitor_lock owners: OopMapCacheAlloc_lock owners: MetaspaceExpand_lock owners: J ClassLoaderDataGraph_lock owners: J Patching_lock owners: J Service_lock owners: J JmethodIdCreation_lock owners: J SystemDictionary_lock owners: J SharedDictionary_lock owners: Module_lock owners: J InlineCacheBuffer_lock owners: JO VMStatistic_lock owners: ExpandHeap_lock owners: JNIHandleBlockFreeList_lock owners: JVO SignatureHandlerLibrary_lock owners: J SymbolArena_lock owners: J StringTable_lock owners: ProfilePrint_lock owners: ExceptionCache_lock owners: OsrList_lock owners: J Debug1_lock owners: BeforeExit_lock owners: J PerfDataMemAlloc_lock owners: J PerfDataManager_lock owners: J Safepoint_lock owners: JV Threads_lock owners: JV NonJavaThreadsList_lock owners: J VMOperationQueue_lock owners: JV VMOperationRequest_lock owners: JV RetData_lock owners: Terminator_lock owners: JO VtableStubs_lock owners: JO Notify_lock owners: JV JNIGlobalAlloc_lock owners: J JNIGlobalActive_lock owners: JNIWeakAlloc_lock owners: JNIWeakActive_lock owners: JNICritical_lock owners: AdapterHandlerLibrary_lock owners: J Heap_lock owners: J JfieldIdCreation_lock owners: J ResolvedMethodTable_lock owners: CompiledIC_lock owners: J CompileTaskAlloc_lock owners: J CompileStatistics_lock owners: J DirectivesStack_lock owners: J MultiArray_lock owners: J JvmtiThreadState_lock owners: J Management_lock owners: Compile_lock owners: J MethodData_lock owners: J TouchedMethodLog_lock owners: MethodCompileQueue_lock owners: J Debug2_lock owners: Debug3_lock owners: CompileThread_lock owners: J PeriodicTask_lock owners: JO RedefineClasses_lock owners: ThreadHeapSampler_lock owners: JfrMsg_lock owners: JfrBuffer_lock owners: JfrStream_lock owners: JfrStacktrace_lock owners: CodeHeapStateAnalytics_lock owners:
09-10-2018
On a positive note, zero cases of leak sneaking were observed.
09-10-2018
Removing sneaky locking by moving the tbivm jacket around ParkCommon() and adding another ParkEvent for the recursive call, does not work straightforwardly if the recursive call is for the same lock, which in this case it is (Threads_lock). The first ParkEvent could be removed from the cxq queue by the thread releasing the lock before the recursive ParkEvent is enqueued. The thread will be blocked indefinitely since it will be waiting for itself to make progress in the first call to lock(). Without making the code more complex, this solution will need for the safepoint protocol to avoid taking native monitors inside the tbivm jacket.
08-10-2018
By adding additional logging in mutex.cpp the following monitors/mutexes seem to need sneaky locking, i.e. they are locked by the VMThread during a safepoint and are also acquired by JavaThreads with no_safepoint_check = false: ClassLoaderDataGraph_lock Compile_lock InlineCacheBuffer_lock MonitorSupply mutex OffsetTableContigSpace par alloc lock VtableStubs_lock From those, the following subset also seem to be acquired by nonJavaThreads and hence indicate conflicting cases since sneaky locking should not be used to avoid races: ClassLoaderDataGraph_lock InlineCacheBuffer_lock OffsetTableContigSpace par alloc lock VtableStubs_lock Also JfrStacktrace_lock and OopMapCacheAlloc_lock are acquired by VMThread in a safepoint and by NonJavaThreads, but didn't found instances of JavaThreads trying to lock it so it might be okay to prevent sneaking in those cases.
08-10-2018
Mechanism is correct but cause-and-effect are wrong. It isn't "unfortunate" that a safepoint check can happen between the inner and outer lock, it is essential to avoid deadlock. When a lock is shared between the VMThread and JavaThreads, a JavaThread must not go to a safepoint whilst holding that lock. But a JavaThread that blocked trying to acquire a lock which is now released must not advance when a safepoint is in progress. So there must be a safepoint check in the lock acquisition exit path, but it can not be after the outer-lock is set else you have the deadlock. Hence the safepoint check comes when the inner lock is held but not the outer. That alone would still lead to deadlock with the VMThread, hence "sneaking" to the rescue as the VMThread can grab the lock via just the outer-lock when at a safepoint. But sneaking is broken if non-JavaThreads are also involved. And there seem to be no guards against that.
03-10-2018
Perhaps this for a short what-is-lock-sneaking: VM internal locks (Monitor and Mutex) are implemented with an outer lock and an inner lock. Unfortunately a safepoint can happen in between the outer and inner lock acquisitions. If the lock is one that the VMThread needs during a safepoint (a bad design in any case), we have to allow the VMThread to sneak the lock to avoid a deadlock. The JavaThread is allowed to “own” the outer lock during the safepoint, but the VMThread is allowed to sneak the inner lock, do its work, and release the inner lock with the JavaThread being none the wiser. Here's an example block of code that allows lock sneaking to work: src/hotspot/share/runtime/mutex.cpp void Monitor::lock(Thread * Self) { : if (Self->is_Java_thread()) { // Horrible dictu - we suffer through a state transition assert(rank() > Mutex::special, "Potential deadlock with special or lesser rank mutex"); ThreadBlockInVM tbivm((JavaThread *) Self); ILock(Self); // JavaThread is HERE; VMThread starts safepoint here } else { // Mirabile dictu ILock(Self); } goto Exeunt; You have a JavaThread that has called Monitor::lock() on Some_lock and it has returned from ILock() which means that the JavaThread owns the "outer" lock (the lock bits in Some_lock). Before the ThreadBlockInVM destructor has run, the VMThread starts a safepoint which will cause the JavaThread to call SafepointSynchronize::block(). Eventually, all JavaThreads will reach a safepoint safe state and the VMThread will change the safepoint state from _synchronizing -> _synchronized. Once SafepointSynchronize::is_at_safepoint() is true, this line allows sneaking: bool can_sneak = Self->is_VM_thread() && SafepointSynchronize::is_at_safepoint(); so even though the JavaThread "owns" Some_lock (just the outer lock), the VMThread will be allowed to sneak in, grab Some_lock, do its work, release Some_lock before finishing the safepoint. After the safepoint is finished, JavaThread will be allowed to finish locking Some_lock (the inner lock) and proceed with what it needed to do. Note: Any block where ILock() has been called and there is a possible safepoint before the owner field is set is subject to lock sneaking. (Thanks Patricio!)
03-10-2018
I thought there was a race after the ThreadBlockInVM destructor ran and before the JavaThread called 'goto Exeunt' to get to this block: Exeunt: assert(ILocked(), "invariant"); assert(owner() == NULL, "invariant"); set_owner(Self); return; If the VMThread tried to sneak Some_lock at the same time that JavaThread did the 'goto Exeunt', then we would have an ownership race. However, this race isn't possible because the VMThread can't sneak Some_lock until the system is at a safepoint. The system can't be at a safepoint if the JavaThread made it past the ThreadBlockInVM destructor because that JavaThread cannot check-in for the safepoint. So the safepoint state will stay at _synchronizing until the JavaThread is done with Some_lock and what ever work it needed to do there.
03-10-2018
Reason to try remove sneaky locking: There is a race in the current Mutex/Monitor code if a non JavaThread and the VMThread contend for the same lock. Scenario: At a safepoint a nonJavaThread calls lock() for lockA, succeeds in setting the lock-bit but before setting the owner VMThread calls lock() for lockA too. Since it's in a safepoint and the owner is null sneaks in. Both VMThread and nonJavaThread will believe they have the lock. This happens with trylock() too.
03-10-2018
I suspect you will find that you can not avoid safepoints while holding such locks. The _no_safepoint_check flag only affects whether the locking (and unlocking?) checks for a pending safepoint. The code executed with the lock held would have to be manually inspected in full to see if a safepoint is possible. NoSafepointVerifier can be added for a runtime check, but you can't guarantee you've exercised all code paths.
17-09-2018

Duplicate :	JDK-8217755 - SuspendThread might hang expecting lock in JavaThread::is_ext_suspend_completed
Duplicate :	JDK-8170643 - Remove mixed-size accesses to the LockWord
Relates :	JDK-5030646 - CMS: atg crashed with fastdebug build on rhas_3.0 SP1
Relates :	JDK-8213137 - Remove static initialization of monitor/mutex instances
Relates :	JDK-8213105 - Reorder methods in mutex.cpp so related ones are near each other
Relates :	JDK-8230292 - Handshakes can live-lock causing a stack overflow.
Relates :	JDK-8074355 - make MutexLocker smarter about non-JavaThreads
Relates :	JDK-8219642 - ciReplay loads wrong data when MethodData size changes
Relates :	JDK-8256383 - PlatformMutex::try_lock has different semantics on Windows and Posix
Relates :	JDK-8218975 - Bug in macOSX kernel's pthread support
Relates :	JDK-8219448 - split-if update_uses accesses stale idom data
Relates :	JDK-8214174 - runtime/handshake/HandshakeWalkSuspendExitTest.java failed with timeout