Bug ID: JDK-8153224 Monitor deflation prolong safepoints

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 8u72,9

Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2016-03-31
Updated: 2024-10-10
Resolved: 2020-06-02

Versions (Unresolved/Resolved/Fixed)

JDK 15 :	15 b26 - Fixed

Related Reports

Relates :	JDK-8264420 - Allow MonitorUsedDeflationThreshold=0 for aggressive deflation of all eligible monitors
Relates :	JDK-7021979 - rapid sustained monitor circulation causes asymptotic increase in # of extant monitors
Relates :	JDK-8267842 - SIGSEGV in get_current_contended_monitor
Relates :	JDK-8246493 - JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
Relates :	JDK-8252126 - 'GVars.stw_random = os::random()' lost by JDK-8246476
Relates :	JDK-8253183 - Fragile memory barrier selection for some weak memory model platforms
Relates :	JDK-8221616 - gtest/GTestWrapper.java crashed due to SIGSEGV on Linux-X64
Relates :	JDK-8184751 - Provide thread pool for parallel safepoint cleanup
Relates :	JDK-8246477 - add whitebox support for deflating idle monitors
Relates :	JDK-8247280 - more fencing needed in async deflation for non-TSO machines
Relates :	JDK-8246476 - remove AsyncDeflateIdleMonitors option and the safepoint based deflation mechanism
Relates :	JDK-8149442 - MonitorInUseLists should be on by default, deflate idle monitors taking too long
Relates :	JDK-8180175 - ObjectSynchronizer only needs to iterate in-use monitors
Relates :	JDK-8180932 - Parallelize safepoint cleanup
Relates :	JDK-8246359 - clarify confusing comment in ObjectMonitor::EnterI()'s race with async deflation
Relates :	JDK-8305994 - Guarantee eventual async monitor deflation
Relates :	JDK-8246676 - monitor list lock operations need more fencing

Sub Tasks

JDK-8217658 :	baseline_cleanups from Async Monitor Deflation project - Resolved
JDK-8217659 :	monitor_logging updates from Async Monitor Deflation project - Resolved
JDK-8221350 :	more monitor logging updates from Async Monitor Deflation project - Resolved
JDK-8222295 :	more baseline cleanups from Async Monitor Deflation project - Resolved
JDK-8225453 :	is_busy diagnostics and other baseline cleanups from Async Monitor Deflation project - Resolved
JDK-8230184 :	rename, whitespace, indent and comments changes in preparation for lock free Monitor lists - Resolved
JDK-8230876 :	baseline cleanups from Async Monitor Deflation v2.0[789] - Resolved
JDK-8234544 :	ObjectSynchronizer::FastHashCode() cleanups from Async Monitor Deflation project - Resolved
JDK-8235795 :	replace monitor list mux{Acquire,Release}(&gListLock) with spin locks - Resolved
JDK-8235931 :	add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64 - Resolved
JDK-8236035 :	refactor ObjectMonitor::set_owner() and _owner field setting - Resolved

Description

In applications with non-trivial amounts of lock contention the time to deflate idle monitors can prolong safepoints by a non-trivial amount.

Today monitors are deflated by the VM thread at the beginning of every safepoint. In JDK-7021979 it was observed that 250,000 monitors can easily take up to 8ms to process. The 250,000 monitors were created artificially using a small Java program.

At Twitter we have seen programs with more than 600,000 monitors and 6500 threads. That is roughly 92 monitors per thread, which is less than 10% of the MAXPRIVATE (1024) private monitors per thread, suggesting these programs do not suffer from the issue reported in JDK-7021979.

Comments

See https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation for details about the Async Monitor Deflation project.
02-06-2020
URL: https://hg.openjdk.java.net/jdk/jdk/rev/629b14c63b75 User: dcubed Date: 2020-06-02 03:37:53 +0000
02-06-2020
My patch for Async Monitor Deflation is based on Carsten Varming's http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/ which I have ported to work with monitor lists. Monitor lists were optional via the '-XX:+MonitorInUseLists' option in JDK8, the option became default 'true' in JDK9, the option became deprecated in JDK10 via JDK-8180768, and the option became obsolete in JDK12 via JDK-8211384. Carsten's webrev is based on JDK10 so there was a bit of porting work needed to merge his code and/or algorithms with jdk/jdk.

Note: when I started on this project jdk/jdk was targeted at JDK12 and is now targeted at JDK13. My repos are still relative to jdk/jdk12 for preliminary review, but the project will be rebased to jdk/jdk after the preliminary review.

The key pieces:

- New option '-XX:AsyncDeflateIdleMonitors' that is default 'true' so that the new mechanism is used in all testing.

- ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a three part protocol:

  Part 1) Setting a NULL owner field to DEFLATER_MARKER with cmpxchg() forces any contending thread through the slow path. A racing thread would be trying to set the owner field.

  Part 2) Making a zero count negative with cmpxchg() forces racing threads to retry. A racing thread would have set the owner field (after we stored DEFLATER_MARKER) and would be trying to increment the count field.

  Part 3) If the owner field is still equal to DEFLATER_MARKER, then we have won all the races and can deflate the monitor.

  If we lose any of the races, the monitor cannot be deflated at this time. The deflation of a monitor is mostly field resetting and monitor list management, but restoring the object's header is another racy op that is handled by ObjectMonitor::install_displaced_markword_in_object().

- ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from a couple of places (deflation and object monitor entry) and can also race with installation of a hash for the object. The restoration protocol for the object's header uses the mark bit along with the hash() value staying at zero to indicate that the object's header is being restored. Only one of the three possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.

- Various code paths have been updated to recognize an owner field equal to DEFLATER_MARKER or a negative count field and those code paths will retry their operation. See the gory details below.

- ObjectMonitor deflation is still initiated or signaled as needed at a safepoint. When Asynchronous Monitor Deflation is in use, flags are set so that the work is done by JavaThreads, which offloads the safepoint mechanism.

- ObjectSynchronizer::omAlloc() is modified to call (as needed) ObjectSynchronizer::deflate_per_thread_idle_monitors_using_JT(). Having the JavaThread clean up its own per-thread monitor list permits this work to happen without any per-thread list locking or critical sections. Having a JavaThread deflate a potentially long list of in-use monitors could potentially delay the start of a safepoint. This is detected in ObjectSynchronizer::deflate_monitor_list_using_JT() which will save the current state when it is safe to do so and return to its caller to drop locks as needed before honoring the safepoint request.

- ObjectSynchronizer::inflate() has to be careful how omAlloc() is called. If the inflation cause is inflate_cause_vm_internal, then it is not safe to deflate monitors on the per-thread lists so we skip that. When monitor deflation is done, inflate() has to do the oop refresh dance that is common to any code that can go to a safepoint while holding a naked oop. And, no you can't use a Handle here either. :-)

- Everything else is just monitor list management, infrastructure, logging, debugging and the like. :-)

Gory details:

- Counterpart function mapping for those that know the existing code:
  - ObjectSynchronizer class:
    - deflate_idle_monitors() has deflate_global_idle_monitors_using_JT() and deflate_per_thread_idle_monitors_using_JT()
    - deflate_monitor_list() has deflate_monitor_list_using_JT()
    - deflate_monitor() has deflate_monitor_using_JT()
  - ObjectMonitor class:
    - is_busy() has is_busy_async()
    - clear() has clear_using_JT()

- These functions recognize the Asynchronous Monitor Deflation protocol and adapt their operations: ObjectMonitor::enter(), ObjectMonitor::EnterI(), ObjectMonitor::ReenterI(); most callers to enter() had to indirectly adapt to the protocol and retry their operations. Also ObjectSynchronizer::slow_enter(), ObjectSynchronizer::reenter(), ObjectSynchronizer::jni_enter(), ObjectSynchronizer::FastHashCode(), ObjectSynchronizer::current_thread_holds_lock(), ObjectSynchronizer::query_lock_ownership(), ObjectSynchronizer::get_lock_owner(), ObjectSynchronizer::inflate_helper(), ObjectSynchronizer::inflate() had to adapt and retry their operations.

- Various assertions had to be modified to pass without their real check when AsyncDeflateIdleMonitors is true; this is due to the change in semantics for the ObjectMonitor owner and count fields.

- ObjectMonitor has a new allocation_state field that supports three states: 'Free', 'New', 'Old'. Asynchronous Monitor Deflation is only applied to ObjectMonitors that have reached the 'Old' state. When the Asynchronous Monitor Deflation code sees an ObjectMonitor in the 'New' state, it is changed to the 'Old' state, but is not deflated. This prevents a newly allocated ObjectMonitor from being immediately deflated which could cause an inflation-deflation oscillation.

- ObjectMonitor has a new ref_count field that is used to indicate that an ObjectMonitor ptr is in use so the ObjectMonitor should not be deflated; this is needed for operations on non-busy monitors so that ObjectMonitor values don't change while they are being queried.

- The ObjectMonitor::owner() accessor detects DEFLATER_MARKER and returns NULL in that case to minimize the places that need to understand the new DEFLATER_MARKER value.

- System.gc()/JVM_GC() causes a special monitor list cleanup request which uses the safepoint based monitor list mechanism. So even if AsyncDeflateIdleMonitors is enabled, the safepoint based mechanism is still used by this special case. This is necessary for those tests that do something to cause an object's monitor to be inflated, clear the only reference to the object and then expect that enough System.gc() calls will eventually cause the object to be GC'ed even when the thread never inflates another object's monitor. Yes, we have tests like that. :-)

The ObjectMonitor ref_count code is ugly. I can say that since I wrote it. :-) I need a better way of abstracting that mechanism, but I think the special allocation style for ObjectMonitors makes that difficult to implement. More to do here.
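
For readers who want to see the three part protocol from this comment in code form, here is a minimal sketch using std::atomic. It is not the HotSpot code: SketchMonitor, deflater_marker(), kDeflatingBias and try_deflate() are hypothetical names, the marker encoding is an assumption, and the two "undo" steps on the losing paths are assumptions that the comment does not spell out.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for ObjectMonitor; only the two fields that the
// protocol touches are modeled.
struct SketchMonitor {
    std::atomic<void*> owner{nullptr};
    std::atomic<int>   count{0};
};

// Sentinel address playing the role of DEFLATER_MARKER (assumed encoding).
inline void* deflater_marker() {
    static char marker;
    return &marker;
}

// Large negative value stored in count while deflation is in progress (assumed).
constexpr int kDeflatingBias = INT_MIN / 2;

// Returns true if the caller won every race and may deflate the monitor.
inline bool try_deflate(SketchMonitor& m) {
    // Part 1: claim an unowned monitor by swinging owner from NULL to the marker.
    void* expected_owner = nullptr;
    if (!m.owner.compare_exchange_strong(expected_owner, deflater_marker())) {
        return false;  // some thread owns the monitor
    }

    // Part 2: make a zero count negative so that racing threads retry.
    int expected_count = 0;
    if (!m.count.compare_exchange_strong(expected_count, kDeflatingBias)) {
        // Assumption: give the monitor back if a contender is already counted.
        void* marker = deflater_marker();
        m.owner.compare_exchange_strong(marker, nullptr);
        return false;
    }

    // Part 3: if the marker is still in place, all races were won.
    if (m.owner.load() == deflater_marker()) {
        assert(m.count.load() < 0);
        return true;
    }

    // A racing thread replaced the marker between Part 1 and Part 2.
    // Assumption: remove the bias so the count is meaningful again.
    m.count.fetch_sub(kDeflatingBias);
    return false;
}
```

Calling try_deflate() on a freshly constructed SketchMonitor returns true and leaves the marker in owner with a negative count; calling it on a monitor whose owner is non-NULL or whose count is non-zero returns false and leaves both fields as they were. The sketch uses the default sequentially consistent ordering throughout; the fencing-related reports linked from this issue suggest the real memory ordering requirements are more subtle.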
24-01-2019
I've been working on a forward port of Carsten's webrev to the jdk/jdk12 repo. At this point, I have a functional forward port that passes this Mach5 config: builds-tier1,hs-tier1,jdk-tier1,hs-tier2,hs-tier3

I have split the current project into three distinct changesets:

- baseline_cleanups: Captures updates to the baseline code that are not directly related to Async Monitor Deflation.
- monitor_logging: Captures updates and additions to the baseline monitor logging code.
- monitor_deflate_conc.v1.01: Captures my port of Carsten's code and/or algorithms for Async Monitor Deflation to the current monitor list baseline.
23-01-2019
In theory the maximum number of inflated monitors needed at any one time is T-1, where T is the number of live threads, because a thread can only be contending for one monitor at a time. The pathological case would be where one thread locks T-1 distinct objects and the other T-1 threads each try to lock one of them, causing T-1 inflations. The problem is that while inflation is synchronous and obvious, deflation is non-obvious and so only occurs asynchronously at a safepoint. Hence the number of inflated monitors is limited only by the number of objects that can become contended between safepoints.

While clever techniques for processing the excessive number of inflated monitors may help (or even a simple technique like deflating at most N per safepoint), I wonder whether we can devise a more sophisticated way of doing synchronous deflation?
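
The "deflate at most N per safepoint" idea mentioned above could look roughly like the following sketch. It is only an illustration of the bounded-work technique: IdleCandidate and BoundedDeflater are hypothetical types, and a plain vector stands in for the VM's monitor lists.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a monitor for this sketch.
struct IdleCandidate {
    bool busy;      // owned or contended, so it must stay inflated
    bool inflated;  // still holds an inflated monitor
};

// Remembers where the previous pass stopped so that successive passes
// eventually visit every monitor.
struct BoundedDeflater {
    std::size_t cursor = 0;

    // Deflates at most max_per_pass idle monitors and returns how many
    // were deflated. Each monitor is examined at most once per call.
    std::size_t deflate_some(std::vector<IdleCandidate>& monitors,
                             std::size_t max_per_pass) {
        std::size_t deflated = 0;
        std::size_t scanned = 0;
        while (scanned < monitors.size() && deflated < max_per_pass) {
            IdleCandidate& m = monitors[cursor];
            if (m.inflated && !m.busy) {
                m.inflated = false;  // stands in for the real deflation work
                ++deflated;
            }
            cursor = (cursor + 1) % monitors.size();
            ++scanned;
        }
        return deflated;
    }
};
```

Each call would correspond to one safepoint cleanup. Note that this caps only the number of deflations; a pass over a list with few idle monitors still scans every entry, so a real bound on pause time would probably also need to cap the number of monitors examined.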
18-05-2017
We hit this situation too on SPECjbb2015. One of the ways out is to implement non-aggressive monitor deflation, as mused in the comment:

    class ObjectSynchronizer : AllStatic {
      ...
      // GC: we current use aggressive monitor deflation policy
      // Basically we deflate all monitors that are not busy.
      // An adaptive profile-based deflation policy could be used if needed
      static void deflate_idle_monitors();
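
As one possible shape for a non-aggressive policy, the sketch below only asks for deflation once monitor usage crosses a percentage threshold. This is an assumption made for illustration, not the adaptive profile-based policy the source comment alludes to; DeflationPolicy and its field are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical policy object; the name and the meaning of the threshold
// are assumptions made for this sketch.
struct DeflationPolicy {
    unsigned used_threshold_percent;  // deflate only above this usage level

    // in_use:     monitors currently associated with an object
    // population: total monitors allocated so far
    bool should_deflate(std::size_t in_use, std::size_t population) const {
        if (population == 0) {
            return false;  // nothing allocated, nothing to deflate
        }
        // Integer form of: in_use / population > threshold / 100
        return in_use * 100 > population * used_threshold_percent;
    }
};
```

With a threshold of 90, a population of 1000 monitors would trigger deflation at 950 in use but not at 900 or below, so a lightly loaded VM would skip the deflation walk entirely.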
15-05-2017
I believe -XX:+MonitorInUseLists might benefit your case. It's been available in HotSpot for some time. It has been unfairly overlooked but is now being considered for enabling by default; see JDK-8149442.

I'd also note that the number of monitors to provision is maintained per thread, thus it's entirely possible that some threads are experiencing rapid inflation like that described in JDK-7021979 while others rarely contend on anything. When instrumenting various benchmarks we often saw groups of threads capped at a provisioning of 1024 while most had barely provisioned any, but it's very dependent on application and workload.
31-03-2016