JDK-8057573 : CMSScavengeBeforeRemark ignored if GCLocker is active
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2014-09-04
  • Updated: 2019-12-14
  • Resolved: 2019-12-14
Related Reports
Relates :  
Relates :  
Description
We use quite large edens and we run with -XX:+CMSScavengeBeforeRemark to empty the eden before each remark to keep remark times reasonable. It turns out that when the remark pause is scheduled it doesn't try to synchronize with the GCLocker at all. The result is that, quite often, the scavenge before remark aborts because the GCLocker is active. This leads to substantially longer remarks.

A side effect of this is that the remark pause with the aborted scavenge is immediately followed by a GCLocker-initiated GC, even though the eden is only about half full. The aborted scavenge checks whether the GCLocker is active with check_active_before_gc(), which, if the locker is active, also tells it to trigger a young GC as soon as it clears; that young GC is then done without waiting for the eden to fill up.
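
The following is a minimal, purely illustrative Java model of the behaviour described above -- it is not HotSpot code. checkActiveBeforeGC() only plays the role of the real check_active_before_gc(); the class, its fields, and the printed messages are simplified assumptions.

// Illustrative model only (plain Java, not HotSpot code) of the decision described above.
public class ScavengeBeforeRemarkModel {
    private int criticalCount;   // threads currently inside a JNI critical section
    private boolean needsGC;     // set when a GC had to be skipped while the locker was active

    synchronized boolean checkActiveBeforeGC() {
        if (criticalCount > 0) {
            needsGC = true;      // ask for a young GC as soon as the locker clears
            return true;         // GC must not proceed right now
        }
        return false;
    }

    synchronized void enterCritical() { criticalCount++; }

    synchronized void exitCritical() {
        if (--criticalCount == 0 && needsGC) {
            needsGC = false;
            System.out.println("GCLocker Initiated GC, even though eden is not full");
        }
    }

    // The remark pause tries to empty eden first, but gives up if the locker is active.
    void remarkPause() {
        if (checkActiveBeforeGC()) {
            System.out.println("scavenge aborted; remark scans a large, nearly full eden");
        } else {
            System.out.println("scavenge done; remark stays short");
        }
    }

    public static void main(String[] args) {
        ScavengeBeforeRemarkModel m = new ScavengeBeforeRemarkModel();
        m.remarkPause();         // locker idle: scavenge succeeds, remark is short
        m.enterCritical();
        m.remarkPause();         // locker active: scavenge aborted, needsGC recorded
        m.exitCritical();        // last JNI CS exit triggers the follow-up young GC
    }
}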

Comments
CMS has been removed - JDK-8229049
14-12-2019

ILW: impact: medium (some performance loss); likelihood: medium (needs lots of GCLocker activity, and CMS); workaround: medium (use smaller eden, avoid GCLocker activity) => MMM => P4
05-09-2014

I brought this up on the hotspot-gc-dev list. I'd like to post Ramki's reply to have it recorded here (yes, I did ask for his permission!):

The scavenge-before-remark bailing if the gc-locker is active was an expedient solution and one that I did not expend much thought on, as gc-lockers were considered infrequent enough not to affect the bottom line by much. I can imagine, though, that with very frequent gc-locker activity and extremely large Edens this can be an issue. The fact that the scavenge might bail was already considered, as some of the comments in that section of code indicate.

A ticklish dilemma here is whether the CMS thread should wait for the JNI CS to clear or just plough on as is the case today. The thinking there was that it's better to have a longish remark pause because of not emptying Eden than to delay the CMS collection and risk a concurrent mode failure, which would be much more expensive.

As you alluded to in your email, the issue is a bit tricky because of the way scavenge-before-remark is currently implemented: CMS decides to do a remark, stops all the mutators, then decides that it must do a scavenge, which now cannot be done because the gc-locker is held, so we bail from the scavenge and just do the remark pause (this is safe because no objects are moved). The whole set-up of CMS' vm-ops was predicated on the assumption of non-interference with other operations, because these are in some sense "read-only" wrt the heap, so we can safely schedule the safepoint at any time without any worries about moving objects. Scavenge-before-remark is the only wrinkle in this otherwise flat and smooth landscape.

I suspect the correct way to deal with this once and for all in a uniform manner might be to have vm-ops that need a vacant gc-locker be enqueued on a separate vm-ops queue whose operations are executed as soon as the gc-locker has been vacated (this would likely be all the vm-ops other than perhaps a handful of CMS vm-ops today). But this would be a fairly intrusive and delicate rewrite of the vm-op and gc-locker subsystems.

A quicker point solution might be to split the scavenge-and-remark vm-op into two separate vm-ops -- one that does a (guaranteed) scavenge, followed by another that does a remark -- each in a separate safepoint, i.e. two separate vm-ops. One way to do this might be for the CMS thread to take the jni critical lock, set needs_gc() if the gc-locker is active, and then wait on the jni critical lock for it to be cleared (which it will be by the last thread exiting a JNI CS), which would initiate the scavenge. If the gc-locker isn't active, the scavenge can be initiated straight away by the CMS thread in the same way that a JNI thread would have initiated it when it was the last one exiting a JNI CS. Once the scavenge has happened, the CMS thread can then do the remark in the normal way. Some allocation would have happened in Eden between the scavenge and the remark to follow, but hopefully that would be sufficiently small as not to affect the performance of the remark. The delicate part here is the synchronization between the gc-locker state, the CMS thread initiating the vm-op for scavenge/remark, and the JNI threads, but this protocol would be identical to the existing one, except that the CMS thread would now be a participant in that protocol, which it never was before (this might call for some scaffolding in the CMS thread so it can participate).

All this having been said, I am slightly surprised that remark pauses for large Edens are so poor. I would normally expect that pointers from young to old would be quite few and, with the Eden being scanned multi-threaded (at sampled "top" boundaries -- perhaps this should use TLAB boundaries instead), we would be able to scale OK to larger Edens. Have you looked at the distribution of Eden scanning times during the remark pause? Does Eden scanning dominate the remark cost? (I was also wondering if it might be possible to avoid using whatever was causing such frequent gc-locker activity as a temporary workaround until the issue w/CMS is fixed?)
04-09-2014
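
Below is a minimal runnable Java model (again plain Java, not HotSpot code) of the "quicker point solution" sketched in the reply quoted above: the initiating thread waits for the JNI critical sections to clear, performs a guaranteed scavenge as one step, and then runs the remark as a separate step. The class and method names are illustrative assumptions; the real change would have to live in the vm-op and gc-locker code.

// Toy model of the two-step (scavenge, then remark) idea from the reply above.
class TwoStepRemarkModel {
    private int criticalCount;   // threads currently inside a JNI critical section

    synchronized void enterCritical() { criticalCount++; }
    synchronized void exitCritical()  { if (--criticalCount == 0) notifyAll(); }

    // Step 1: wait until no thread holds a critical section, then do a guaranteed "scavenge".
    synchronized void scavengeWhenLockerClear() throws InterruptedException {
        while (criticalCount > 0) {
            wait();              // woken by the last thread leaving a JNI critical section
        }
        System.out.println("VM op 1: guaranteed scavenge (eden emptied)");
    }

    // Step 2: a separate pause; eden only holds what was allocated since step 1.
    void remark() {
        System.out.println("VM op 2: remark over a nearly empty eden");
    }

    public static void main(String[] args) throws Exception {
        TwoStepRemarkModel m = new TwoStepRemarkModel();
        m.enterCritical();                        // a mutator holds a critical section
        Thread cms = new Thread(() -> {
            try {
                m.scavengeWhenLockerClear();
                m.remark();
            } catch (InterruptedException ignored) { }
        });
        cms.start();
        Thread.sleep(100);                        // simulate a long-ish critical section
        m.exitCritical();                         // last exit lets the scavenge proceed
        cms.join();
    }
}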

I can reproduce it with the attached test (JNICriticalStressTest3.java, which is similar to the test in JDK-8048556 but with an additional thread doing frequent System.gc()'s). I run it with:

java -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark -XX:+ExplicitGCInvokesConcurrent -Xms5g -Xmx5g -Xmn4g -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC JNICriticalStressTest3 300 4 1

which causes frequent CMS cycles. I also attached the output from this test. (A rough stand-in for this kind of workload is sketched at the end of this page.)

Here's a "regular" remark for which the scavenge succeeded:

18.292: #38: [Rescan (parallel) , 0.0021019 secs]18.294: #38: [weak refs processing, 0.0000165 secs]18.294: #38: [class unloading, 0.0003749 secs]18.295: #38: [scrub symbol table, 0.0005812 secs]18.295: #38: [scrub string table, 0.0001804 secs][1 CMS-remark: 10803K(1048576K)] 34050K(4823488K), 0.0127242 secs] [Times: user=0.03 sys=0.00 real=0.01 secs]

(The regular remarks are around 10ms-15ms.) Here's one for which the scavenge failed due to the GCLocker:

119.563: #267: [GC (CMS Final Remark) [YG occupancy: 2302304 K (3774912 K)]119.563: #267: [Rescan (parallel) , 0.6727359 secs]120.236: #267: [weak refs processing, 0.0000151 secs]120.236: #267: [class unloading, 0.0003364 secs]120.236: #267: [scrub symbol table, 0.0004287 secs]120.237: #267: [scrub string table, 0.0001825 secs][1 CMS-remark: 10803K(1048576K)] 2313107K(4823488K), 0.6738128 secs] [Times: user=2.43 sys=0.00 real=0.67 secs]
120.237: #267: [CMS-concurrent-sweep-start]
120.244: #267: [CMS-concurrent-sweep: 0.007/0.007 secs] [Times: user=0.02 sys=0.00 real=0.01 secs]
120.244: #267: [CMS-concurrent-reset-start]
SYSTEM_GC AFTER
120.246: #267: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.01 sys=0.00 real=0.00 secs]
{Heap before GC invocations=250 (full 19):
 par new generation   total 3774912K, used 2351202K [0x0000000680000000, 0x0000000780000000, 0x0000000780000000)
  eden space 3355520K,  69% used [0x0000000680000000, 0x000000070eb35ed0, 0x000000074cce0000)
  from space 419392K,   3% used [0x000000074cce0000, 0x000000074d9c2ab0, 0x0000000766670000)
  to   space 419392K,   0% used [0x0000000766670000, 0x0000000766670000, 0x0000000780000000)
 concurrent mark-sweep generation total 1048576K, used 10803K [0x0000000780000000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 2785K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 293K, capacity 388K, committed 512K, reserved 1048576K
120.247: #269: [GC (GCLocker Initiated GC) 120.247: #269: [ParNew: 2351202K->24313K(3774912K), 0.0088857 secs] 2362005K->35117K(4823488K), 0.0089206 secs] [Times: user=0.02 sys=0.00 real=0.01 secs]
Heap after GC invocations=251 (full 19):
 par new generation   total 3774912K, used 24313K [0x0000000680000000, 0x0000000780000000, 0x0000000780000000)
  eden space 3355520K,   0% used [0x0000000680000000, 0x0000000680000000, 0x000000074cce0000)
  from space 419392K,   5% used [0x0000000766670000, 0x0000000767e2e750, 0x0000000780000000)
  to   space 419392K,   0% used [0x000000074cce0000, 0x000000074cce0000, 0x0000000766670000)
 concurrent mark-sweep generation total 1048576K, used 10803K [0x0000000780000000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 2785K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 293K, capacity 388K, committed 512K, reserved 1048576K
}

Note that such remarks are not only longer but are followed by a GCLocker-initiated GC with a non-full eden.
This looks similar to the unnecessary young GCs reported in JDK-8048556 (i.e., a GCLocker-initiated young GC happens with a non-full eden), but the root cause is actually different.
04-09-2014
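
The attached JNICriticalStressTest3.java is not reproduced in this report. As a rough stand-in only, the sketch below approximates that kind of workload: several threads repeatedly run java.util.zip.Deflater, whose native code (at least in JDK 8) enters JNI critical sections and so keeps the GCLocker busy, while one thread periodically calls System.gc(). The class name, worker count, and buffer sizes are arbitrary assumptions; run it with the CMS flags listed in the comment above.

import java.util.Random;
import java.util.zip.Deflater;

// Rough approximation (not the attached test) of a GCLocker-heavy workload plus System.gc()'s.
public class GCLockerStressSketch {
    public static void main(String[] args) throws Exception {
        int workers = args.length > 0 ? Integer.parseInt(args[0]) : 4;   // arbitrary default
        byte[] input = new byte[64 * 1024];
        new Random().nextBytes(input);

        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                byte[] out = new byte[128 * 1024];
                while (true) {
                    Deflater d = new Deflater();
                    d.setInput(input);
                    d.finish();
                    while (!d.finished()) {
                        d.deflate(out);   // JDK 8's native deflate uses a JNI critical section
                    }
                    d.end();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // One thread triggering CMS cycles, as in the described test.
        while (true) {
            Thread.sleep(1000);
            System.gc();                  // concurrent cycle with -XX:+ExplicitGCInvokesConcurrent
        }
    }
}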