JDK-8232588 : G1 concurrent System.gc can return early or late
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-10-17
  • Updated: 2021-06-08
  • Resolved: 2019-11-13
  • Fixed in: JDK 14 (build 14 b24)
Description
When a concurrent System.gc request is made, the current intent is that it starts a new cycle if one is not already in progress, and then waits until the now in-progress cycle completes.

To accomplish this, when a concurrent System.gc request is made, the VMOp epilogue waits for the current cycle to complete. That wait is applied both when the VMOp performed the initial mark pause and when the VMOp skipped the GC because there was already a concurrent cycle in progress.

However, things can go wrong if two threads simultaneously request such a GC.

Assume there is no cycle already in progress and no other reason for a request to fail (such as GCLocker being active).

Assume that each thread (A and B) captures the same total_collections() and old_marking_cycles_started() counter values.

Thread A executes its VMOp first, performing the initial mark pause in the safepoint, and increments the total_collections() and old_marking_cycles_started() counters. Once the safepoint is complete, thread A waits in the VMOp epilogue for the concurrent cycle to complete. All is good here.

Thread B executes its VMOp second. Its VMOp prologue sees that total_collections() has changed, so it skips any further processing of the VMOp (including the wait in the epilogue). Returning to try_collect, thread B sees that old_marking_cycles_started() has changed and returns immediately, without waiting for the completion of the cycle started by thread A. The rationale for returning immediately (per the associated comment) is "A Full GC happened ... No point in starting a new cycle given that the whole heap was collected anyway." But this doesn't account for the possibility that the counter changed because marking was started for a concurrent collection, in which case thread B should wait for that cycle to complete.
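
The interleaving above can be replayed with a small single-threaded model. This is only an illustrative sketch, not HotSpot code: the class EarlyReturnSketch, its Outcome enum, and the request method are hypothetical names, the two int fields merely stand in for total_collections() and old_marking_cycles_started(), and the epilogue wait is represented by the returned outcome rather than by real blocking.

```java
// Simplified single-threaded model of the early-return race; not HotSpot code.
// All names here are hypothetical stand-ins for the logic described in the report.
public class EarlyReturnSketch {
    enum Outcome { WAITED_FOR_CYCLE, RETRY, RETURNED_EARLY }

    // Stand-ins for the G1 counters named in the report.
    int totalCollections = 0;
    int oldMarkingCyclesStarted = 0;
    boolean cycleInProgress = false;

    // Models one System.gc request that captured the given counter values
    // before its VMOp was executed.
    Outcome request(int gcCountBefore, int markingStartedBefore) {
        if (totalCollections == gcCountBefore) {
            // The VMOp runs: the initial mark pause bumps both counters and
            // starts a concurrent cycle; the epilogue would then wait for it.
            totalCollections++;
            oldMarkingCyclesStarted++;
            cycleInProgress = true;
            return Outcome.WAITED_FOR_CYCLE;
        }
        // The prologue saw a changed collection count: the VMOp is skipped,
        // including the wait in the epilogue.
        if (oldMarkingCyclesStarted != markingStartedBefore) {
            // The caller assumes a Full GC happened and returns immediately,
            // even though the change may come from a concurrent cycle start.
            return Outcome.RETURNED_EARLY;
        }
        return Outcome.RETRY;
    }

    public static void main(String[] args) {
        EarlyReturnSketch g1 = new EarlyReturnSketch();
        // Threads A and B capture the same counter values.
        int gcBefore = g1.totalCollections;
        int markBefore = g1.oldMarkingCyclesStarted;
        System.out.println("A: " + g1.request(gcBefore, markBefore));
        System.out.println("B: " + g1.request(gcBefore, markBefore));
        System.out.println("cycle still in progress: " + g1.cycleInProgress);
    }
}
```

In this model, A's request reports WAITED_FOR_CYCLE while B's request reports RETURNED_EARLY with the cycle still marked as in progress, which is the early return described above.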

Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/f080b08daace User: kbarrett Date: 2019-11-13 23:14:48 +0000
13-11-2019

The normal behavior for System.gc with ExplicitGCInvokesConcurrent is to first wait for any in-progress cycle to complete, then start a new concurrent cycle and wait until it completes. But there are race conditions with other threads triggering concurrent cycles that can cause both earlier and later returns from a System.gc than would be expected from that normal behavior. Updated the bug summary to indicate there are problems with both early and late returns.
25-10-2019

It seems there are problems in the other direction too. With some bad luck, return from a System.gc might be delayed longer than necessary. Indeed, with sufficiently terrible luck, it might never return. Consider a System.gc while there is a concurrent cycle in progress. The VMOp doit will detect the in-progress cycle and record the need for a retry. The VMOp epilogue will wait for the cycle to complete. The thread will return from the VMOp execution, compare the marking-started counts from before and (racily) after, see they are the same, and proceed toward a retry. If at that point, but before the thread grabs the heap lock and captures a new marking-started "before" count, another thread starts a new concurrent cycle, then the first thread is back in the same starting state, and can go through the whole process again (and again...).
25-10-2019
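
The late-return scenario in the comment above can also be replayed with a small deterministic model. Again this is a sketch under stated assumptions rather than the actual G1 logic: LateReturnSketch and requestGc are hypothetical names, the interferingStarts parameter is an invented knob for how many times another thread happens to start a cycle inside the retry window, and waiting for a cycle is modeled by simply marking it complete.

```java
// Simplified single-threaded model of the late-return scenario; not HotSpot code.
// All names here are hypothetical stand-ins for the logic described in the comment.
public class LateReturnSketch {
    // Start state from the comment: a concurrent cycle is already in progress.
    int oldMarkingCyclesStarted = 1;
    boolean cycleInProgress = true;

    // Returns how many concurrent cycles the requester waits for before its
    // System.gc request returns. interferingStarts is the number of times
    // another thread starts a new cycle inside the retry window.
    int requestGc(int interferingStarts) {
        int cyclesWaitedFor = 0;
        while (true) {
            // "Before" count, captured while holding the heap lock.
            int startedBefore = oldMarkingCyclesStarted;
            if (!cycleInProgress) {
                // No cycle in progress: start our own cycle and wait for it.
                oldMarkingCyclesStarted++;
                cycleInProgress = true;
                cycleInProgress = false; // modeled wait: the cycle completes
                return cyclesWaitedFor + 1;
            }
            // doit detects the in-progress cycle and records the need for a
            // retry; the epilogue waits for that cycle to complete.
            cycleInProgress = false;
            cyclesWaitedFor++;
            if (oldMarkingCyclesStarted == startedBefore) {
                // Counts match, so the requester heads for a retry. This is
                // the window in which another thread can start a new cycle.
                if (interferingStarts > 0) {
                    interferingStarts--;
                    oldMarkingCyclesStarted++;
                    cycleInProgress = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("no interference: "
                + new LateReturnSketch().requestGc(0) + " cycles");
        System.out.println("three interfering starts: "
                + new LateReturnSketch().requestGc(3) + " cycles");
    }
}
```

Without interference the modeled requester waits for two cycles (the one in progress plus its own). Each interfering start adds one more full cycle to the wait, and an unbounded stream of them would keep the loop from ever terminating.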