Bug ID: JDK-8040245 G1: VM hangs during shutdown

Versions (Unresolved/Resolved/Fixed)

JDK 8	JDK 9
8u20 (Fixed)	9 b12 (Fixed)

Since the change http://hg.openjdk.java.net/jdk9/hs-gc/hotspot/rev/ebe7363ae01b there is a race condition when shutting down the JVM that causes it to livelock.

The issue seems to be that some GC threads are waiting in the first sync barrier whereas other GC threads have already exited, meaning that the ones that wait in the barrier will wait forever.

After further investigation: I've found at least one additional bug that can cause the concurrent marking threads to hang (use of os::elapsedVTime() in some environments). Since the hanging issue is causing lots of test failures, I'd suggest that we temporarily disable the controlled shutdown of the concurrent GC threads to avoid this problem, and enable it again when the other bugs have been fixed.
17-04-2014
I = Concurrent mark hangs forever, so mixed GCs never happen and the VM hangs at shutdown -> H
L = Reproducible, but requires an app with a special allocation pattern running on a single-core machine. Never seen in the wild -> M
W = Don't use G1 -> H
ILW = HMH = P1
17-04-2014
After some further investigation: The underlying problem is unrelated to the changeset referred to in the description of the bug. I can reproduce the problem without that change; it's just a lot harder to spot. The end result is that concurrent mark hangs forever, but the VM continues to run, thinking that concurrent mark just hasn't finished yet. Since concurrent mark hangs, we will never again see any mixed GCs. Young GC and Full GC will, however, continue to work because the hanging threads have left the STS (suspendible thread set). The fix for JDK-8037112 exposes this problem more clearly, because there we actually depend on the mark threads not being stuck somewhere. It seems that this bug has been in G1 since day one but has gone unnoticed until now.
16-04-2014
Seems to be caused by a race condition in the termination protocol in concurrent mark's work gang. The problem occurs when the mark stack overflows but not all worker threads observe the overflow, so some of them fail to participate in the WorkGangBarriers used in the overflow path.
15-04-2014
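
The race described in the 15-04-2014 comment can be modelled with a small Java sketch. This is an illustration only, not HotSpot code: the class OverflowBarrierSketch, the method run, and the overflow flag are hypothetical names, and a java.util.concurrent.CyclicBarrier stands in for the barrier used in the overflow path. Workers that sample the flag before it is set exit without arriving at the barrier, so the remaining workers can never be released. A timeout is used here so the demo terminates instead of hanging.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OverflowBarrierSketch {

    /**
     * Starts {@code workers} threads sharing one barrier sized for all of them.
     * The first {@code blind} workers sample the (hypothetical) overflow flag
     * before it is set and exit; the others sample it afterwards and wait in
     * the barrier. Returns how many workers gave up waiting, which stands in
     * for "would have hung forever" in this model.
     */
    public static int run(int workers, int blind, long timeoutMs) throws InterruptedException {
        AtomicBoolean overflow = new AtomicBoolean(false);
        CyclicBarrier barrier = new CyclicBarrier(workers);
        CountDownLatch blindSampled = new CountDownLatch(blind);
        CountDownLatch flagSet = new CountDownLatch(1);
        AtomicInteger stuck = new AtomicInteger();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final boolean samplesEarly = i < blind;
            threads[i] = new Thread(() -> {
                try {
                    if (!samplesEarly) {
                        flagSet.await(); // these workers look only after the flag is set
                    }
                    boolean sawOverflow = overflow.get();
                    if (samplesEarly) {
                        blindSampled.countDown();
                    }
                    if (!sawOverflow) {
                        return; // leaves without ever arriving at the barrier
                    }
                    barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException | BrokenBarrierException e) {
                    stuck.incrementAndGet(); // the barrier never filled up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        blindSampled.await();   // every early worker has read the stale flag
        overflow.set(true);     // the overflow happens only now
        flagSet.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return stuck.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stuck with one early exit: " + run(4, 1, 200));
        System.out.println("stuck with no early exit:  " + run(4, 0, 200));
    }
}
```

With one early exit, the three workers that saw the overflow all give up in the barrier; when every worker sees the flag, the barrier fills and nobody is left waiting. Without the timeout, the first case would block indefinitely, which matches the hang described in this report.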

Relates :	JDK-8037112 - gc/g1/TestHumongousAllocInitialMark.java caused SIGSEGV
Relates :	JDK-8040807 - G1: Enable G1CollectedHeap::stop()
Relates :	JDK-8040803 - G1: Concurrent mark hangs when mark stack overflows
Relates :	JDK-8040247 - vm/gc/containers/LinkedBlockingDeque_Arrays fails when shutting down
Relates :	JDK-8044795 - G1: Enable G1CollectedHeap::stop()
Relates :	JDK-8178542 - G1: VM hangs during shutdown due to mark stack overflow
Relates :	JDK-8040804 - G1: Concurrent mark stuck in loop calling os::elapsedVTime()