JDK-8178542 : G1: VM hangs during shutdown due to mark stack overflow
Type:Bug
Component:hotspot
Sub-Component:gc
Affected Version:9,10
Priority:P3
Status:Resolved
Resolution:Fixed
Submitted:2017-04-12
Updated:2018-06-21
Resolved:2017-05-01
Some GC marking threads are waiting at the first sync barrier, and the other marking threads can never reach it because the _has_overflown flag was cleared.
Comments
The problem is a race condition at g1ConcurrentMark.cpp:2961, near the end of G1CMTask::do_marking_step(), when it is called from G1CMConcurrentMarkingTask::work():
2961 if (_cm->has_overflown()) {
...
Assume that one thread passes this check with _cm->has_overflown() == false,
and that an overflow happens in another thread before the first thread returns from G1CMTask::do_marking_step() to G1CMConcurrentMarkingTask::work().
Another thread may now enter first_sync_barrier.
The first thread then returns to G1CMConcurrentMarkingTask::work():
927 the_task->do_marking_step(mark_step_duration_ms,
928 true /* do_termination */,
929 false /* is_serial*/);
930
931 double end_vtime_sec = os::elapsedVTime();
932 double elapsed_vtime_sec = end_vtime_sec - start_vtime_sec;
933 _cm->clear_has_overflown();
and at line 933 it clears the _has_overflown flag.
As a result, this thread cannot reach first_sync_barrier.
The marking threads will loop until a Full GC aborts first_sync_barrier.
If the VM exits without a Full GC, it hangs during shutdown.
This problem can be fixed by removing the incorrect line 933; there is no need to clear the _has_overflown flag.
This issue was reproduced with b8065402.java (attached to JDK-8065402).
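The interleaving described in this comment can be replayed in a small standalone model. This is only a sketch and not HotSpot code: ToyConcurrentMark, tasks_at_first_barrier and replay_interleaving are hypothetical names, the two marking tasks are replayed on a single thread so the outcome is deterministic, and the model assumes the first task runs another marking step after returning to work().

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for the shared concurrent-mark state. Hypothetical names,
// not the HotSpot classes.
struct ToyConcurrentMark {
    std::atomic<bool> has_overflown{false};
    int tasks_at_first_barrier = 0;  // tasks that entered the first sync barrier
};

// Replays the interleaving on one thread so the outcome is deterministic.
// Returns how many of the two tasks reach the first sync barrier.
int replay_interleaving(bool clear_flag_after_step) {
    ToyConcurrentMark cm;

    // Task A: overflow check near the end of its marking step; sees false.
    bool task_a_entered = cm.has_overflown.load();
    if (task_a_entered) cm.tasks_at_first_barrier++;

    // Task B: overflows the mark stack, sets the flag, enters the barrier.
    cm.has_overflown.store(true);
    cm.tasks_at_first_barrier++;

    // Task A: back in work(); the line under discussion clears the flag.
    if (clear_flag_after_step) cm.has_overflown.store(false);

    // Task A: next marking step repeats the overflow check (assumed retry).
    if (!task_a_entered && cm.has_overflown.load()) cm.tasks_at_first_barrier++;

    return cm.tasks_at_first_barrier;
}
```

In this model, replay_interleaving(true) leaves only one of the two tasks at the barrier, which corresponds to the hang, while replay_interleaving(false) lets both tasks arrive, which corresponds to the proposed removal of line 933.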