The fix for 7013718 ("G1: small fixes for two assert/guarantee failures") seems to have introduced a deadlock. The sequence is the following:
- GC worker threads, running inside a GC pause, want to allocate a new heap region. The free list is empty and the "free regions coming" flag is set, so they wait in case new regions become available on the secondary free list.
- The concurrent cleanup thread has finished processing the cleanup list and is trying to join the STS (suspendible thread set) so that it can call record_concurrent_mark_cleanup_completed(). The join cannot succeed until the GC pause completes, so the reset_free_regions_coming() call further down is never reached:
_cm->completeCleanup();
_sts.join();
g1_policy->record_concurrent_mark_cleanup_completed();
_sts.leave();
...
// We're done: no more free regions coming.
g1h->reset_free_regions_coming();
We should instead call reset_free_regions_coming(), which notifies the GC workers that no more regions are coming, before trying to join the STS. That way the workers can finish their work, the GC pause can complete, and the cleanup thread is then able to join the STS.
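
To illustrate why the proposed ordering avoids the deadlock, here is a minimal standalone sketch. It is a toy model, not HotSpot code: the Model struct, its fields, gc_worker(), sts_join(), cleanup_thread_fixed() and run_fixed_ordering() are all hypothetical stand-ins, and a single mutex/condition variable replaces the real locks. It assumes only what is described above: workers wait while "free regions coming" is set, and joining the STS blocks while the pause is in progress.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy model of the state involved. All names are hypothetical stand-ins.
struct Model {
    std::mutex m;
    std::condition_variable cv;
    bool pause_in_progress = true;    // a GC pause is already running
    bool free_regions_coming = true;  // set while the cleanup list is processed
    bool policy_recorded = false;     // models record_concurrent_mark_cleanup_completed()

    // GC worker: waits while more free regions may still arrive.
    // Once the flag is cleared it stops waiting and the pause completes.
    void gc_worker() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !free_regions_coming; });
        pause_in_progress = false;
        cv.notify_all();
    }

    // Stand-in for _sts.join(): blocks while a pause is in progress.
    void sts_join() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !pause_in_progress; });
    }

    // Stand-in for g1h->reset_free_regions_coming(): clear the flag and
    // wake any workers waiting on it.
    void reset_free_regions_coming() {
        std::lock_guard<std::mutex> lock(m);
        free_regions_coming = false;
        cv.notify_all();
    }

    // Cleanup thread with the proposed ordering: reset first, then join.
    void cleanup_thread_fixed() {
        reset_free_regions_coming();  // workers can now finish the pause
        sts_join();                   // succeeds once the pause has completed
        std::lock_guard<std::mutex> lock(m);
        policy_recorded = true;
    }
};

// Runs one worker and one cleanup thread with the proposed ordering and
// reports whether both ran to completion in the expected final state.
bool run_fixed_ordering() {
    Model model;
    std::thread worker(&Model::gc_worker, &model);
    std::thread cleanup(&Model::cleanup_thread_fixed, &model);
    worker.join();
    cleanup.join();
    return model.policy_recorded && !model.pause_in_progress &&
           !model.free_regions_coming;
}
```

In this model, swapping the first two calls in cleanup_thread_fixed() reproduces the hang: sts_join() waits for the pause to end, while the worker waits for a flag that is only cleared after sts_join() returns.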