While doing a code review for a G1 change I think I spotted a subtle race in the code. Consider two mutator threads (A and B) doing the following concurrently:
A: attempts an allocation; the allocation fails, so it schedules a GC VM op to free up space.
B: needs to explicitly start a concurrent marking cycle (say, System.gc() with -XX:+ExplicitGCInvokesConcurrent), so it calls collect(Cause cause), which schedules a GC VM op with the should_initiate_conc_mark flag set to true.
Currently, one of the GC VM ops will "win" and do the GC; the other will observe that a GC took place between the time it was scheduled and the time it was executed, and will do nothing. If A's VM op "wins", B's VM op will not do the GC and, as a result, the concurrent marking cycle will not start.
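To make the interleaving above concrete, here is a minimal single-threaded model of it in Java. This is only a sketch, not HotSpot code: the class, the GcOp type, and the totalCollections, gcCountBefore and concMarkStarted names are hypothetical stand-ins, and the "skip if a GC already happened" check is assumed to compare a collection count taken at scheduling time with the count at execution time.

```java
// Hypothetical model of the race; none of these names are real HotSpot APIs.
final class LostConcMarkModel {
    // Stand-ins for heap state: number of GCs done so far, and whether a
    // concurrent marking cycle has been started.
    static int totalCollections = 0;
    static boolean concMarkStarted = false;

    // Stand-in for a GC VM op scheduled by a mutator thread.
    static final class GcOp {
        final boolean shouldInitiateConcMark;
        final int gcCountBefore; // collection count observed at scheduling time

        GcOp(boolean shouldInitiateConcMark) {
            this.shouldInitiateConcMark = shouldInitiateConcMark;
            this.gcCountBefore = totalCollections;
        }

        void execute() {
            // Assumed check: if a GC happened since this op was scheduled,
            // the op does nothing, including NOT starting marking.
            if (totalCollections != gcCountBefore) {
                return;
            }
            totalCollections++;
            if (shouldInitiateConcMark) {
                concMarkStarted = true;
            }
        }
    }

    // Schedules both ops before either runs, then executes them in the given
    // order (the VM thread runs VM ops one at a time). Returns whether the
    // concurrent marking cycle was started.
    static boolean run(boolean aWins) {
        totalCollections = 0;
        concMarkStarted = false;
        GcOp a = new GcOp(false); // thread A: allocation failure
        GcOp b = new GcOp(true);  // thread B: explicit conc mark request
        if (aWins) {
            a.execute();
            b.execute();
        } else {
            b.execute();
            a.execute();
        }
        return concMarkStarted;
    }

    public static void main(String[] args) {
        System.out.println("A wins: conc mark started = " + run(true));  // false
        System.out.println("B wins: conc mark started = " + run(false)); // true
    }
}
```

In this model the request from B is silently dropped whenever A's op executes first, which is the behavior described above. Whether the real VM ops behave the same way still needs to be confirmed against the actual code.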
The mechanisms that use collect(Cause cause) to explicitly start a concurrent marking cycle, and should therefore be affected by this issue, are:
-XX:+ExplicitGCInvokesConcurrent
-XX:+GCLockerInvokesConcurrent
and the recent changes for
6976060: G1: humongous object allocations should initiate marking cycles when necessary
I should point out that, as far as I know, we haven't come across this issue during testing, so we should first reproduce it with a test to prove that the race can indeed happen.