JDK-7118202 : G1: eden size unnecessarily drops to the minimum
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs23
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-12-05
  • Updated: 2013-09-18
  • Resolved: 2012-01-20
  • JDK 7: 7u4 (Fixed)
  • JDK 8: 8 (Fixed)
  • Other: hs23 (Fixed)
We see every now and then G1 decreasing the eden size to the minimum for a while without apparent reason and keeping it there for a few GCs before things get back to normal.

It looks as if the issue is an unsigned integer underflow during this calculation:

    size_t rs_length_diff = _max_rs_lengths - _recorded_rs_lengths;

When _max_rs_lengths is smaller than _recorded_rs_lengths, rs_length_diff (being an unsigned value) gets _very_ large, and the prediction massively overpredicts.

Many thanks to Thomas Schatzl who, once again, tracked this down.

EVALUATION http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d23d2b18183e

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d23d2b18183e

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/d23d2b18183e

SUGGESTED FIX We'll go with the above defensive fix but, now that we know what the race is, we're going to also fix it on a separate CR (7119027).

EVALUATION (with more input from Thomas) Further info on the race. It looks as if it happens between the following two threads:

a) a concurrent refinement thread that samples the young RSet lengths and updates the inc CSet info with update_incremental_cset_info() (which will decrease / increase _inc_cset_recorded_rs_lengths), and

b) a mutator thread that retires a mutator alloc region and adds it to the inc CSet with add_region_to_incremental_cset_lhs() -> add_region_to_incremental_cset_common(), which will increase _inc_cset_recorded_rs_lengths.

The updates to _inc_cset_recorded_rs_lengths are not done atomically or in a mutually exclusive way: thread b) is holding the Heap_lock at that point, but thread a) does not take the Heap_lock while doing this operation.

It should also be noted that several other fields updated by add_region_to_incremental_cset_common() and update_incremental_cset_info() could be corrupted by this race. We discovered (OK, Thomas did!) the corruption of _inc_cset_recorded_rs_lengths because of the side effects of the underflow.

Additional note: attempting to fix the race by having thread a) take the Heap_lock before it calls update_incremental_cset_info() will likely result in a deadlock. Thread a) joins the STS while it's sampling the young RSet lengths (so it has to explicitly yield or leave the STS before a GC can happen). Consider the following scenario: thread a) joins the STS, does some work, and tries to take the Heap_lock. Mutator thread c) is trying to do a GC, takes the Heap_lock (this is done by the VM op), and then waits for all threads in the STS to yield / leave. Deadlock. If thread a) took the Heap_lock before it joined the STS, it would probably work; but it would then hold the Heap_lock for long periods of time, which would induce latencies on any mutator thread that needs the Heap_lock in order to retire the active region / allocate a new region.

SUGGESTED FIX Given that we're planning to revamp the prediction code, the most prudent course of action is to be defensive and catch the case where rs_length_diff underflows (and set it to 0 when this happens).