JDK-7118202 : G1: eden size unnecessarily drops to the minimum
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs23
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-12-05
  • Updated: 2013-09-18
  • Resolved: 2012-01-20
  • JDK 7: 7u4 (Fixed)
  • JDK 8: 8 (Fixed)
  • Other: hs23 (Fixed)
We see every now and then G1 decreasing the eden size to the minimum for a while without apparent reason and keeping it there for a few GCs before things get back to normal.

It looks as if the issue is an unsigned integer underflow during this calculation:

    size_t rs_length_diff = _max_rs_lengths - _recorded_rs_lengths;

When _max_rs_lengths is smaller than _recorded_rs_lengths, rs_length_diff (being an unsigned value) gets _very_ large, and the prediction massively overpredicts.

Many thanks to Thomas Schatzl who, once again, tracked this down.

EVALUATION http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d23d2b18183e

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d23d2b18183e

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/d23d2b18183e

SUGGESTED FIX We'll go with the above defensive fix but, now that we know what the race is, we're going to also fix it on a separate CR (7119027).

EVALUATION (with more input from Thomas) Further info on the race. It looks as if it happens between the following two threads:

a) a concurrent refinement thread that samples the young RSet lengths and updates the inc CSet info with update_incremental_cset_info() (which will decrease / increase _inc_cset_recorded_rs_lengths), and

b) a mutator thread that retires a mutator alloc region and adds it to the inc CSet with add_region_to_incremental_cset_lhs() -> add_region_to_incremental_cset_common(), which will increase _inc_cset_recorded_rs_lengths.

The updates to _inc_cset_recorded_rs_lengths are not done atomically or in a mutually exclusive way: thread b) is holding the Heap_lock at that point, but thread a) does not take the Heap_lock while doing this operation.

It should also be noted that several other fields updated by add_region_to_incremental_cset_common() and update_incremental_cset_info() could be corrupted by this race. We discovered (OK, Thomas did!) the corruption of _inc_cset_recorded_rs_lengths because of the side effects of the underflow.

Additional note: attempting to fix the race by having thread a) take the Heap_lock before it calls update_incremental_cset_info() will likely result in a deadlock. Thread a) joins the STS while it's sampling the young RSet lengths (so it has to explicitly yield or leave the STS before a GC can happen). Consider the following scenario: thread a) joins the STS, does some work, and tries to take the Heap_lock. Mutator thread c) is trying to do a GC, takes the Heap_lock (this is done by the VM op), and then waits for all threads in the STS to yield / leave. Deadlock. If thread a) took the Heap_lock before it joined the STS, it would probably work; but it would then hold the Heap_lock for long periods of time, which would induce latencies on any mutator thread that needs the Heap_lock in order to retire the active region / allocate a new region.

SUGGESTED FIX Given that we're planning to revamp the prediction code, the most prudent course of action is to be defensive and catch the case where rs_length_diff underflows (and set it to 0 when this happens).