Bug ID: JDK-7118202 G1: eden size unnecessarily drops to the minimum

Details
Type: Bug
Submit Date: 2011-12-05
Status: Closed
Updated Date: 2012-03-22
Project Name: JDK
Resolved Date: 2012-01-20
Component: hotspot
OS: generic
Sub-Component: gc
CPU: generic
Priority: P3
Resolution: Fixed
Affected Versions: hs23
Fixed Versions: hs23 (b08)


Description
Every now and then we see G1 decrease the eden size to the minimum for no apparent reason and keep it there for a few GCs before things get back to normal.

It looks as if the issue is an unsigned integer underflow (wraparound) during this calculation:

    size_t rs_length_diff = _max_rs_lengths - _recorded_rs_lengths;

It looks as if _max_rs_lengths is sometimes smaller than _recorded_rs_lengths. When that happens, rs_length_diff (being an unsigned value) wraps around to a _very_ large number and the prediction badly overpredicts.

Many thanks to Thomas Schatzl who, once again, tracked this down.
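To make the wraparound concrete, here is a minimal standalone sketch. The function name naive_rs_length_diff and the sample values are made up for illustration; only the subtraction itself mirrors the line quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the subtraction quoted in the report.
// Both operands are unsigned, so the result wraps modulo 2^N
// whenever recorded_rs_lengths is larger than max_rs_lengths.
inline std::size_t naive_rs_length_diff(std::size_t max_rs_lengths,
                                        std::size_t recorded_rs_lengths) {
  return max_rs_lengths - recorded_rs_lengths;
}

// Illustrative values only; they are not taken from a real GC log.
inline void check_wraparound_example() {
  const std::size_t size_max = std::numeric_limits<std::size_t>::max();
  // Normal case: max >= recorded, the difference is what you expect.
  assert(naive_rs_length_diff(10, 4) == 6);
  // Problem case: 4 - 10 wraps to 2^N - 6, i.e. SIZE_MAX - 5.
  assert(naive_rs_length_diff(4, 10) == size_max - 5);
}
```

Any prediction that takes such a value as the number of extra RSet entries will be far too pessimistic.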

                                    

Comments
SUGGESTED FIX

Given that we're planning to revamp the prediction code, the most prudent course of action is to be defensive and catch the case where rs_length_diff underflows (and set it to 0 when this happens).
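A minimal sketch of such a defensive check, assuming a free-standing helper (clamped_rs_length_diff is a made-up name, and this is not necessarily the shape of the change that was eventually pushed):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating the defensive approach:
// only subtract when the result cannot wrap, otherwise use 0.
inline std::size_t clamped_rs_length_diff(std::size_t max_rs_lengths,
                                          std::size_t recorded_rs_lengths) {
  if (max_rs_lengths > recorded_rs_lengths) {
    return max_rs_lengths - recorded_rs_lengths;
  }
  // recorded >= max: the inputs are inconsistent (or equal), so treat
  // the difference as zero instead of letting it wrap around.
  return 0;
}

// Illustrative values only.
inline void check_clamp_example() {
  assert(clamped_rs_length_diff(10, 4) == 6);
  assert(clamped_rs_length_diff(4, 10) == 0);
}
```

This hides the symptom rather than removing the cause: the inputs can still be inconsistent, they just no longer produce a huge difference.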
                                     
2011-12-05
EVALUATION

Further info on the race (with more input from Thomas). It looks as if it happens between the following two threads:

a) a concurrent refinement thread that is sampling the young RSet lengths and updating the inc CSet info with update_incremental_cset_info() (which will decrease / increase _inc_cset_recorded_rs_lengths)

b) a mutator thread that is retiring a mutator alloc region and adding it to the inc CSet with add_region_to_incremental_cset_lhs() -> add_region_to_incremental_cset_common(), which will increase _inc_cset_recorded_rs_lengths.

The updates to _inc_cset_recorded_rs_lengths are not done atomically or in a mutually exclusive way. Thread b) is holding the Heap_lock at that point but thread a) does not take the Heap_lock while doing this operation.
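To illustrate why unsynchronized updates to a shared counter are a problem, the sketch below replays one possible interleaving of two read-modify-write sequences in a single thread, so the outcome is deterministic. All names and values are hypothetical, and this is not necessarily the exact interleaving that occurs in G1; it only shows how an update can be lost.

```cpp
#include <cassert>
#include <cstddef>

struct LostUpdateResult {
  std::size_t expected;  // value if both updates had been applied
  std::size_t actual;    // value after the replayed interleaving
};

// Replays: a) reads, b) reads and writes, a) writes using its stale read.
// "a" models a sampler replacing old_len with new_len for some region;
// "b" models a thread adding added_len for a newly retired region.
inline LostUpdateResult replay_lost_update(std::size_t initial,
                                           std::size_t old_len,
                                           std::size_t new_len,
                                           std::size_t added_len) {
  assert(initial >= old_len);  // keep the illustration free of wraparound
  std::size_t shared = initial;

  const std::size_t a_local = shared;    // a) loads the counter
  const std::size_t b_local = shared;    // b) loads the counter
  shared = b_local + added_len;          // b) stores its update
  shared = a_local - old_len + new_len;  // a) stores, overwriting b)'s update

  const std::size_t expected = initial - old_len + new_len + added_len;
  return {expected, shared};
}
```

With initial = 100, old_len = 20, new_len = 25 and added_len = 30, the counter ends up at 105 rather than 135: the addition made by b) has disappeared.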

It should also be noted that several other fields that are updated by add_region_to_incremental_cset_common() and update_incremental_cset_info() could also be corrupted because of this race. We discovered (OK, Thomas did!) the corruption on _inc_cset_recorded_rs_lengths because of the side-effects of the underflow.

Additional note:

Attempting to fix the race by ensuring that thread a) takes the Heap_lock before it calls update_incremental_cset_info() will likely result in a deadlock. Thread a) joins the STS while it's sampling the young RSet lengths (so it has to explicitly yield or leave the STS before a GC can happen). Consider the following scenario:

Thread a) joins the STS, does some work, and tries to take the Heap_lock.
Mutator thread c) is trying to do a GC, takes the Heap_lock (it's done by the VM op) and then waits for all threads in the STS to yield / leave.
Deadlock.
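The scenario above is a cycle in the "waits for" relation: a) waits for the Heap_lock held by c), while c) waits for a) to yield or leave the STS. As a rough model only (the thread indices and the has_wait_cycle helper are invented for this sketch and have nothing to do with HotSpot code), the cycle can be checked like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// waits_for[i] is the index of the thread that thread i is blocked on,
// or -1 if thread i is not blocked. Entries must be in [-1, size).
// Returns true if some thread transitively waits for itself.
inline bool has_wait_cycle(const std::vector<int>& waits_for) {
  const std::size_t n = waits_for.size();
  for (std::size_t start = 0; start < n; ++start) {
    int current = static_cast<int>(start);
    // Follow at most n edges; coming back to start means a cycle.
    for (std::size_t steps = 0; steps < n && current != -1; ++steps) {
      current = waits_for[static_cast<std::size_t>(current)];
      if (current == static_cast<int>(start)) {
        return true;
      }
    }
  }
  return false;
}
```

With thread 0 standing for a) and thread 1 for c), the scenario is waits_for = {1, 0}, which contains a cycle. If a) never tried to take the Heap_lock, the relation would be {-1, 0} and c) would simply wait until a) yields.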

If thread a) took the Heap_lock before it joined the STS, it'd probably work. However, it would then hold the Heap_lock for long periods of time, which would induce latencies on any mutator thread that needs the Heap_lock in order to retire the active region / allocate a new region.
                                     
2011-12-06
SUGGESTED FIX

We'll go with the above defensive fix but, now that we know what the race is, we're also going to fix the race itself under a separate CR (7119027).
                                     
2011-12-07
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/d23d2b18183e
                                     
2011-12-08
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d23d2b18183e
                                     
2011-12-15
EVALUATION

http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d23d2b18183e
                                     
2012-03-22


