JDK-7181612 : G1: Premature Evacuation (7143858) strikes again
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7u4
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2012-07-03
  • Updated: 2013-09-18
  • Resolved: 2013-03-21
Related Reports
Relates :  
Description
In the GC logs for the ATG CRM demo we were seeing a fair number of "premature" evacuations (i.e. evacuation pauses that seemed to occur back-to-back with another evacuation pause with little or no application in-between and an eden occupancy that was much much less than capacity).

In a typical GC log around 30 such pauses were being seen.

Printing the GC cause indicated that all of these premature evacuations were being instigated by the GC locker code.

This issue was reported as 7143858. And turned out to be a race between an allocating thread requesting a GC and the GC Locker requesting GC. 

Even after the 7143858 was fixed, however, we are still seeing GC logs from ATG that include some premature evacuations (3 vs. 30) and they are still GCs being instigated by the GC Locker.

Comments
After analysing PSR's log files, it's not worth fixing this. 7143858 solved the problem for the allocation path. The remaining instances come from calling G1CollectedHeap::collect() to initiate marking prior to a humongous allocation. We can revisit this if the frequency of this type of GC locker initiated GCs with very small edens increases.
21-03-2013

Since, even after adding a 0.5 second sleep between the GC locker becoming inactive and scheduling the GC locker initiated GC, only resulted in two instances of this problem: cairnapple{jcuthber}:322> grep "BANG" results_1-gc_0.log 1258.507: BANG!!!! GC_locker::needs_gc() is true but this is not a GC locker pause - look for GCLocker GC after this 1262.275: BANG!!!! GC_locker::needs_gc() is true but this is not a GC locker pause - look for GCLocker GC after this cairnapple{jcuthber}:323> Fixing this would eliminate only 2 GCs out of: cairnapple{jcuthber}:323> cat -n results_1-gc_0.log | grep "Eden" | wc -l 5288 cairnapple{jcuthber}:324>
21-03-2013

The other type of premature evacuation is the one similar to those in 7143858. Namely an application thread succeeds in scheduling a GC between another thread, leaving a JNI critical region, clearing the GCLocker::is_active() flag and scheduling the GCLocker induced GC, and then clearing the _needs_gc flag: 1258.689: [GC pause (GCLocker Initiated GC) (young)1258.761: [SoftReference, 0 refs, 0.0000070 secs]1258.761: [WeakReference, 790 refs, 0.0002190 secs]1258.762: [FinalReference, 7 refs, 0.0000060 secs]1258.762: [PhantomReference, 0 refs, 0.0000020 secs]1258.762: [JNI Weak Reference, 0.0000030 secs], 0.0755730 secs] [Parallel Time: 71.4 ms, GC Workers: 18] [GC Worker Start (ms): Min: 1258689.0, Avg: 1258689.4, Max: 1258689.7, Diff: 0.8] [Ext Root Scanning (ms): Min: 0.0, Avg: 6.5, Max: 69.9, Diff: 69.8, Sum: 117.4] [SATB Filtering (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1] [Update RS (ms): Min: 0.0, Avg: 2.8, Max: 3.9, Diff: 3.9, Sum: 49.7] [Processed Buffers: Min: 0, Avg: 3.8, Max: 10, Diff: 10, Sum: 69] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 0.1, Avg: 5.7, Max: 6.7, Diff: 6.6, Sum: 103.3] [Termination (ms): Min: 0.0, Avg: 54.5, Max: 59.3, Diff: 59.3, Sum: 980.3] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.6] [GC Worker Total (ms): Min: 69.2, Avg: 69.5, Max: 70.0, Diff: 0.8, Sum: 1251.5] [GC Worker End (ms): Min: 1258758.9, Avg: 1258758.9, Max: 1258759.0, Diff: 0.1] [Code Root Fixup: 0.1 ms] [Clear CT: 2.2 ms] [Other: 1.9 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.4 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 1024.0K(373.0M)->0.0B(380.0M) Survivors: 27.0M->20.0M Heap: 1412.7M(2048.0M)->1405.1M(2048.0M)] [Times: user=1.21 sys=0.01, real=0.10 secs] The events leading up to the GCLocker initiated GC are: A thread successfully requests a GC before the GC locker becomes active. Before the GC can be performed, however, the GC locker becomes active. The GC, before it starts, sees this: 1257.503: Setting _needs_gc. Thread "VM Thread" 1 locked. 1258.005: GC_locker active: not performing GC [Eden: 186.0M(362.0M)] and skips the GC after setting the _needs_gc flag. 1258.005: Thread "[ACTIVE] ExecuteThread: '17' for queue: 'weblogic.kernel.Default (self-tuning)'" is performing GC after exiting critical section, 0 locked The thread that was in the JNI critical section leaves, and the GC locker is now inactive. Another requested GC successfully starts (an initial mark for a humongous allocation as opposed to a GC from a failed attempt_allocation_humoungous call): 1258.507: BANG!!!! GC_locker::needs_gc() is true but this is not a GC locker pause - look for GCLocker GC after this 1258.525: [GC pause (G1 Humongous Allocation) (young) (initial-mark)1258.613: [SoftReference, 0 refs, 0.0000070 secs]1258.613: [WeakReference, 831 refs, 0.0002330 secs]1258.613: [FinalReference, 5688 refs, 0.0257140 secs]1258.639: [PhantomReference, 0 refs, 0.0000040 secs]1258.639: [JNI Weak Reference, 0.0000010 secs], 0.1165590 secs] [Parallel Time: 85.7 ms, GC Workers: 18] [GC Worker Start (ms): Min: 1258525.4, Avg: 1258525.9, Max: 1258526.2, Diff: 0.8] [Ext Root Scanning (ms): Min: 0.0, Avg: 7.4, Max: 72.4, Diff: 72.3, Sum: 133.3] [Update RS (ms): Min: 0.0, Avg: 2.8, Max: 4.3, Diff: 4.3, Sum: 51.2] [Processed Buffers: Min: 0, Avg: 4.9, Max: 11, Diff: 11, Sum: 88] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.1] [Object Copy (ms): Min: 0.1, Avg: 8.4, Max: 59.2, Diff: 59.2, Sum: 150.8] [Termination (ms): Min: 0.0, Avg: 65.6, Max: 74.6, Diff: 74.6, Sum: 1180.8] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.7, Diff: 0.7, Sum: 4.0] [GC Worker Total (ms): Min: 83.9, Avg: 84.5, Max: 85.2, Diff: 1.3, Sum: 1521.2] [GC Worker End (ms): Min: 1258610.2, Avg: 1258610.4, Max: 1258610.8, Diff: 0.7] [Code Root Fixup: 0.2 ms] [Clear CT: 1.9 ms] [Other: 28.8 ms] [Choose CSet: 0.0 ms] [Ref Proc: 27.6 ms] [Ref Enq: 0.2 ms] [Free CSet: 0.5 ms] [Eden: 201.0M(362.0M)->0.0B(373.0M) Survivors: 38.0M->27.0M Heap: 1622.8M(2048.0M)->1411.9M(2048.0M)] [Times: user=1.56 sys=0.01, real=0.14 secs] It sees that the GC locker is no longer active - but the _needs_gc is still set - and starts performs the GC. A solution is to employ a similar fix to that added for 7143858 to attempt_allocation_humongous: We could either make the scheduled initial mark pause fail (or delay the scheduling) and stall until the GC locker induced GC has been performed; then we should recheck whether the humongous allocation still needs to trigger the initial mark and retry to do so if necessary.
21-03-2013

Looking at the logs from PSR, there seems to be multiple reasons for an evacuation to look premature. The first (and most common) is: 1231.542: [GC pause (G1 Humongous Allocation) (young) (initial-mark)1231.636: [SoftReference, 0 refs, 0.0000070 secs]1231.636: [WeakReference, 974 refs, 0.0002520 secs]1231.636: [FinalReference, 5 refs, 0.0000050 secs]1231.636: [PhantomReference, 0 refs, 0.0000020 secs]1231.636: [JNI Weak Reference, 0.0000020 secs], 0.0968990 secs] [Parallel Time: 91.9 ms, GC Workers: 18] [GC Worker Start (ms): Min: 1231542.1, Avg: 1231542.8, Max: 1231543.8, Diff: 1.6] [Ext Root Scanning (ms): Min: 0.0, Avg: 7.9, Max: 78.8, Diff: 78.8, Sum: 142.1] [Update RS (ms): Min: 0.0, Avg: 2.8, Max: 4.1, Diff: 4.1, Sum: 50.7] [Processed Buffers: Min: 0, Avg: 3.8, Max: 9, Diff: 9, Sum: 68] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 0.0, Avg: 12.5, Max: 61.8, Diff: 61.8, Sum: 224.7] [Termination (ms): Min: 0.0, Avg: 66.5, Max: 75.4, Diff: 75.4, Sum: 1196.2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2] [GC Worker Total (ms): Min: 88.7, Avg: 89.7, Max: 90.4, Diff: 1.6, Sum: 1614.1] [GC Worker End (ms): Min: 1231632.5, Avg: 1231632.5, Max: 1231632.5, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Clear CT: 1.8 ms] [Other: 3.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.2 ms] [Eden: 5120.0K(367.0M)->0.0B(375.0M) Survivors: 33.0M->25.0M Heap: 1395.5M(2048.0M)->1391.6M(2048.0M)] [Times: user=1.64 sys=0.01, real=0.11 secs] i.e. we are attempting to allocate a humongous object and the allocation will send the heap occupancy over the IHOP value. Therefore we immediately trigger an initial mark GC. The heap transition line of such a pause should be read as: [Eden: 5120.0K(367.0M)->0.0B(375.0M) Survivors: 33.0M->25.0M Heap: 1395.5M(2048.0M)->1391.6M(2048.0M)] The target eden size calculated at the end of the previous GC was 367mb. Of that, currently 5Mb was allocated before the humongous allocation attempt. The next most common premature looking evacuation is: 2225.527: [GC pause (G1 Evacuation Pause) (young)2225.865: [SoftReference, 0 refs, 0.0000060 secs]2225.865: [WeakReference, 0 refs, 0.0000010 secs]2225.865: [FinalReference, 0 refs, 0.0000010 secs]2225.865: [PhantomReference, 0 refs, 0.0000010 secs]2225.865: [JNI Weak Reference, 0.0000020 secs] (to-space exhausted), 0.3882200 secs] [Parallel Time: 336.7 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2225527.6, Avg: 2225527.9, Max: 2225528.3, Diff: 0.7] [Ext Root Scanning (ms): Min: 0.0, Avg: 6.7, Max: 59.6, Diff: 59.6, Sum: 121.3] [Update RS (ms): Min: 0.0, Avg: 5.3, Max: 8.4, Diff: 8.4, Sum: 95.3] [Processed Buffers: Min: 0, Avg: 7.7, Max: 17, Diff: 17, Sum: 138] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.8] [Object Copy (ms): Min: 248.0, Avg: 297.0, Max: 305.6, Diff: 57.6, Sum: 5346.1] [Termination (ms): Min: 0.0, Avg: 25.9, Max: 28.6, Diff: 28.6, Sum: 466.1] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.2] [GC Worker Total (ms): Min: 334.6, Avg: 335.0, Max: 336.0, Diff: 1.4, Sum: 6030.8] [GC Worker End (ms): Min: 2225862.9, Avg: 2225863.0, Max: 2225864.1, Diff: 1.2] [Code Root Fixup: 0.1 ms] [Clear CT: 1.0 ms] [Other: 50.4 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.9 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.2 ms] [Eden: 195.0M(372.0M)->0.0B(400.0M) Survivors: 28.0M->0.0B Heap: 2006.4M(2048.0M)->1999.4M(2048.0M)] [Times: user=1.48 sys=0.02, real=0.41 secs] ************************************************ 2225.986: [GC pause (G1 Evacuation Pause) (mixed)2226.147: [SoftReference, 0 refs, 0.0000060 secs]2226.147: [WeakReference, 0 refs, 0.0000020 secs]2226.147: [FinalReference, 0 refs, 0.0000010 secs]2226.147: [PhantomReference, 0 refs, 0.0000020 secs]2226.147: [JNI Weak Reference, 0.0000030 secs] (to-space exhausted), 0.1817370 secs] [Parallel Time: 159.5 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2225986.7, Avg: 2225987.1, Max: 2225987.4, Diff: 0.7] [Ext Root Scanning (ms): Min: 0.0, Avg: 7.2, Max: 72.8, Diff: 72.8, Sum: 130.3] [Update RS (ms): Min: 0.0, Avg: 3.2, Max: 4.6, Diff: 4.6, Sum: 58.3] [Processed Buffers: Min: 0, Avg: 30.8, Max: 49, Diff: 49, Sum: 555] [Scan RS (ms): Min: 0.0, Avg: 2.3, Max: 2.9, Diff: 2.9, Sum: 42.2] [Object Copy (ms): Min: 43.8, Avg: 105.9, Max: 129.4, Diff: 85.6, Sum: 1906.3] [Termination (ms): Min: 0.0, Avg: 39.5, Max: 42.0, Diff: 42.0, Sum: 710.5] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2] [GC Worker Total (ms): Min: 157.9, Avg: 158.2, Max: 158.6, Diff: 0.7, Sum: 2848.0] [GC Worker End (ms): Min: 2226145.3, Avg: 2226145.3, Max: 2226145.3, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Clear CT: 1.2 ms] [Other: 20.9 ms] [Choose CSet: 0.4 ms] [Ref Proc: 1.1 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.5 ms] [Eden: 7168.0K(400.0M)->0.0B(400.0M) Survivors: 0.0B->0.0B Heap: 2006.4M(2048.0M)->2006.4M(2048.0M)] [Times: user=1.20 sys=0.02, real=0.21 secs] ************************************************ 2226.224: [GC pause (G1 Evacuation Pause) (mixed)2226.440: [SoftReference, 0 refs, 0.0000060 secs]2226.441: [WeakReference, 0 refs, 0.0000020 secs]2226.441: [FinalReference, 0 refs, 0.0000010 secs]2226.441: [PhantomReference, 0 refs, 0.0000010 secs]2226.441: [JNI Weak Reference, 0.0000030 secs] (to-space exhausted), 0.2390040 secs] [Parallel Time: 215.3 ms, GC Workers: 18] [GC Worker Start (ms): Min: 2226224.4, Avg: 2226224.8, Max: 2226225.6, Diff: 1.2] [Ext Root Scanning (ms): Min: 0.0, Avg: 9.0, Max: 81.6, Diff: 81.6, Sum: 161.9] [Update RS (ms): Min: 0.0, Avg: 3.6, Max: 4.9, Diff: 4.9, Sum: 64.8] [Processed Buffers: Min: 0, Avg: 12.5, Max: 24, Diff: 24, Sum: 225] [Scan RS (ms): Min: 0.0, Avg: 2.3, Max: 3.0, Diff: 2.9, Sum: 42.1] [Object Copy (ms): Min: 65.3, Avg: 135.9, Max: 169.1, Diff: 103.8, Sum: 2445.7] [Termination (ms): Min: 0.0, Avg: 63.4, Max: 67.4, Diff: 67.4, Sum: 1141.8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 213.4, Avg: 214.3, Max: 214.6, Diff: 1.2, Sum: 3856.5] [GC Worker End (ms): Min: 2226439.0, Avg: 2226439.0, Max: 2226439.0, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Clear CT: 1.3 ms] [Other: 22.3 ms] [Choose CSet: 0.5 ms] [Ref Proc: 0.8 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.7 ms] [Eden: 0.0B(400.0M)->0.0B(400.0M) Survivors: 0.0B->0.0B Heap: 2006.4M(2048.0M)->2006.4M(2048.0M)] [Times: user=1.70 sys=0.04, real=0.26 secs] i.e. we get an evacuation failure. Basically at the end one GC we have no regions available to use for eden. Actually in the first GC we have 197 regions and we allocate them before triggering the GC. In the second GC we allocate 7 regions for eden and in the last none.
20-03-2013