JDK-6974966 : G1: unnecessary direct-to-old allocations
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs13, hs17, hs19
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2010-08-05
  • Updated: 2013-09-18
  • Resolved: 2011-04-23
Fix Versions
  • JDK 6: 6u25 (Fixed)
  • JDK 7: 7 (Fixed)
  • Other: hs20 (Fixed)
Description
While testing a different set of G1 changes I noticed that we seemed to attempt to allocate regions directly into the old gen more often than I was expecting (I was expecting that to happen only when the GC locker is active).

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/build/hotspot/rev/631f79e71e90
25-12-2010

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/631f79e71e90
09-12-2010

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/631f79e71e90
30-11-2010

SUGGESTED FIX The fix ended up being more involved than I was expecting. It includes the following changes and improvements:

- The assumption so far has been that direct-to-old allocations would only happen when the GC locker is active. However, having a current alloc region that can be either young or not, depending on whether the GC locker was active when the region was allocated, made the slow-path allocation code very complicated and added a fair number of "is young" checks. So, we took the decision to completely outlaw direct-to-old allocations in G1. The changes done in the context of this CR will force the mutators to stall while the GC locker is active. This will be improved in the near future: instead, we'll extend the Eden while the GC locker is active, which is very straightforward to do in G1 (see 6994056). So, from now on, it can be assumed that all regions allocated by mem_allocate() and allocate_new_tlab() will be young (the only exception being humongous regions, which have their own special allocation path anyway).

- Before, when a thread allocated a new young region it also dirtied all of its cards so that the write barrier never took the slow path for updates on that young region. I changed this so that when a thread allocates a block [start, end) on a young region it dirties only the cards that span that range (see the sketch after this list). This way, the "pain" is spread out a bit more evenly (instead of one thread doing everything) and it made the code more straightforward (I don't have to pass a region as a parameter through several method calls and a VM operation). Additionally, given that each thread dirties each block that it allocates, it will never actually allocate any objects in a part of a young region whose cards have not been dirtied already (this was not guaranteed before; another thread could allocate a new block in an area of the region that the dirtying thread had not reached yet).

- I removed the attempt to allocate a humongous object from the attempt_allocation_slow() method; this is now done separately in a new attempt_allocation_humongous() method. It is called from mem_allocate() if the allocation request size is humongous (TLABs are never humongous, so allocate_new_tlab() never has to call attempt_allocation_humongous()).

- If attempt_allocation_slow() initiates a collection which successfully completes, the VM operation that did the collection will also try to satisfy the allocation request that attempt_allocation_slow() was trying to satisfy, so that the thread that initiated the collection gets "first pick" of the space reclaimed by it. This was one of the main reasons behind all these changes (see suggestion (b) in Suggested Fix Note #1).

- If attempt_allocation_humongous() does not manage to allocate a humongous object, it tries to do a collection before trying again (at the end of the VM operation that did the collection), in case the collection reclaims enough contiguous space for the allocation request to succeed.

- The G1 policy is more careful to set the young list target length to be at least the survivor number plus one (see suggestion (a) in Suggested Fix Note #1).

- The changes include a fair amount of code tidying up, restructuring, and redundant code removal. Even though the resulting code is somewhat lengthier than what was there before (a lot of the additions are comments and extra asserts), it is much more readable and easier to follow. Before, most of the work was done in attempt_allocation_slow(), and having to handle all the different special cases made the loop in that method much more complex than it should have been.
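
As a rough illustration of the per-block card dirtying described in the second bullet above, here is a minimal, self-contained C++ sketch. It assumes a byte-per-card table and a 512-byte card size (HotSpot's default); the names (kCardShift, card_table, dirty_cards_for_block) are hypothetical stand-ins, not the actual G1 code.

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Toy card table: one byte per card, one card per 512 heap bytes
// (card shift of 9, mirroring HotSpot's default card size).
static const size_t kCardShift = 9;
static const size_t kHeapBytes = 1 << 20;            // 1 MB toy "heap"
static uint8_t card_table[kHeapBytes >> kCardShift]; // 0 = clean
static const uint8_t kDirty = 1;

// Dirty only the cards spanning the newly allocated block
// [start, end), instead of pre-dirtying every card of the young
// region up front when the region is handed out.
static void dirty_cards_for_block(size_t start, size_t end) {
  size_t first = start >> kCardShift;
  size_t last  = (end - 1) >> kCardShift;  // end is exclusive
  for (size_t c = first; c <= last; c++) {
    card_table[c] = kDirty;
  }
}

int main() {
  // Each allocating thread dirties exactly the cards its own block
  // covers, so no thread ever allocates on a not-yet-dirtied card.
  dirty_cards_for_block(0, 800);     // spans cards 0 and 1
  dirty_cards_for_block(800, 1024);  // stays within card 1
  printf("card 0: %d, card 1: %d, card 2: %d\n",
         card_table[0], card_table[1], card_table[2]);
  return 0;
}

The dirtying cost now tracks each thread's own allocation volume: a thread that allocates a 4 KB block touches only the eight or nine cards that block spans, rather than one thread pre-dirtying an entire region.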
03-09-2010

EVALUATION There are a couple of reasons that cause this issue. Let's look at the attempt_allocation_slow() method (which I've trimmed for readability):

HeapWord* G1CollectedHeap::attempt_allocation_slow(size_t word_size,
                                                   bool permit_collection_pause) {
  HeapWord* res = NULL;
  HeapRegion* allocated_young_region = NULL;
  ...
  if (isHumongous(word_size)) {
    ...
  } else {
    ...
    if (permit_collection_pause) {
A>    do_collection_pause_if_appropriate(word_size);
    }

    // Make sure we have an allocation region available.
    if (_cur_alloc_region == NULL) {
B>    bool next_is_young = should_set_young_locked();
      // If the next region is not young, make sure it's zero-filled.
      _cur_alloc_region = newAllocRegion(word_size, !next_is_young);
      ...
    }

So, it's possible that a call to the attempt_allocation_slow() method will initiate a GC at the line tagged A>, but find that should_set_young_locked() returns false at the line tagged B>. If that happens, the region will not be tagged as young and, hence, will be allocated directly into the old generation. There are two ways for this to happen:

(a) The G1 policy gets it wrong and the target young list length is the same as the number of survivor regions already in the young list. So, even though we just did a collection, we cannot allocate any more young regions given that the young list is already full.

(b) The thread that executes the attempt_allocation_slow() method, let's call it T0, has to release the heap lock before it executes the VM operation to do the collection. So, even if the young list is not full after the collection (i.e., the problem described in (a) doesn't happen), it's possible for other threads to perform allocations that fill up the young list before T0 gets a chance to grab the heap lock again. This will again result in the young list looking full at the check tagged B>.
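
To make the race in (b) concrete, here is a toy, single-threaded C++ model of the slow path; ToyHeap and all of its members are invented names that only mimic the shape of the real code, under the assumption that a collection empties the young list and that the heap lock must be dropped across the pause.

#include <cstdio>
#include <mutex>

// Toy model of the failure modes above; not the real G1 API.
struct ToyHeap {
  std::mutex heap_lock;
  int young_len    = 0;  // regions currently in the young list
  int young_target = 1;  // policy's target young list length

  bool young_list_full() const { return young_len >= young_target; }

  // Stands in for the GC VM operation; the caller must not hold the
  // heap lock while it runs.
  void do_collection_pause() { young_len = 0; }

  // Simplified analogue of attempt_allocation_slow(): returns true
  // for a young-region allocation, false for the direct-to-old path.
  bool attempt_allocation_slow() {
    std::unique_lock<std::mutex> lock(heap_lock);
    if (young_list_full()) {
      lock.unlock();            // drop the heap lock for the pause...
      do_collection_pause();
      lock.lock();
      // ...and in this window other threads can re-fill the young
      // list, so the check below may fail again: case (b). Case (a)
      // is the degenerate young_target == survivor-count situation,
      // where the list looks full even right after a collection.
    }
    if (!young_list_full()) {
      young_len++;              // allocate a young region
      return true;
    }
    return false;               // the unwanted direct-to-old path
  }
};

int main() {
  ToyHeap heap;
  heap.young_target = 0;  // case (a): target == number of survivors
  printf("young allocation: %s\n",
         heap.attempt_allocation_slow() ? "yes" : "no (direct-to-old)");
  return 0;
}

The demo triggers case (a); case (b) is the same check failing because other threads allocated inside the unlock()/lock() window.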
05-08-2010

SUGGESTED FIX Possible fixes for the two items described in Note #1 in the Evaluation section are:

(a) Fix the G1 policy so that the target young list length is at least the survivor region number plus one (i.e., always allow the allocation of at least one eden region).

(b) This is a bit trickier to fix. What should work is to actually perform the allocation at the very end of the VM operation that does the GC so that, if there's any space left in the heap, the thread that caused the GC gets first pick of it.

A rough sketch of both suggestions follows.
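
This is a minimal C++ sketch, assuming plain integers for the policy inputs and a VM-operation object that can carry the allocation request and its result; young_target() and ToyCollectForAllocOp are hypothetical names, not the actual HotSpot types.

#include <cstddef>
#include <cstdio>

// Suggestion (a): never let the young list target collapse to just
// the survivor count; always leave room for at least one eden region.
static size_t young_target(size_t policy_target, size_t survivors) {
  size_t min_target = survivors + 1;  // at least one eden region
  return policy_target > min_target ? policy_target : min_target;
}

// Suggestion (b): a toy VM operation that retries the triggering
// thread's allocation inside the safepoint, before other mutators can
// run again and steal the space the collection just reclaimed.
struct ToyCollectForAllocOp {
  size_t word_size = 0;
  void*  result    = nullptr;          // read back by the caller

  void doit() {                        // runs at a safepoint
    collect();                         // do the pause
    result = try_allocate(word_size);  // "first pick" of the space
  }

  void  collect() { /* elided */ }
  void* try_allocate(size_t) { static char block[64]; return block; }
};

int main() {
  // With 5 survivors, a policy target of 5 is bumped to 6.
  printf("young target: %zu\n", young_target(5, 5));

  ToyCollectForAllocOp op;
  op.word_size = 16;
  op.doit();
  printf("allocation satisfied inside the VM op: %s\n",
         op.result != nullptr ? "yes" : "no");
  return 0;
}

This is essentially the shape of the fix that was eventually implemented (see the 03-09-2010 SUGGESTED FIX note above): the allocation request rides along with the collection VM operation, so the triggering thread is served before the heap lock is contended again.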
05-08-2010