JDK-8027959 : Early reclamation of large objects in G1
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs25, 8
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-11-07
  • Updated: 2016-04-19
  • Resolved: 2014-07-23
  • JDK 8: Fixed in 8u40
  • JDK 9: Fixed in 9 b28
Description
In G1, large objects are always allocated in the old generation, so a complete heap liveness analysis (a full GC or a concurrent marking cycle) is required to reclaim them.

This is far from ideal for many transaction-based enterprise applications that create large objects which are live only until a (typically short-lived) transaction completes (e.g. the ResultSet of a JDBC query that returns a large result).
This causes the heap to fill up relatively quickly, typically leading to unnecessary marking cycles just to reclaim these objects.

Investigate options to reclaim these objects more quickly; options found so far are:
a) logically keep LOBs in young gen, doing in-place aging
b) keep LOBs in the old gen, but use the remembered set and information from young collection to determine LOB liveness
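As a rough illustration of option b), the liveness of a large object can be approximated conservatively from two pieces of information: whether its region's remembered set is empty, and whether any reference to it was observed during the young collection. The following toy model sketches this idea; all names here are hypothetical and do not correspond to actual HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of option b): a humongous (LOB) region is
// reclaimable after a young collection if nothing provably refers to it.
// Names are illustrative only; they do not match HotSpot code.
struct HumongousRegion {
    std::size_t remembered_set_entries; // recorded old-to-LOB references
    bool referenced_during_evacuation;  // seen from the young gen this GC
};

// Conservative check: reclaim only when both sources of incoming
// references are provably empty.
inline bool is_reclaimable(const HumongousRegion& r) {
    return r.remembered_set_entries == 0 && !r.referenced_during_evacuation;
}

// Walk all LOB regions at the end of a young GC, reclaim the dead ones,
// and reset the per-GC flag for the next collection.
inline std::size_t reclaim_dead_humongous(std::vector<HumongousRegion>& regions) {
    std::size_t reclaimed = 0;
    for (auto& r : regions) {
        if (is_reclaimable(r)) {
            ++reclaimed;
        }
        r.referenced_during_evacuation = false;
    }
    return reclaimed;
}
```

Note that this is deliberately conservative: a stale remembered set entry keeps the object alive until the entry is purged, which trades some delay for never reclaiming a live object.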


Comments
The current implementation uses separate values for humongous regions and for regions in the collection set, so the earlier comments about remembered set update issues are obsolete.
02-02-2015

In reply to the previous comment:

> Putting the humongous objects (even temporarily) into the collection set gives issues with remembered set updates not occurring any more.
>
> Current prototype changes in_cset_fast_test to include humongous objects (…)

There are no issues with remembered set updates when temporarily putting the humongous objects into the _in_cset_fast_test collection set test, because during rset update we use the _in_collection_set member of the HeapRegion, not in_cset_fast_test.
15-07-2014

Actually it is not required to put the humongous objects into the collection set - it is sufficient to make references to them pass the G1CollectedHeap::in_cset_fast_test check in G1ParCopyClosure::do_oop_work. Putting the humongous objects (even temporarily) into the collection set causes issues with remembered set updates not occurring any more. The current prototype changes in_cset_fast_test to include humongous objects (actually not all calls to G1CollectedHeap::in_cset_fast_test() need to be changed, but that is a minor detail). First tests indicate that this is indistinguishable in performance from the original code (more testing needed), while the first prototype increased the "Object Copy" time by ~10%.
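The in_cset_fast_test mentioned above is essentially a per-region lookup table consulted once per scanned reference. A simplified model of extending such a table with a distinct humongous state (illustrative only; HotSpot's actual data structure differs) might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a per-region "in collection set" fast test,
// extended with a separate value for humongous candidates so the
// reference-scanning fast path detects both with a single table load.
// Illustrative only; not HotSpot's actual implementation.
enum class InCSetState : std::uint8_t {
    NotInCSet = 0,
    InCSet    = 1,  // region is being evacuated normally
    Humongous = 2   // humongous candidate: note the reference, do not copy
};

class InCSetFastTest {
    std::vector<InCSetState> states_; // one entry per heap region
public:
    explicit InCSetFastTest(std::size_t num_regions)
        : states_(num_regions, InCSetState::NotInCSet) {}

    void set(std::size_t region, InCSetState s) { states_[region] = s; }

    // Fast path: one load decides whether any processing is needed.
    bool needs_processing(std::size_t region) const {
        return states_[region] != InCSetState::NotInCSet;
    }
    // Slower path then distinguishes copy vs. mark-humongous-as-live.
    bool is_humongous(std::size_t region) const {
        return states_[region] == InCSetState::Humongous;
    }
};
```

The point of folding both cases into one table is that the common fast path (reference not in the collection set) pays no extra cost, which matches the comment's observation that performance was indistinguishable from the original code.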
10-03-2014

That should actually work :-) It needs some considerations due to must-not-be-a-H-region asserts all over the place, but should be much less performance intrusive than checking every reference.
14-02-2014

Even though I said this approach is more fragile, there's actually precedent: The H object allocation mechanism relies on the TLAB allocation to fail so that the allocation is done in the slow path. So, I suppose, this will not be too dissimilar. Also I should have said "avoid adding extra checks in the fast path during the GC".
14-02-2014

Thomas, one more idea: Let's say we add the H regions to the CSet without doing anything else (i.e., not tagging the H objects specially). You then do the collection as normal. When the GC comes across an H object it will try to copy it. If you arrange that no PLAB is larger than the H object size limit, allocating space out of a PLAB to copy the H object will fail and go to the slow path. You then do a check there to see whether the object you're allocating is H or not and, if it is, just return a ref to itself so that it's self-forwarded. At the end of the GC, H objects that have been self-forwarded are live, the rest are dead. This could be a bit more fragile than what we've discussed so far, however it avoids adding extra checks during the GC.
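The PLAB trick described above can be sketched as follows (a toy model under assumed names and sizes, not HotSpot code): if no PLAB is ever as large as the humongous size limit, copying a humongous object always fails PLAB allocation and lands in the slow path, where the object is self-forwarded instead of copied.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the PLAB-based scheme; all names and sizes are assumed.
constexpr std::size_t kPlabSize = 4096;            // bytes per PLAB (assumed)
// Objects of at least kPlabSize can never be copied via a PLAB.

struct Obj {
    std::size_t size = 0;
    const Obj* forwardee = nullptr; // copy destination, or the object itself
};

inline bool plab_alloc_would_succeed(std::size_t size) {
    return size < kPlabSize; // humongous objects always take the slow path
}

// Evacuation: small objects are copied; humongous objects are
// self-forwarded in the slow path to record that they are live.
inline void evacuate(Obj& o) {
    if (plab_alloc_would_succeed(o.size)) {
        static Obj copy_space[1]; // stand-in for a real copy destination
        o.forwardee = copy_space;
    } else {
        o.forwardee = &o;         // self-forward: marks the H object live
    }
}

// End-of-GC check: a humongous object is live iff it self-forwarded.
inline bool humongous_is_live(const Obj& o) { return o.forwardee == &o; }
```

The appeal of this variant is exactly as stated in the comment: the GC fast path needs no new checks, because the existing allocation-failure slow path is reused to intercept humongous objects.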
14-02-2014

The current prototype simply adds a flag per region (actually only needed for H-regions) that indicates whether it has been referenced during the young gen evacuation. The flag is updated during evacuation; at the end we walk all LOBs, and if a LOB has not been marked as referenced by the collection set or its rset, and some other (conservative) conditions hold, we can be sure it is not referenced any more. The prototype then simply reclaims these regions, i.e. it implements option b) in a straightforward manner. This has been considered the better alternative, because keeping object arrays (objArrayOops) in the young gen is very expensive; option b) allows applying this technique to all arrays without any special-casing. Your idea looks good too, and saves the extra information in the table. I also think it is not required to scan these objects. Need to compare both approaches.
06-02-2014

The figure early_large_object_reclamation shows the impact of LOB reclamation in a prototype. It shows how many LOBs (H-regions, y-axis) were reclaimed at the X-th GC after their allocation (tracked only until "age" 63); e.g. the value at x = 1 is the number of LOBs reclaimed at the first GC after allocation.

The red line shows the default behavior on an application (not the worst one) where concurrent marking is basically running all the time, hence the peak of reclamations around 4-5 GCs (the median is 5). This also somewhat represents the best case: when marking is not running concurrently, the peak simply moves to the right.

The blue line shows the number of reclamations with the prototype. It reclaims almost all LOBs (worth reclaiming) within a few GCs; the median is 2. It also confirms what we already know, that in this application (and others) these large objects are very short-lived. This relieves a lot of pressure from the GC.
06-02-2014

(We had discussed this within the group some time ago.) FWIW, this should be a relatively easy fix. It will not be very expensive to add some / all of the humongous regions to the CSet every few GCs. The observation is that there's only one object per humongous region and, when you add an H region to the CSet, you tag the object's forwarding ref field with a special value you can easily identify. During evacuation, you notice that an object has the special tag and you self-forward it (to indicate that it's been found to be live and to avoid subsequent visits to that object taking the special path). At the end of the GC you iterate over the H regions and check whether each H object has been forwarded or not: if it has, it's live; if it hasn't, it's dead and you can reclaim the H region(s) that contain it. In fact, I *think* that you won't even have to scan the H objects you'll find live; you can probably rely on the RSets to at least find H region -> young references. There are probably a number of "no H regions in the CSet" asserts in the code; those will have to be relaxed.
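The tagging scheme above can be sketched as a small state machine on the forwarding word (a toy model with an assumed tag value, not HotSpot code): tag the humongous object before the GC, replace the tag with a self-forwarding pointer on first visit, and at the end of the GC treat anything still tagged as dead.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the forwarding-word tagging scheme; the tag value and
// all names here are assumptions for illustration only.
constexpr std::uintptr_t kHumongousTag = 0x1; // assumed otherwise-unused value

struct HObj {
    std::uintptr_t forward_word = 0;
};

// Before the GC: mark the H object as a reclamation candidate.
inline void tag_for_collection(HObj& o) { o.forward_word = kHumongousTag; }
inline bool is_tagged(const HObj& o)    { return o.forward_word == kHumongousTag; }

// During evacuation, on finding a reference to the object: self-forward
// once, so later visits take the normal "already forwarded" path.
inline void visit_during_gc(HObj& o) {
    if (is_tagged(o)) {
        o.forward_word = reinterpret_cast<std::uintptr_t>(&o);
    }
}

// End of GC: still tagged means no reference was found -> reclaimable.
inline bool is_dead_after_gc(const HObj& o) { return is_tagged(o); }
```

The self-forwarding step is what keeps the cost bounded: only the first visit to a tagged object takes the special path, exactly as the comment describes.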
05-02-2014

Also, in option b) the interaction with SATB marking must be considered.
07-11-2013

Option a) should probably be done in context of pinning arbitrary regions as this feature would allow other uses.
07-11-2013