Bug ID: JDK-8248783 G1: Trigger concurrent start if humongous allocation needs the reserved space to avoid evac failure

Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 16

Priority: P4
Status: Closed
Resolution: Duplicate

Submitted: 2020-07-03
Updated: 2021-09-01
Resolved: 2021-09-01

Other: tbd (Resolved)

> Created on behalf of Liang Mao (maoliang.ml@alibaba-inc.com)

This issue proposes that newly allocated humongous objects share the young space with eden allocation.
Eden allocation and humongous allocation would share the free regions. A GC would be triggered when free_regions == reserve_regions, and an initial mark would be triggered immediately if humongous allocation has already invaded _young_list_target_length.
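
The proposed trigger can be sketched as a small model. This is an illustration only, not HotSpot code: the class, enum and method names are hypothetical, and the boolean input stands in for whatever check decides that humongous allocation has invaded _young_list_target_length. It also assumes that the reserve is the heap region count times G1ReservePercent, rounded up, and that the initial mark replaces the plain young GC once the free-region condition holds. The comparison uses <= (as in the second issue listed further down) rather than ==, so that an allocation jumping past the exact boundary still triggers.

```java
// Illustrative model of the proposed trigger; all names are hypothetical.
public class SharedRegionPolicySketch {
    public enum Action { NONE, YOUNG_GC, INITIAL_MARK }

    // Assumption: reserve = heap regions * G1ReservePercent / 100, rounded up.
    public static int reserveRegions(int heapRegions, int reservePercent) {
        return (int) Math.ceil(heapRegions * reservePercent / 100.0);
    }

    // Decide what to do before handing out another free region.
    public static Action decide(int freeRegions, int reserveRegions,
                                boolean humongousInvadedYoungTarget) {
        if (freeRegions > reserveRegions) {
            return Action.NONE; // still enough free regions outside the reserve
        }
        // Free regions have dropped to the reserve: collect now.
        return humongousInvadedYoungTarget ? Action.INITIAL_MARK : Action.YOUNG_GC;
    }

    public static void main(String[] args) {
        int reserve = reserveRegions(1024, 10); // e.g. 1g heap, 1m regions, 10 percent
        System.out.println("reserve regions: " + reserve);
        System.out.println(decide(200, reserve, false));
        System.out.println(decide(reserve, reserve, false));
        System.out.println(decide(reserve, reserve, true));
    }
}
```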

It could resolve the following two issues:

1) Newly allocated humongous objects can reach IHOP very quickly and lead to very frequent GCs.
If newly allocated humongous objects could use young space, the GC frequency would drop significantly.

2) Humongous allocation can consume regions very quickly. G1ReservePercent serves as an allocation buffer to avoid to-space exhaustion caused by humongous allocation or GC to-space copying, and an initial mark is triggered immediately once humongous occupancy goes beyond IHOP. But during the concurrent marking cycle, or while preparing for mixed GCs, no GC action is taken ahead of time to keep the heap from filling up completely. Triggering a GC when there are not enough free regions (free_regions <= reserve_regions) can easily avoid such disasters.


There could be more initial marks after the change, so JDK-8240556 is necessary to avoid unnecessary concurrent marks.

JDK-8257774 introduces additional "preventive" minor garbage collections similar to what the results of this change would have been. So I am closing this issue as a duplicate of the other. If there are still use cases that this does not cover, please open a new issue. A version of the provided test case which has the mentioned C2 optimization circumvented does not do full collections at least.
01-09-2021
I found out why later jdk versions do not show the issue: C2 will detect that the array allocation is not used and optimize it away.
06-08-2020
Some initial observations about the test and the problem (note that I am not sure you are observing the same in your production application):
- with jdk-11.0.7+10: g1: hundreds of full gcs; with cms: 13 full gcs
- with jdk15-ea-33/jdk16: g1: 2 full gcs at the start, staying that way even if increasing the length of the allocateGarbage() loop 10 times.
- You can tune out the issue even with g1 by giving the application more marking threads to complete the first marking round earlier. It looks like the allocation rate is just too high in the beginning.
- Given that before the full gcs there are many evacuation failures (to-space exhausted), and not just one before the full gc, I wonder whether the actual problem isn't handling of evacuation failure, i.e. dumping all young regions into old, even extremely little occupied ones, without a way to reclaim them quickly (jdk16-baseline.log). It may be more useful to gather remembered sets for these and try to clean them up asap, or have the regions not immediately made old (i.e. keep old as old, but eden/survivor could be kept in young, depending on how much contents it has?).
- on jdk16 the test completes without full gc with JDK-8240556 alone, not even coming close to an evacuation failure. So this change seems to be unnecessary for jdk16 (jdk16-with-JDK-8240556-only.log)
24-07-2020
> Commented on behalf of Liang Mao (maoliang.ml@alibaba-inc.com)

    java -Xmx1g -Xms1g -XX:+UseG1GC -XX:ParallelGCThreads=4 -XX:ConcGCThreads=1 -Xlog:gc* -XX:G1HeapRegionSize=1m HumongousAllocationTest

will easily cause Full GC while using -XX:G1HeapRegionSize=2m would resolve the problem.

    import java.util.TreeMap;
    import java.util.Map;
    import java.util.LinkedList;

    class HumongousAllocationTest {
        private static Map<String, String> longLiveObjectsMap = new TreeMap<>();
        private static LinkedList<Object> oldGarbageList = new LinkedList<Object>();

        private static void allocateLiveObjects() {
            for (int i = 0; i < 2000000; i++) {
                longLiveObjectsMap.put("key" + i, "value" + i);
            }
        }

        private static void allocateGarbage() {
            for (int i = 0; i < 1000000; i++) {
                byte[] array = new byte[512 * 1024];
                oldGarbageList.add(new byte[10 * 1024]);
                for (int j = 0; j < 50; j++) {
                    array = new byte[10 * 1024];
                }
                if (oldGarbageList.size() > 5000) {
                    oldGarbageList.removeFirst();
                }
            }
        }

        public static void main(String[] args) {
            allocateLiveObjects();
            allocateGarbage();
        }
    }
10-07-2020
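
A likely reason why the region size matters for the test case above is G1's humongous threshold: an object counts as humongous when it is larger than half a region. The sketch below checks the two array sizes from the test against both region sizes. It is only an illustration: the class and method names are hypothetical, and the 16-byte array header is an assumption (typical for 64-bit HotSpot with compressed class pointers); any non-zero header gives the same result here.

```java
// Illustrative check of G1's "larger than half a region" humongous rule.
public class HumongousThresholdSketch {
    // Assumption: 16-byte array header; not read from the running JVM.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

    // byte[] uses one byte per element, so object size = header + length.
    public static boolean isHumongous(long byteArrayLength, long regionSizeBytes) {
        long objectBytes = ASSUMED_ARRAY_HEADER_BYTES + byteArrayLength;
        return objectBytes > regionSizeBytes / 2;
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024;
        long[] regionSizes = { oneMiB, 2 * oneMiB };    // G1HeapRegionSize=1m and 2m
        long[] arrayLengths = { 512 * 1024, 10 * 1024 }; // sizes used in the test
        for (long region : regionSizes) {
            for (long length : arrayLengths) {
                System.out.println("region=" + region + " byte[" + length + "] humongous="
                        + isHumongous(length, region));
            }
        }
    }
}
```

Under these assumptions the byte[512 * 1024] allocation is humongous with 1m regions but an ordinary eden allocation with 2m regions, while the byte[10 * 1024] arrays are never humongous.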
RFR: https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-July/030295.html
07-07-2020
Hi [~ddong], Liang!

I am a bit confused by the given description as it does not seem to make clear what the problem is and how it should be solved (or solved differently than e.g. JDK-8240556). So I will keep this CR untriaged for now. I will try to comment on the description sentence by sentence.

"New allocated humongous objects could reach IHOP very quickly and lead to very frequent GCs."

Okay, this is a well known fact.

"In the current implementation, the humongous allocation could invade the young space which is determined by last young pause."

The first part states that humongous allocation could use up the reserve area (by G1ReservePercent?). This is true, and intentional, that part of the heap is reserved exactly for this purpose. Not sure what the second part of the sentence means, because the reserve is determined statically by G1ReservePercent.

"The invasion into young space and reserve space by humongous objects allocation will cause long time pause because of to-space exhausted."

As mentioned, this is completely normal that humongous objects use the reserve, and that statement may or may not be true.

"But more triggerings of initial-mark needs JDK-8240556 to avoid uncessary concurrent marks."

I do not follow this one, because the description does not state what should be done as part of the change.

Also, there are quite a few reasons why g1 goes into an evac failure, some of them may just be misconfiguration. E.g. the default heap reserve may be too small as the possible old gen allocation during marking exceeds it, or the (minimum) young gen and old gen are already too large, reaching into the reserve already, or the IHOP threshold is not accurate enough (maybe too bursty load) and others. It would be really nice to have some logs to understand what the problem actually is.

Also as far as I understand, the CR does not tell what should be done here, although it looks like you already know or even have a solution as you refer to JDK-8240556. It is okay to just say, "find solutions for this", but at least at the review there will be the question about an exact description of the situation. So please elaborate. :) Thanks.
03-07-2020

Blocks :	JDK-8240556 - Abort concurrent mark after effective eager reclamation of humongous objects
Duplicate :	JDK-8257774 - G1: Trigger collect when free region count drops below threshold to prevent evacuation failures
Relates :	JDK-8251288 - G1: Young gen sizing should take short-living humongous object allocation into account