Filed on behalf of Ziyi Luo, email@example.com.
After every young-only GC, G1's Adaptive IHOP calculation involves the old generation allocation rate. This rate is calculated as: old_gen_allocated_bytes_since_last_gc / allocation_time_in_sec.
The value old_gen_allocated_bytes_since_last_gc is supposed to refer to all allocations into old regions, including humongous regions. This does not account for the possibility that humongous objects, at least primitive arrays, can be reclaimed by young collections (see JDK-8027959, G1ReclaimDeadHumongousObjectsAtYoungGC). This discrepancy can become problematic in applications which churn through a lot of rather short-lived humongous objects.
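To illustrate how short-lived humongous allocations can dominate this rate, here is a minimal sketch of the calculation described above. It is not HotSpot code: the class name, method name, and all byte counts are hypothetical and chosen only to make the arithmetic visible.

```java
public class OldGenAllocationRate {
    // Rate as described above: bytes allocated into old regions since the
    // last GC, divided by the allocation time in seconds.
    static double allocationRate(long oldGenAllocatedBytesSinceLastGc,
                                 double allocationTimeInSec) {
        return oldGenAllocatedBytesSinceLastGc / allocationTimeInSec;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long nonHumongousBytes = 5 * mb;  // hypothetical: allocations into regular old regions
        long humongousBytes = 195 * mb;   // hypothetical: humongous allocations, mostly short-lived
        double seconds = 1.0;             // hypothetical allocation time

        // Humongous allocations are counted even if a young GC reclaims them.
        double counted = allocationRate(nonHumongousBytes + humongousBytes, seconds);
        double withoutHumongous = allocationRate(nonHumongousBytes, seconds);

        System.out.printf("rate including humongous allocations: %.0f MB/s%n", counted / mb);
        System.out.printf("rate excluding humongous allocations: %.0f MB/s%n", withoutHumongous / mb);
    }
}
```

With these made-up inputs the counted rate is 40 times the rate from non-humongous allocations alone, even though most of the humongous bytes may never need a concurrent cycle to be reclaimed.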
A real-world example from a production service is shown in Fig 1. The three vertical dashed lines represent initial marking phases at the beginning of concurrent marking runs. Adaptive IHOP takes control after the third one of these. Here, most humongous objects are in fact collected during young GC, yet their allocations still count towards the old generation allocation rate in Adaptive IHOP, which erroneously reaches ~200 MB/s. As a result, the estimated IHOP is pushed down.
Even though there is no immediate need for concurrent marking and mixed collections at this point, they now occur back-to-back. Further down this path, frequent inefficient young GCs, high promotion rates, and high CPU usage ensue. By 24:00 the high old gen allocation rate is compounded by high promotion rates, and CPU usage jumps above 90%.
The webrev in the comments below contains a fix that works as follows. In each young-only collection cycle, record these numbers of humongous regions:
A) present after the last GC,
B) newly allocated since the last GC,
C) present after this GC.
Estimate the number of humongous regions reclaimed by this GC as:
(A > C) ? B : A + B - C
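The estimate above can be sketched in Java as follows. This is an illustration of the formula only, not the code in the webrev: the class and method names are hypothetical, and the sample region counts are made up. Note that the expression is equivalent to min(B, A + B - C), so the estimate never exceeds the number of regions newly allocated since the last GC.

```java
public class HumongousReclaimEstimate {
    /**
     * Estimates the number of humongous regions reclaimed by the current GC.
     *
     * @param a humongous regions present after the last GC (A)
     * @param b humongous regions newly allocated since the last GC (B)
     * @param c humongous regions present after this GC (C)
     */
    static long estimateReclaimed(long a, long b, long c) {
        // Same expression as in the description: (A > C) ? B : A + B - C
        return (a > c) ? b : a + b - c;
    }

    public static void main(String[] args) {
        // Hypothetical region counts, for illustration only.
        System.out.println(estimateReclaimed(10, 5, 12)); // A <= C: 10 + 5 - 12 = 3
        System.out.println(estimateReclaimed(10, 5, 4));  // A > C: capped at B = 5
        System.out.println(estimateReclaimed(0, 8, 0));   // A <= C: 0 + 8 - 0 = 8
    }
}
```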
Run the attached standalone program "AdaptiveIHOPIssueRepro.java" to approximate the allocation pattern that triggered this issue in our service. The necessary JVM options are listed in a code comment at the top.
Fig 2 shows test results from 120-second runs with and without the proposed fix, invoked like this:
java -Xmx512m -Xms512m -XX:G1HeapRegionSize=1m -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=30 -XX:G1NewSizePercent=30 -Xlog:gc*=debug:file=gc-%p-%t.log AdaptiveIHOPIssueRepro 120
Each dot in Fig 2 represents a young GC invocation. There are 44 of these with the fix and 5254 without. The predicted old generation allocation rate with the fix is around 1/8 of the rate without it.