JDK-8358342 : Extremely long young G1 evacuation pauses starting JDK 21.0.4
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 21.0.4, 22
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-06-02
  • Updated: 2025-06-10
JDK 26 : Unresolved
Description
Developers reported that their application experienced extremely long young GC evacuation pauses, sometimes after the application had been running for about 5m~30m, starting with JDK 21.0.4 (last year). When this occurred, the long pause appeared to be associated with a spike in CPU usage. In some extreme cases, the long evacuation pauses could reach >90s.

The extremely long young G1 evacuation pauses appeared to be associated with specific usage patterns of the application, which made the issue difficult to reproduce. I investigated with 'gc+phases=debug' from some test runs that reproduced long evacuation pauses (though orders of magnitude lower, <1s). According to the GC logs, the long pauses were due to long Code Root Scan operations. That connected the issue to JDK-8315503, which was backported to JDK 21.0.4. I was not able to reproduce the extremely long young G1 evacuation pauses with the CCStress.java from JDK-8315503, and experimenting with increasing the number of classes generated by CCStress.java did not reproduce them either.
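For reference, the phase-level logging mentioned above can be enabled with unified logging along these lines (the log file name and the application placeholder are just examples):

    java -Xlog:gc*,gc+phases=debug:file=gc.log <application>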

JDK-8315503 switched to using a ConcurrentHashTable to store code roots. The initial table size was 2^2 (set with 'Log2DefaultNumBuckets = 2'), which was small. I added a command-line option that the developers used to run with a larger initial code root hash table. The idea was to avoid the operations for growing the table and copying the table entries. With the initial table size set to 2^15, the developers reported that the extremely long young G1 evacuation pauses no longer occurred in their non-test runs.
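For illustration only (this is a toy sketch, not HotSpot's G1CodeRootSet/ConcurrentHashTable code; all names and thresholds below are made up): a chained hash table that doubles and rehashes when its load factor is exceeded. Starting at 2^2 buckets, inserting N entries triggers roughly log2(N/4) grow-and-copy passes per region, while starting at 2^15 buckets skips those passes for all but the largest sets, which is the effect the larger initial size exploited.

    // Illustrative only: a toy chained hash table showing the grow-and-rehash
    // cost that a small initial bucket count (2^2) incurs and that a large
    // initial bucket count (2^15) avoids. Hypothetical names, not HotSpot code.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    class ToyCodeRootSet {
      std::vector<std::vector<uintptr_t>> _buckets;
      size_t _num_entries = 0;
      size_t _rehash_count = 0;

      void grow_and_rehash() {
        std::vector<std::vector<uintptr_t>> bigger(_buckets.size() * 2);
        for (const auto& bucket : _buckets) {
          for (uintptr_t nm : bucket) {
            bigger[nm % bigger.size()].push_back(nm);   // copy every entry
          }
        }
        _buckets.swap(bigger);
        _rehash_count++;
      }

     public:
      explicit ToyCodeRootSet(unsigned log2_initial_buckets)
          : _buckets(size_t(1) << log2_initial_buckets) {}

      void add(uintptr_t nmethod_addr) {
        if (_num_entries + 1 > _buckets.size() * 2) {   // load factor > 2: grow
          grow_and_rehash();
        }
        _buckets[nmethod_addr % _buckets.size()].push_back(nmethod_addr);
        _num_entries++;
      }

      size_t rehash_count() const { return _rehash_count; }
    };

    int main() {
      const size_t num_code_roots = 100000;     // hypothetical per-region count
      ToyCodeRootSet small_table(2);            // like the 2^2 default
      ToyCodeRootSet large_table(15);           // like the experimental 2^15 setting
      for (size_t i = 0; i < num_code_roots; i++) {
        small_table.add(0x1000 + i * 64);
        large_table.add(0x1000 + i * 64);
      }
      printf("rehashes: small=%zu large=%zu\n",
             small_table.rehash_count(), large_table.rehash_count());
      return 0;
    }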

Using a very large code root hash table increased memory usage, since the code root table is per region. The memory overhead becomes problematic when a very large Java heap is used.
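To put rough, illustrative numbers on that overhead (assuming 8-byte bucket pointers and ignoring per-node and header costs, which differ in practice): a 256 GB heap with 32 MB regions has 8192 regions. At 2^15 buckets per region that is 8192 × 32768 × 8 bytes ≈ 2 GB of empty bucket arrays alone, versus about 8192 × 4 × 8 bytes = 256 KB with the default 2^2 buckets.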

I'm reporting the issue to gather more thoughts. We were not able to construct a specific test case that demonstrates the issue (sorry about that).



Comments
Current logs with only the type of collection, region info (current eden/survivor/old/humongous counts), and "Scanned NMethods" would already give some useful statistics to better understand the problem.
10-06-2025

Hi [~tschatzl], I'm checking with the developers to see if they are okay with sharing the detailed GC logs. I didn't capture/record the logs for GCs with non-problematic pauses, so I'm also asking the developers if they still retain logs from the old runs to get additional info. (The application has been running with increased initial code root table size for several months.) I'll update you on the bug once I hear back from them. Thanks!
04-06-2025

Hi [~jiangli]! Thanks for the report, interesting. I have lots of questions:

* Since JDK-8315605 is also in that build, what does `Scanned Nmethods` show in these cases compared to other GCs?
* Can you provide a log with at least one of these problematic GCs and a few similar ones that do not show the problem, with `gc+phases=debug`? One problem could be malloc() stalling (it tends not to be scalable) - the nmethods of surviving objects are ultimately copied over to new regions, and if there are many (and there are many threads), malloc'ing the subsequently growing result code root sets could stall, which would explain what you see. In the worst case, instrumenting the add() to the code root set might show whether this is the largest contributor (see the sketch after this list).
* In the CHT add(), the use of a single scoped critical section may also be problematic - if I remember correctly, the global GlobalCounter (i.e. a single critical section object) is used for all CHTs, so maybe the application is overloading that one.
* Another, much less likely option could be what JDK-8316212 refers to: making fewer calls to the CHT could improve the situation.
* Also, is there some particular platform you are seeing this on, like aarch64, or only x64? (Or have you tried only one?)
* Does this reproduce on later JDKs? (Probably, but just to be sure.)
* In the worst case, replacing malloc with a manual allocator could be an option, like we do for the remembered sets. That would not be my first choice, though. Without statistics and measurements it is hard to gauge what the best option is.
* One could also experiment with the chain length of the buckets.

Thanks, Thomas
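As a rough illustration of the instrumentation idea above (hypothetical names, not HotSpot code; the real add() path sits on a concurrent hash table): wrap the code root add() call with a timer and accumulate per-GC statistics, which would show whether allocation inside add() dominates the pause.

    // Illustrative sketch of timing an add() path to see whether it dominates.
    // ToyCodeRootSet and timed_add() are hypothetical stand-ins.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <set>

    struct AddStats {
      uint64_t calls = 0;
      uint64_t total_ns = 0;
      uint64_t max_ns = 0;

      void record(uint64_t ns) {
        calls++;
        total_ns += ns;
        if (ns > max_ns) max_ns = ns;
      }
    };

    // Hypothetical per-region code root set standing in for the real one.
    struct ToyCodeRootSet {
      std::set<uintptr_t> roots;
      void add(uintptr_t nmethod_addr) { roots.insert(nmethod_addr); }
    };

    // Timed wrapper one could place (thread-locally) around the real add() call.
    static void timed_add(ToyCodeRootSet& set, uintptr_t nmethod_addr, AddStats& stats) {
      auto start = std::chrono::steady_clock::now();
      set.add(nmethod_addr);
      auto end = std::chrono::steady_clock::now();
      stats.record(std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    }

    int main() {
      ToyCodeRootSet set;
      AddStats stats;
      for (uintptr_t i = 0; i < 100000; i++) {
        timed_add(set, 0x4000 + i * 64, stats);
      }
      printf("add(): calls=%llu total=%.3f ms max=%llu ns\n",
             (unsigned long long)stats.calls, stats.total_ns / 1e6,
             (unsigned long long)stats.max_ns);
      return 0;
    }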
03-06-2025

> Using a very large code root hash table increased memory usage, since the code root table is per region. The memory overhead becomes problematic when a very large Java heap is used.

One can make the initial size of the next code root set some statistical value of the existing ones for a particular set of regions (e.g. per type/generation) to right-size it, with some backpressure (e.g. from that value, always take half, or just round down) to reduce/minimize that issue. Getting these statistics for your application per GC/per region might allow simulating various heuristics. (And it would probably be nice to have for some log messages too, like for the remembered sets.)
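A minimal sketch of such a right-sizing heuristic (all names, the choice of statistic, and the backpressure rule are assumptions, not an existing HotSpot API): take a statistic of the code root set sizes observed for a region type in the previous GC, halve it, and round down to a power of two for the next initial bucket count.

    // Illustrative heuristic: derive the next initial code-root-table size for
    // a region type from sizes observed in the previous GC. Hypothetical sketch.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Round down to a power of two, but never below min_size.
    static size_t round_down_pow2(size_t v, size_t min_size) {
      size_t p = min_size;
      while (p * 2 <= v) p *= 2;
      return p;
    }

    // Next initial bucket count: take the median of the observed code root set
    // sizes, apply backpressure by halving, round down to a power of two, and
    // clamp to [min_buckets, max_buckets].
    static size_t next_initial_buckets(std::vector<size_t> observed_sizes,
                                       size_t min_buckets, size_t max_buckets) {
      if (observed_sizes.empty()) return min_buckets;
      std::sort(observed_sizes.begin(), observed_sizes.end());
      size_t median = observed_sizes[observed_sizes.size() / 2];
      size_t target = round_down_pow2(median / 2, min_buckets);
      return std::min(target, max_buckets);
    }

    int main() {
      // Hypothetical per-region code root counts from the previous GC (one region type).
      std::vector<size_t> old_gen_sizes = {12, 3000, 4100, 5200, 90000};
      size_t buckets = next_initial_buckets(old_gen_sizes, /*min*/ 4, /*max*/ size_t(1) << 15);
      printf("next initial buckets: %zu\n", buckets);  // prints 2048 for this sample
      return 0;
    }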
03-06-2025