JDK-8359348: G1: Improve cpu usage measurements for heap sizing
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2025-06-12
  • Updated: 2025-09-03
  • Resolved: 2025-08-27
  • Fix Version: JDK 26 b13 (Fixed)
Description
Currently, G1 uses only pause times as the metric to calculate GC CPU usage.

This completely disregards the CPU usage of concurrent GC work.

Make the calculation more accurate; this might require retuning of `GCTimeRatio`.

Some platforms may not support more exact CPU measurements, so the current mechanism might need to be kept as a fallback.
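
For context, a `GCTimeRatio` of N corresponds to a target GC time share of 1/(1+N) of total time. The stack traces in the comments below (CPUTimeUsage::GC::gc_threads() iterating GC threads via a ThreadClosure and reading os::thread_cpu_time) show the shape of the new measurement. Below is a minimal, self-contained POSIX sketch of that idea, not HotSpot code; thread bookkeeping and error handling are illustrative only.

```cpp
// Sketch (not HotSpot code): measure GC CPU usage by summing per-thread
// CPU time over all GC worker threads, instead of deriving it from pause
// wall-clock time alone. Requires -pthread.
#include <ctime>
#include <pthread.h>
#include <vector>

// CPU time consumed by one thread, in ms, or -1.0 if the clock cannot be
// read (POSIX leaves the behavior undefined for threads that have exited).
static double thread_cpu_time_ms(pthread_t thread) {
  clockid_t cid;
  if (pthread_getcpuclockid(thread, &cid) != 0) return -1.0;
  timespec ts;
  if (clock_gettime(cid, &ts) != 0) return -1.0;
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

// Sums CPU time across a set of worker threads, skipping failed readings.
static double gc_cpu_time_ms(const std::vector<pthread_t>& gc_threads) {
  double total_ms = 0.0;
  for (pthread_t t : gc_threads) {
    double ms = thread_cpu_time_ms(t);
    if (ms >= 0.0) total_ms += ms;
  }
  return total_ms;
}
```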
Comments
[~mbaesken] Can you send the full hs_err_pid*.log to help with looking into this? I have created a new bug report: https://bugs.openjdk.org/browse/JDK-8366328.
28-08-2025

Another issue was observed on Linux Alpine (triggered by test compiler/whitebox/AllocationCodeBlobTest):

# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-linux_alpine_x86_64-dbg/jdk/src/hotspot/os/linux/os_linux.cpp:4203), pid=8386, tid=8410
# assert(status == 0) failed: clock_gettime error: Invalid argument

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x18017fe] os::Linux::fast_thread_cpu_time(int) [clone .part.0]+0xe (os_linux.cpp:4203)
V [libjvm.so+0x1806f7a] (os_linux.cpp:4951)
V [libjvm.so+0xc66ce4] CPUTimeThreadClosure::do_thread(Thread*)+0x14 (cpuTimeUsage.cpp:38)
V [libjvm.so+0xe52659] G1CollectedHeap::gc_threads_do(ThreadClosure*) const+0x29 (g1CollectedHeap.cpp:2265)
V [libjvm.so+0xc66bd2] CPUTimeUsage::GC::gc_threads()+0x32 (cpuTimeUsage.cpp:63)
V [libjvm.so+0xe20189] G1Analytics::gc_cpu_time_ms() const+0x9 (g1Analytics.cpp:156)
V [libjvm.so+0xf378a7] G1Policy::record_full_collection_start()+0x37 (g1Policy.cpp:669)
V [libjvm.so+0xeab780] G1FullCollector::prepare_collection()+0x20 (g1FullCollector.cpp:181)
V [libjvm.so+0xe56a76] G1CollectedHeap::do_full_collection(bool, bool, unsigned long)+0x3c6 (g1CollectedHeap.cpp:857)
V [libjvm.so+0xf7c890] VM_G1CollectFull::doit()+0x50 (g1VMOperations.cpp:55)
V [libjvm.so+0x1e65ec6] VM_Operation::evaluate()+0x196 (vmOperations.cpp:74)
V [libjvm.so+0x1e8291b] VMThread::evaluate_operation(VM_Operation*)+0x5eb (vmThread.cpp:284)
V [libjvm.so+0x1e833bf] VMThread::inner_execute(VM_Operation*)+0x42f (vmThread.cpp:421)
V [libjvm.so+0x1e83554] VMThread::loop()+0x84 (vmThread.cpp:487)
V [libjvm.so+0x1e83664] VMThread::run()+0x94 (vmThread.cpp:177)
V [libjvm.so+0x1d6a066] Thread::call_run()+0xb6 (thread.cpp:243)
V [libjvm.so+0x1808a6c] thread_native_entry(Thread*)+0x18c (os_linux.cpp:868)
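
The asserting path hits an unspecified corner of the per-thread CPU clocks: once a thread has exited, its CPU-time clock id is no longer valid. A hypothetical reproducer follows (illustration only, with made-up worker code); the observed behavior is libc-dependent, e.g. clock_gettime returning EINVAL as in the assert above, or a fault inside pthread_getcpuclockid on a stale pthread_t as in the SIGSEGV reported in the next comment.

```cpp
// Hypothetical reproducer: query the CPU-time clock of an exited thread.
// POSIX does not define the behavior; it may fail, succeed, or crash.
// Build with -pthread.
#include <cstdio>
#include <ctime>
#include <pthread.h>

static void* worker(void*) { return nullptr; }

int main() {
  pthread_t t;
  pthread_create(&t, nullptr, worker, nullptr);

  clockid_t cid;
  pthread_getcpuclockid(t, &cid);  // valid while the thread is alive

  pthread_join(t, nullptr);        // after this, t and cid are stale

  timespec ts;
  if (clock_gettime(cid, &ts) != 0) {
    perror("clock_gettime on exited thread");  // e.g. "Invalid argument"
  }
  return 0;
}
```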
28-08-2025

We now observe this crash on Linux Alpine, triggered by runtime/Shutdown/ShutdownTest.java; it might be related to this change. The test compiler/startup/StartupOutput seems to trigger it too.

#
# SIGSEGV (0xb) at pc=0x00007f0824ce4b18, pid=16398, tid=16412
#
# C [ld-musl-x86_64.so.1+0x66b18] pthread_getcpuclockid+0x0

Stack: [0x00007f080e760000,0x00007f080e860ab0], sp=0x00007f080e85fbb8, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ld-musl-x86_64.so.1+0x66b18] pthread_getcpuclockid+0x0
V [libjvm.so+0xc66ce4] CPUTimeThreadClosure::do_thread(Thread*)+0x14 (cpuTimeUsage.cpp:38)
V [libjvm.so+0xe9f855] G1ConcurrentRefineThreadControl::worker_threads_do(ThreadClosure*)+0x55 (g1ConcurrentRefine.cpp:125)
V [libjvm.so+0xe52679] G1CollectedHeap::gc_threads_do(ThreadClosure*) const+0x49 (g1CollectedHeap.cpp:2267)
V [libjvm.so+0xc66bd2] CPUTimeUsage::GC::gc_threads()+0x32 (cpuTimeUsage.cpp:63)
V [libjvm.so+0xe20189] G1Analytics::gc_cpu_time_ms() const+0x9 (g1Analytics.cpp:156)
V [libjvm.so+0xf37997] G1Policy::record_pause_start_time()+0x37 (g1Policy.cpp:669)
V [libjvm.so+0xe7f53f] G1ConcurrentMark::remark()+0x7f (g1ConcurrentMark.cpp:1382)
V [libjvm.so+0xf7ccc9] VM_G1PauseConcurrent::doit()+0x279 (g1VMOperations.cpp:154)
V [libjvm.so+0x1e65ec6] VM_Operation::evaluate()+0x196 (vmOperations.cpp:74)
V [libjvm.so+0x1e8291b] VMThread::evaluate_operation(VM_Operation*)+0x5eb (vmThread.cpp:284)
V [libjvm.so+0x1e833bf] VMThread::inner_execute(VM_Operation*)+0x42f (vmThread.cpp:421)
V [libjvm.so+0x1e83554] VMThread::loop()+0x84 (vmThread.cpp:487)
V [libjvm.so+0x1e83664] VMThread::run()+0x94 (vmThread.cpp:177)
V [libjvm.so+0x1d6a066] Thread::call_run()+0xb6 (thread.cpp:243)
V [libjvm.so+0x1808a6c] thread_native_entry(Thread*)+0x18c (os_linux.cpp:868)

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f080ecf8b68
28-08-2025

We now observe asserts on AIX, triggered by runtime/Shutdown/ShutdownTest.java:

#
# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-aix_ppc64-dbg/jdk/src/hotspot/os/aix/os_aix.cpp:2393), pid=18416088, tid=2057
# assert(n >= 0) failed: negative CPU time
#

Stack: [0x0000000114230000,0x000000011444c888], sp=0x000000011444b040, free space=2156k
No context given, using current context.
Native frame:
iar: 0x090000000c16ef8c libjvm.so::AixNativeCallstack::print_callstack_for_context(outputStream*, ucontext_t const*, bool, char*, unsigned long)+0x4ec (C++ uses_alloca saves_cr saves_lr stores_bc gpr_saved:18 fixedparms:5 parmsonstk:1)
lr: 0x000000011444a270 (unknown module)::(unknown function)+?
sp: 0x0000000114449fe0 (base - 0x28A8)
rtoc: 0x08001000a045e2f8

|---stackaddr----| |----lrsave------|: <function name>
0x000000011444a3d0 - 0x090000000c16ea24 libjvm.so::os::Aix::platform_print_native_stack(outputStream*, void const*, char*, int, unsigned char*&)+0x24 (C++ uses_alloca saves_lr stores_bc gpr_saved:1 fixedparms:5 parmsonstk:1)
0x000000011444a450 - 0x090000000c16e8d8 libjvm.so::NativeStackPrinter::print_stack(outputStream*, char*, int, unsigned char*&, bool, int)+0x58 (C++ fp_present uses_alloca saves_cr saves_lr stores_bc gpr_saved:6 fixedparms:7 parmsonstk:1)
0x000000011444a560 - 0x090000000c900a58 libjvm.so::VMError::report(outputStream*, bool)+0x23b8 (C++ fp_present uses_alloca saves_cr saves_lr stores_bc gpr_saved:18 fixedparms:2 parmsonstk:1)
0x000000011444b050 - 0x090000000bedc030 libjvm.so::VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x870 (C++ uses_alloca saves_lr stores_bc gpr_saved:18 fixedparms:8 parmsonstk:1)
0x000000011444b230 - 0x090000000bedb4e0 libjvm.so::report_vm_error(char const*, int, char const*, char const*, ...)+0xa0 (C++ uses_alloca saves_lr stores_bc gpr_saved:5 fixedparms:4 parmsonstk:1)
0x000000011444b2f0 - 0x090000000befbae8 libjvm.so::os::thread_cpu_time(Thread*)+0x68 (C++ uses_alloca saves_lr stores_bc gpr_saved:1 fixedparms:1 parmsonstk:1)
0x000000011444b370 - 0x090000000c13a1e4 libjvm.so::CPUTimeThreadClosure::do_thread(Thread*)+0x24 (C++ uses_alloca saves_lr stores_bc gpr_saved:2 fixedparms:2 parmsonstk:1)
0x000000011444b3f0 - 0x090000000c693878 libjvm.so::G1CollectedHeap::gc_threads_do(ThreadClosure*) const+0x58 (C++ uses_alloca saves_lr stores_bc gpr_saved:3 fixedparms:2 parmsonstk:1)
0x000000011444b480 - 0x090000000c13a15c libjvm.so::CPUTimeUsage::GC::gc_threads()+0x5c (C++ uses_alloca saves_lr stores_bc gpr_saved:1 parmsonstk:1)
0x000000011444b510 - 0x090000000c3b11f8 libjvm.so::G1Analytics::gc_cpu_time_ms() const+0x18 (C++ fp_present uses_alloca saves_lr stores_bc gpr_saved:1 fixedparms:1 parmsonstk:1)
0x000000011444b590 - 0x090000000c3b0efc libjvm.so::G1Policy::record_young_collection_start()+0x5c (C++ fp_present uses_alloca saves_lr stores_bc fpr_saved:1 gpr_saved:6 fixedparms:1 parmsonstk:1)
0x000000011444b650 - 0x090000000c3a3168 libjvm.so::G1YoungCollector::collect()+0x268 (C++ fp_present uses_alloca saves_lr stores_bc gpr_saved:10 fixedparms:1 parmsonstk:1)
0x000000011444bb50 - 0x090000000c39d1f4 libjvm.so::G1CollectedHeap::do_collection_pause_at_safepoint_helper(unsigned long)+0xf4 (C++ uses_alloca saves_lr stores_bc gpr_saved:7 fixedparms:2 parmsonstk:1)
0x000000011444bcf0 - 0x090000000c39d010 libjvm.so::G1CollectedHeap::do_collection_pause_at_safepoint(unsigned long)+0xb0 (C++ uses_alloca saves_lr stores_bc gpr_saved:3 fixedparms:2 parmsonstk:1)
0x000000011444bd80 - 0x090000000c604544 libjvm.so::VM_G1CollectForAllocation::doit()+0xa4 (C++ uses_alloca saves_lr stores_bc gpr_saved:5 fixedparms:1 parmsonstk:1)
0x000000011444be20 - 0x090000000bfdfdc0 libjvm.so::VM_Operation::evaluate()+0x160 (C++ uses_alloca saves_lr stores_bc gpr_saved:6 fixedparms:1 parmsonstk:1)
0x000000011444bfb0 - 0x090000000bfdf784 libjvm.so::VMThread::evaluate_operation(VM_Operation*)+0x184 (C++ fp_present uses_alloca saves_cr saves_lr stores_bc gpr_saved:11 fixedparms:2 parmsonstk:1)
0x000000011444c130 - 0x090000000bfdf08c libjvm.so::VMThread::inner_execute(VM_Operation*)+0x42c (C++ uses_alloca saves_cr saves_lr stores_bc gpr_saved:11 fixedparms:2 parmsonstk:1)
0x000000011444c4e0 - 0x090000000d1d4818 libjvm.so::VMThread::loop()+0xd8 (C++ uses_alloca saves_lr stores_bc gpr_saved:5 fixedparms:1 parmsonstk:1)
0x000000011444c580 - 0x090000000d1d44c8 libjvm.so::VMThread::run()+0x108 (C++ uses_alloca saves_lr stores_bc gpr_saved:5 fixedparms:1 parmsonstk:1)
0x000000011444c650 - 0x090000000bfc7148 libjvm.so::Thread::call_run()+0x128 (C++ uses_alloca saves_lr stores_bc gpr_saved:3 fixedparms:1 parmsonstk:1)
0x000000011444c6e0 - 0x090000000bfc640c libjvm.so::thread_native_entry(Thread*)+0x20c (C++ uses_alloca saves_lr stores_bc gpr_saved:8 fixedparms:1 parmsonstk:1)
0x000000011444c7a0 - 0x090000000056204c libpthreads.a::_pthread_body+0xec (C saves_lr stores_bc gpr_saved:1 fixedparms:1 )
0x000000011444c820 - 0x0000000000000000

This might be related to your change.
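
Here os::thread_cpu_time on AIX returned a negative value, tripping the assert. A hypothetical defensive wrapper (a sketch only, not the actual HotSpot fix) could reject bogus readings instead of asserting:

```cpp
// Hypothetical sanitizer for per-thread CPU time readings (illustration
// only): drop negative values like the one that fired the AIX assert
// above, and never let the reported value move backwards.
#include <algorithm>
#include <cstdint>

class ThreadCpuTime {
  int64_t _last_ns = 0;  // last accepted reading, acts as a monotonic floor
public:
  // Accepts a raw reading and returns a sanitized, non-decreasing value.
  int64_t sanitize(int64_t raw_ns) {
    if (raw_ns < 0) {
      return _last_ns;  // ignore clearly bogus negative readings
    }
    _last_ns = std::max(_last_ns, raw_ns);
    return _last_ns;
  }
};
```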
28-08-2025

Changeset: 124575b4
Branch: master
Author: Ivan Walulya <iwalulya@openjdk.org>
Date: 2025-08-27 11:45:43 +0000
URL: https://git.openjdk.org/jdk/commit/124575b4c2b52328a8efddb40e67057a53b44a04
27-08-2025

I uploaded the synthetic simulator at https://github.com/openjdk/jdk/compare/master...caoman:jdk:heapsim. I could try publishing it in its own repo later, but that needs to go through additional approval on our side. Any suggestions on whether OpenJDK has a repo or directory to which I could contribute this program? It is helpful for demonstrating other issues with G1 heap resizing.

Regarding the fluctuating mutator CPU usage issue: upon reviewing https://github.com/openjdk/jdk/pull/26351, I'm convinced that the proposed approach does not suffer from it, because the approach does not rely on total process CPU usage and does not measure "GC CPU overhead", i.e., the ratio of GC CPU usage to total CPU usage.
23-07-2025

A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/26351
Date: 2025-07-16 13:55:32 +0000
17-07-2025

[~manc] Can you help with creating the synthetic simulator, or, even better, test with the workloads you used when testing SoftMaxHeapSize?
16-07-2025

We have some concerns about using GC CPU overhead to control heap sizing: https://github.com/openjdk/jdk/pull/24211#issuecomment-2777769994

> Somewhat related to the above, our experience with our internal algorithm that adjusts SoftMaxHeapSize based on GC CPU overhead is that it encountered cases where it behaves poorly. The problem is that some workloads have large variance in mutator CPU usage (e.g. peak hours vs. off-peak hours) but smaller variance in GC CPU usage. Then it does not make much sense to maintain a constant percentage for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating SoftMaxHeapSize, similar to how Min/MaxHeapFreeRatio works.

I suggest we test the PR with a workload that has fluctuating mutator CPU usage but a relatively stable eden allocation rate and live bytes; with such a workload, the heap size could fluctuate too much. It might be easier to write a synthetic simulator that exhibits this behavior.
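
A quick numeric illustration of the quoted concern, with made-up numbers: if the GC consumes a steady 0.2 cores while mutator usage swings between peak and off-peak, the overhead ratio swings with it even though the absolute GC cost is unchanged.

```cpp
// Made-up numbers illustrating the quoted concern: a controller holding
// the GC-CPU-overhead ratio constant reacts to mutator load swings even
// when the absolute GC CPU cost never changes.
#include <cstdio>

int main() {
  const double gc_cores     = 0.2;  // steady GC CPU usage
  const double mutator_peak = 7.8;  // peak-hours mutator CPU usage
  const double mutator_off  = 0.8;  // off-peak mutator CPU usage

  const double overhead_peak = gc_cores / (gc_cores + mutator_peak);  // 2.5%
  const double overhead_off  = gc_cores / (gc_cores + mutator_off);   // 20%

  // Against a fixed target (say 5%), the off-peak reading looks far too
  // high and would drive heap expansion, although GC work is unchanged.
  printf("GC overhead: peak %.1f%%, off-peak %.1f%%\n",
         overhead_peak * 100.0, overhead_off * 100.0);
  return 0;
}
```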
16-07-2025