JDK-8347353 : [Shenandoah] guarantee(stats.non_trashed_span() <= generation_capacity) failed: Before Mark: generation (Young) size spanned by regions (1244) * region size (256K) must not exceed current capacity
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 25
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86_64,aarch64
  • Submitted: 2025-01-09
  • Updated: 2025-05-29
Description
The jtreg test
gc/stress/gcold/TestGCOldWithShenandoah.java#generational
 
sometimes triggers this issue:
 
#  Internal Error (d:\priv\jenkins\client-home\workspace\openjdk-jdk-dev-windows_x86_64-dbg\jdk\src\hotspot\share\gc\shenandoah\shenandoahVerifier.cpp:454), pid=35284, tid=18196
#  guarantee(stats.non_trashed_span() <= generation_capacity) failed: Before Mark: generation (Young) size spanned by regions (1244) * region size (256K) must not exceed current capacity (308M)
 
Stack: [0x000000a40aa00000,0x000000a40ab00000],  sp=0x000000a40aafeeb0,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xfbe93f]  ShenandoahGenerationStatsClosure::validate_usage+0x1cf  (shenandoahVerifier.cpp:450)
V  [jvm.dll+0xfbf47f]  ShenandoahVerifier::verify_at_safepoint+0x54f  (shenandoahVerifier.cpp:935)
V  [jvm.dll+0xfbfcdd]  ShenandoahVerifier::verify_before_concmark+0x4d  (shenandoahVerifier.cpp:1070)
V  [jvm.dll+0xec352c]  ShenandoahConcurrentGC::op_init_mark+0x23c  (shenandoahConcurrentGC.cpp:649)
V  [jvm.dll+0xec1c93]  ShenandoahConcurrentGC::entry_init_mark+0x133  (shenandoahConcurrentGC.cpp:303)
V  [jvm.dll+0xfaf39a]  VM_ShenandoahInitMark::doit+0x3a  (shenandoahVMOperations.cpp:85)
V  [jvm.dll+0x1199844]  VM_Operation::evaluate+0xe4  (vmOperations.cpp:76)
V  [jvm.dll+0x119b578]  VMThread::evaluate_operation+0xb8  (vmThread.cpp:284)
V  [jvm.dll+0x119bc9f]  VMThread::inner_execute+0x24f  (vmThread.cpp:430)
V  [jvm.dll+0x119c2c4]  VMThread::run+0x114  (vmThread.cpp:177)
V  [jvm.dll+0x10ee476]  Thread::call_run+0x1a6  (thread.cpp:237)
V  [jvm.dll+0xd86921]  thread_native_entry+0xe1  (os_windows.cpp:545)
C  [ucrtbase.dll+0x1fb80]  (no source info available)
C  [KERNEL32.DLL+0x84d4]  (no source info available)
C  [ntdll.dll+0x51a11]  (no source info available)
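
The numbers in the guarantee message can be checked by hand: 1244 regions of 256K span 311M, which is 3M (12 regions) more than the reported capacity of 308M. The sketch below reproduces that comparison. The helper names are hypothetical and are not taken from shenandoahVerifier.cpp, and it assumes the capacity printed in the message is exact rather than rounded.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration; not the verifier's actual code.
constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;

// Bytes spanned by `regions` heap regions of `region_size` bytes each.
constexpr std::size_t spanned_bytes(std::size_t regions, std::size_t region_size) {
  return regions * region_size;
}

// Same shape as the failing condition: the span must not exceed the capacity.
constexpr bool span_fits(std::size_t regions, std::size_t region_size,
                         std::size_t capacity) {
  return spanned_bytes(regions, region_size) <= capacity;
}
```

With the values from the Windows failure, `span_fits(1244, 256 * K, 308 * M)` is false, while 1232 regions would fit exactly.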

Comments
The problem occurs at the start of the first concurrent young GC following completion of old marking. In the existing implementation, here is what happens at the end of old marking:

1. We rebuild the freeset to account for any immediate garbage regions found by old marking.
2. Then, we balance the generations in preparation for the next GC cycle. Balancing generations transfers heap regions between young and old generations.

The bug is that when we transfer regions as part of generation balancing, we fail to make adjustments to the freeset partitions. In one of our rr recordings of this problem, balance generations caused 66 regions to be transferred from young to old. This was motivated by the expectation that we have newly identified candidate regions for mixed evacuation and we will need some space in the old generation into which these mixed evacuation candidates will be evacuated.

What should also happen is that we modify the Shenandoah free set to decrease the capacity of the Mutator partition and increase the capacity of the OldCollector partition. Likewise, we should adjust the computed values of available memory in each partition. And we should change the membership of 66 empty regions that currently reside within the Mutator partition and place these into the OldCollector partition. We are not doing any of this.

One possible way to resolve this is to perform all of the missing steps outlined above. Another way is to rebalance generations before we finish rebuilding the freeset, so that the freeset is built once, the right way, and there's no need to change it. We are currently testing an implementation of this second approach.
29-05-2025
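
The accounting gap described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: the struct and function names are hypothetical and do not correspond to HotSpot's free set API, sizes are counted in whole regions, and the consistency check is a simplified stand-in for the real verifier condition.

```cpp
#include <cassert>
#include <cstddef>

// Toy model for illustration only; all names here are hypothetical.
struct Partition {
  std::size_t capacity_regions;  // regions assigned to this partition
  std::size_t empty_regions;     // completely free regions among them
};

struct Heap {
  std::size_t young_capacity_regions;
  std::size_t old_capacity_regions;
  Partition mutator;        // free-set partition on the young side
  Partition old_collector;  // free-set partition on the old side
};

// The behavior described as buggy: generation capacities change, but the
// free-set partitions are left untouched.
void transfer_generations_only(Heap& h, std::size_t n) {
  h.young_capacity_regions -= n;
  h.old_capacity_regions += n;
}

// The first remedy mentioned in the comment: also move n empty regions from
// the Mutator partition to the OldCollector partition and adjust capacities.
void transfer_with_freeset_update(Heap& h, std::size_t n) {
  assert(h.mutator.empty_regions >= n);
  transfer_generations_only(h, n);
  h.mutator.capacity_regions -= n;
  h.mutator.empty_regions -= n;
  h.old_collector.capacity_regions += n;
  h.old_collector.empty_regions += n;
}

// Simplified check (an assumption, not the verifier's logic): the Mutator
// partition must not hold more regions than young has capacity for.
bool mutator_fits_young(const Heap& h) {
  return h.mutator.capacity_regions <= h.young_capacity_regions;
}
```

Starting from a made-up heap such as `Heap{100, 50, {100, 80}, {10, 10}}`, transferring 66 regions with `transfer_generations_only` leaves the Mutator partition larger than the young generation, whereas `transfer_with_freeset_update` keeps the two views consistent. The second remedy from the comment, rebalancing before the freeset rebuild finishes, avoids the need for any such after-the-fact adjustment.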

Just an update: we have been able to reproduce and record this and we are actively working on a fix now.
28-05-2025

Thanks for your help with this. This is what I've been running on my macOS machine. So far, I have got through over 1000 tests without a failure. More machines doing the same might help us find the issue:

```
nohup make test TEST="gc/stress/gcold/TestGCOldWithShenandoah.java#generational" TEST_VM_OPTS="-Xlog:gc*=info" JTREG="REPEAT_COUNT=1000" CONF=fastdebug >test-failure.out 2>test-failure.err
```

We've got other machines here also running the test similar numbers of times...
22-05-2025

> I wonder if you could reproduce with a GC log and attach log along with hs_err file to this ticket.

Hi [~kdnilsen], the issue was always triggered by the jtreg test gc/stress/gcold/TestGCOldWithShenandoah.java#generational, once on macOS and a few times on Windows x86_64, but it is not easily reproducible. If you provide a small patch of this test with the changed GC settings you want, we can add this patch to our build/test queue and, in case we hit the issue again, deliver the new hs_err file and GC log.
21-05-2025

We're still having difficulty reproducing on macOS aarch64 and Linux aarch64. We don't have "convenient access" to Windows x86_64. I wonder if you could reproduce with a GC log and attach the log along with the hs_err file to this ticket.
20-05-2025

Also seen on macOS aarch64:

guarantee(stats.non_trashed_span() <= generation_capacity) failed: Before Mark: generation (Young) size spanned by regions (1231) * region size (256K) must not exceed current capacity (307M)

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x13b713c] VMError::report(outputStream*, bool)+0x1b00 (shenandoahVerifier.cpp:454)
V [libjvm.dylib+0x13ba9ec] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V [libjvm.dylib+0x5bbc44] print_error_for_unit_test(char const*, char const*, char*)+0x0
V [libjvm.dylib+0x11aff88] ShenandoahGenerationStatsClosure::validate_usage(bool, char const*, ShenandoahGeneration*, ShenandoahCalculateRegionStatsClosure&)+0x374
V [libjvm.dylib+0x11aefc4] ShenandoahVerifier::verify_at_safepoint(char const*, ShenandoahVerifier::VerifyRememberedSet, ShenandoahVerifier::VerifyForwarded, ShenandoahVerifier::VerifyMarked, ShenandoahVerifier::VerifyCollectionSet, ShenandoahVerifier::VerifyLiveness, ShenandoahVerifier::VerifyRegions, ShenandoahVerifier::VerifySize, ShenandoahVerifier::VerifyGCState)+0xf7c
V [libjvm.dylib+0x11b01b8] ShenandoahVerifier::verify_before_concmark()+0xa0
V [libjvm.dylib+0x10667e8] ShenandoahConcurrentGC::op_init_mark()+0x218
V [libjvm.dylib+0x10663b0] ShenandoahConcurrentGC::entry_init_mark()+0x120
V [libjvm.dylib+0x11ad914] VM_ShenandoahInitMark::doit()+0x54
V [libjvm.dylib+0x13c24e8] VM_Operation::evaluate()+0x11c
V [libjvm.dylib+0x13e078c] VMThread::evaluate_operation(VM_Operation*)+0x108
V [libjvm.dylib+0x13e12e4] VMThread::inner_execute(VM_Operation*)+0x320
V [libjvm.dylib+0x13e0474] VMThread::loop()+0x98
V [libjvm.dylib+0x13e0200] VMThread::run()+0xc0
V [libjvm.dylib+0x12febe0] Thread::call_run()+0xf0
V [libjvm.dylib+0xedd388] thread_native_entry(Thread*)+0x138
C [libsystem_pthread.dylib+0x6f94] _pthread_start+0x88

VM_Operation (0x000000016c93ac30): Shenandoah Init Marking, mode: safepoint, requested by thread 0x000000011a00dc10
19-05-2025