JDK-8345399 : GenShen: Error: Verify init-mark remembered set violation; clean card should be dirty
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 24
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-12-03
  • Updated: 2025-04-30
  • Resolved: 2025-03-31
JDK 25: fixed in 25 b17
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Generational Shenandoah has crashed with "Error: Verify init-mark remembered set violation; clean card should be dirty" while running gc/TestAllocHumongousFragment.java#generational (see attached hs_err file).
Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/shenandoah-jdk21u/pull/188 Date: 2025-04-30 15:43:20 +0000
30-04-2025

The bug causing JDK-8353691 could also potentially cause rem-set violation.
03-04-2025

Changeset: 4d1de46c Branch: master Author: Xiaolong Peng <xpeng@openjdk.org> Date: 2025-03-31 18:13:31 +0000 URL: https://git.openjdk.org/jdk/commit/4d1de46cb882bade5781255b646f623b55d7180f
31-03-2025

I have reproduced the remembered set validation error on ppc64le hardware with TIP:
```
[13.990s][info][gc,start ] GC(101) Pause Full
[13.990s][info][gc,task ] GC(101) Using 4 of 4 workers for full gc
[13.990s][info][gc,start ] GC(101) Verify Before Full GC, Level 4
[13.998s][info][gc ] GC(101) Verify Before Full GC, Level 4 (22772 reachable, 0 marked)
[13.998s][info][gc,phases,start] GC(101) Phase 1: Mark live objects
[14.003s][info][gc,ref ] GC(101) Clearing All SoftReferences
[14.003s][info][gc,ref ] GC(101) Clearing All SoftReferences
[14.009s][info][gc,ref ] GC(101) Encountered references: Soft: 49, Weak: 101, Final: 0, Phantom: 8
[14.009s][info][gc,ref ] GC(101) Discovered references: Soft: 31, Weak: 39, Final: 0, Phantom: 8
[14.009s][info][gc,ref ] GC(101) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[14.012s][info][gc,phases ] GC(101) Phase 1: Mark live objects 13.674ms
[14.012s][info][gc,phases,start] GC(101) Phase 2: Compute new object addresses
[14.026s][info][gc,phases ] GC(101) Phase 2: Compute new object addresses 14.166ms
[14.026s][info][gc,phases,start] GC(101) Phase 3: Adjust pointers
[14.030s][info][gc,phases ] GC(101) Phase 3: Adjust pointers 3.626ms
[14.030s][info][gc,phases,start] GC(101) Phase 4: Move objects
[14.128s][info][gc,phases ] GC(101) Phase 4: Move objects 98.264ms
[14.128s][info][gc,phases,start] GC(101) Phase 5: Full GC epilog
[14.146s][info][gc,ergo ] GC(101) Transfer 234 region(s) from Old to Young, yielding increased size: 790M
[14.146s][info][gc,ergo ] GC(101) FullGC done: young usage: 450M, old usage: 231M
[14.146s][info][gc,free ] Free: 296M, Max: 512K regular, 296M humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 592 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K
[14.146s][info][gc,ergo ] GC(101) After Full GC, successfully transferred 0 regions to none to prepare for next gc, old available: 1307K, young_available: 296M
[14.146s][info][gc,barrier ] GC(101) Cleaned read_table from 0x0000754a50290000 to 0x0000754a5048ffff
[14.146s][info][gc,barrier ] GC(101) Current write_card_table: 0x0000754a4fc90000
[14.148s][info][gc,phases ] GC(101) Phase 5: Full GC epilog 20.265ms
[14.148s][info][gc,start ] GC(101) Verify After Full GC, Level 4
[14.182s][info][gc ] GC(101) Verify After Full GC, Level 4 (22664 reachable, 125 marked)
[14.182s][info][gc,ergo ] GC(101) At end of Full GC: GCU: 6.9%, MU: 9.9% during period of 0.261s
[14.182s][info][gc,ergo ] GC(101) At end of Full GC: Young generation used: 450M, used regions: 454M, humongous waste: 3532K, soft capacity: 1024M, max capacity: 790M, available: 296M
[14.182s][info][gc,ergo ] GC(101) At end of Full GC: Old generation used: 231M, used regions: 234M, humongous waste: 1654K, soft capacity: 0B, max capacity: 234M, available: 1307K
[14.182s][info][gc,ergo ] GC(101) Good progress for free space: 296M, need 10485K
[14.182s][info][gc,ergo ] GC(101) Good progress for used space: 148M, need 512K
[14.182s][info][gc ] GC(101) Pause Full 829M->681M(1024M) 192.311ms
...
[14.196s][info][gc ] Trigger (Young): Free (65536K) is below minimum threshold (80895K)
[14.196s][info][gc,free ] Free: 65536K, Max: 512K regular, 65536K humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 128 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K
[14.196s][info][gc,ergo ] GC(102) Start GC cycle (Young)
[14.196s][info][gc,start ] GC(102) Concurrent reset (Young)
[14.196s][info][gc,task ] GC(102) Using 2 of 4 workers for Concurrent reset (Young)
[14.196s][info][gc,ergo ] GC(102) Pacer for Reset. Non-Taxable: 1024M
Allocated: 732 Mb
Allocated: 699 Mb
Allocated: 715 Mb
[14.200s][info][gc,thread ] Cancelling GC: unknown GCCause
[14.200s][info][gc ] Failed to allocate Shared, 61709K
[14.202s][info][gc ] GC(102) Concurrent reset (Young) 6.371ms
[14.203s][info][gc,barrier ] GC(102) Cleaned read_table from 0x0000754a50080000 to 0x0000754a5027ffff
[14.203s][info][gc,start ] GC(102) Pause Init Mark (Young)
[14.203s][info][gc,task ] GC(102) Using 4 of 4 workers for init marking
[14.205s][info][gc,barrier ] GC(102) Current write_card_table: 0x0000754a4fa80000
[14.205s][info][gc,start ] GC(102) Verify Before Mark, Level 4
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/xlpeng/repos/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:1270), pid=2167519, tid=2167538
# Error: Verify init-mark remembered set violation; clean card, it should be dirty.

Referenced from:
  interior location: 0x00000000c00c2bfc
  inside Java heap
  not in collection set
  region: | 1|R |O|BTE c0080000, c00c2c78, c0100000|TAMS c0080000|UWM c00c2c78|U 267K|T 0B|G 0B|P 0B|S 267K|L 267K|CP 0

Object:
  0x00000000e8c00000 - klass 0x000001df001abfa0 [I
  not allocated after mark start
  not after update watermark
  not marked strong
  not marked weak
  not in collection set
  age: 0
  mark: mark(is_unlocked no_hash age=0)
  region: | 1304|H |Y|BTE e8c00000, e8c80000, e8c80000|TAMS e8c80000|UWM e8c80000|U 512K|T 0B|G 0B|P 0B|S 512K|L 0B|CP 0

Forwardee:
  (the object itself)
```
I also verified the fix https://github.com/openjdk/jdk/pull/24092: with it applied, I no longer saw the remembered set validation error. However, after hundreds of repeated test runs I could reproduce the assertion failure `assert(Universe::is_in_heap(result)) failed`, which seems to be a different bug in full GC (in theory it could also lead to a remembered set validation error). I'll create another JBS bug to track its investigation and fix.
28-03-2025
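
For readers decoding the verifier output in the comment above, the logged addresses can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not HotSpot code. It assumes that region 0 starts exactly one region size below region 1 (so the heap base is 0xc0000000) and that cards cover 512 bytes; neither value is stated in the log. The class and method names are hypothetical.

```java
/** Illustrative arithmetic on addresses copied from the log; not HotSpot code. */
final class RegionAndCardMath {
    // Bounds of region 1 as printed in "BTE c0080000, c00c2c78, c0100000".
    static final long REGION_1_BOTTOM = 0xc0080000L;
    static final long REGION_1_TOP    = 0xc00c2c78L;
    static final long REGION_1_END    = 0xc0100000L;
    static final long REGION_SIZE     = REGION_1_END - REGION_1_BOTTOM; // 512K

    // Assumption: regions are contiguous and region 0 starts at the heap base.
    static final long HEAP_BASE = REGION_1_BOTTOM - REGION_SIZE;

    // Assumption: 512-byte cards (not stated in the log).
    static final int CARD_SHIFT = 9;

    static long regionIndex(long addr) {
        return (addr - HEAP_BASE) / REGION_SIZE;
    }

    static long cardIndexInRegion(long addr) {
        return ((addr - HEAP_BASE) % REGION_SIZE) >>> CARD_SHIFT;
    }

    static long usedKiBInRegion1() {
        return (REGION_1_TOP - REGION_1_BOTTOM) / 1024;
    }

    public static void main(String[] args) {
        long slot = 0x00000000c00c2bfcL;   // "interior location" holding the reference
        long object = 0x00000000e8c00000L; // referenced humongous int array

        System.out.println("slot region   = " + regionIndex(slot));        // 1
        System.out.println("slot card     = " + cardIndexInRegion(slot));  // 533
        System.out.println("object region = " + regionIndex(object));      // 1304
        System.out.println("region 1 used = " + usedKiBInRegion1() + "K"); // 267K
    }
}
```

Under these assumptions the numbers agree with the log: the referencing slot lies in old region 1 (267K used), and the referent is the start of young humongous region 1304.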

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/24092 Date: 2025-03-18 00:19:35 +0000
28-03-2025

While investigating the root cause, I found some issues in GenShen that may cause failures in remembered set verification. I have created a bug to fix those issues; see the details at https://bugs.openjdk.org/browse/JDK-8352185. After the [PR](https://github.com/openjdk/jdk/pull/24092) is integrated, it would be great if you could run the test again on ppc64le hardware to verify whether the init-mark remembered set violation is also fixed.
19-03-2025

Hi Martin, I have tried to reproduce the issue, but we don't have ppc64le hardware, so I used QEMU to emulate ppc64le. It would be helpful if you could enable GC logging for test gc/TestAllocHumongousFragment.java#generational and provide the GC logs along with the crash log.
19-03-2025

We observed the following (maybe related), also triggered by test gc/TestAllocHumongousFragment.java#generational, on Linux ppc64le:

# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_ppc64le-dbg/jdk/src/hotspot/share/oops/compressedOops.inline.hpp:58), pid=137841, tid=137866
# assert(Universe::is_in_heap(result)) failed: object not in heap 0x00000000f9b00000
#
# JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-adhoc.jenkinsi.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.jenkinsi.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-ppc64le)
# Problematic frame:
# V [libjvm.so+0x5ac974] CompressedOops::decode_not_null(narrowOop)+0x214
#

V [libjvm.so+0x5ac974] CompressedOops::decode_not_null(narrowOop)+0x214 (compressedOops.inline.hpp:58)
V [libjvm.so+0x1aa6098] void ShenandoahMark::mark_through_ref<narrowOop, (ShenandoahGenerationType)1>(narrowOop*, Padded<BufferedOverflowTaskQueue<ShenandoahMarkTask, (MemTag)5, 131072u>, 128ul>*, Padded<BufferedOverflowTaskQueue<ShenandoahMarkTask, (MemTag)5, 131072u>, 128ul>*, ShenandoahMarkingContext*, bool) [clone .constprop.0]+0x98 (shenandoahMark.inline.hpp:302)
V [libjvm.so+0x1ad6e98] void objArrayOopDesc::oop_iterate_range<ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1> >(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>*, int, int)+0x1b8 (shenandoahClosures.inline.hpp:74)
V [libjvm.so+0x1aa8f80] void ShenandoahMark::do_chunked_array_start<ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1> >(Padded<BufferedOverflowTaskQueue<ShenandoahMarkTask, (MemTag)5, 131072u>, 128ul>*, ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>*, oop, bool) [clone .isra.0]+0x190 (shenandoahMark.inline.hpp:170)
V [libjvm.so+0x1aa9a48] void ShenandoahMark::do_task<ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>, (ShenandoahGenerationType)1, (StringDedupMode)0>(Padded<BufferedOverflowTaskQueue<ShenandoahMarkTask, (MemTag)5, 131072u>, 128ul>*, ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>*, unsigned short*, StringDedup::Requests*, ShenandoahMarkTask*, unsigned int) [clone .constprop.0] [clone .isra.0]+0x5a8 (shenandoahMark.inline.hpp:91)
V [libjvm.so+0x1ab7148] ShenandoahMark::mark_loop(unsigned int, TaskTerminator*, ShenandoahReferenceProcessor*, ShenandoahGenerationType, bool, StringDedupMode, StringDedup::Requests*)+0x2198 (shenandoahMark.cpp:196)
V [libjvm.so+0x1b41700] ShenandoahSTWMark::finish_mark(unsigned int)+0xe0 (shenandoahSTWMark.cpp:156)
V [libjvm.so+0x1b42bbc] ShenandoahSTWMarkTask::work(unsigned int)+0x9c (shenandoahSTWMark.cpp:57)
V [libjvm.so+0x1efb070] WorkerThread::run()+0xe0 (workerThread.cpp:69)
V [libjvm.so+0x1d33a60] Thread::call_run()+0xe0 (thread.cpp:231)
V [libjvm.so+0x17425ac] thread_native_entry(Thread*)+0x18c (os_linux.cpp:877)
C [libc.so.6+0xaa130] start_thread+0x170
11-02-2025

Note that as of today the test is problem-listed per https://bugs.openjdk.org/browse/JDK-8322418 in the following modes, but not in (shen) generational mode:

gc/TestAllocHumongousFragment.java#adaptive 8298781 generic-all
gc/TestAllocHumongousFragment.java#aggressive 8298781 generic-all
gc/TestAllocHumongousFragment.java#iu-aggressive 8298781 generic-all
gc/TestAllocHumongousFragment.java#g1 8298781 generic-all
gc/TestAllocHumongousFragment.java#static 8298781 generic-all

It's interesting that it's not disabled in the following modes/ids: passive, generational, compact. Of course the failure reported here would be solely with generational.
17-12-2024

Linked a couple other tickets that show test failing on other platforms (e.g. x86) albeit in other ways (and modes, e.g. non-generational). They may or may not be directly related to the problem reported in this ticket, but indicate potential issues with allocation, card-marking, or GC (shen or genshen) more generally.
17-12-2024

Assertion appears to be in "before mark" card verification, and could be a card-marking error (young object is age 0, and is a humongous int array, fwiw). Should examine PPC's card marking barrier code closely. Also a good idea to see if it occurs on other platforms, but quite possibly in platform-specific implementation of card-marking barrier code.
17-12-2024
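
To make the invariant behind "clean card should be dirty" concrete, here is a toy model, not the Shenandoah implementation. It assumes a heap split into an old part and a young part, 512-byte cards, and a post-write barrier that dirties the card covering the written slot. All names (`ToyCardTable`, `store`, `verify`) are hypothetical. A store whose barrier is skipped leaves an old-to-young reference on a clean card, which is the kind of state the verifier reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy card table; a sketch of the invariant, not HotSpot code. */
final class ToyCardTable {
    static final int CARD_SHIFT = 9; // assumption: 512-byte cards

    private final long heapBase;
    private final long youngStart;   // [heapBase, youngStart) is old, the rest is young
    private final boolean[] dirty;   // one flag per card
    private final Map<Long, Long> slots = new TreeMap<>(); // slot address -> referent

    ToyCardTable(long heapBase, long youngStart, long heapEnd) {
        this.heapBase = heapBase;
        this.youngStart = youngStart;
        this.dirty = new boolean[(int) ((heapEnd - heapBase) >>> CARD_SHIFT)];
    }

    private int cardIndex(long addr) {
        return (int) ((addr - heapBase) >>> CARD_SHIFT);
    }

    private boolean isOld(long addr) {
        return addr < youngStart;
    }

    /** Writes a reference into a slot; the barrier dirties the slot's card. */
    void store(long slot, long target, boolean runBarrier) {
        slots.put(slot, target);
        if (runBarrier) {
            dirty[cardIndex(slot)] = true;
        }
    }

    /** Every old slot that points into young must sit on a dirty card. */
    List<String> verify() {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<Long, Long> e : slots.entrySet()) {
            long slot = e.getKey();
            long target = e.getValue();
            if (isOld(slot) && !isOld(target) && !dirty[cardIndex(slot)]) {
                violations.add(String.format(
                    "clean card should be dirty: slot 0x%x -> 0x%x", slot, target));
            }
        }
        return violations;
    }
}
```

In this model a missed barrier and a card wrongly cleaned by the collector produce the same symptom, so the symptom alone does not say which of the two happened.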

We will try to reproduce this on other platforms.
16-12-2024