JDK-8282844 : Shenandoah Generational: Investigate assertion failure during verification before mark
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2022-03-08
  • Updated: 2025-04-23
  • Resolved: 2022-03-22
Version (Other): internal, Fixed
Description
When executing:

    ~/Devel/kdnilsen/gitfarm/balance-without-cancel/build/linux-x86_64-server-slowdebug/jdk/bin/java \
    -XX:-UseNUMA -XX:ActiveProcessorCount=8 \
    -Xmx8g -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
    -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahPacing \
    -XX:+ShenandoahVerify -XX:-ShenandoahUncommit \
    -Xlog:gc*=info,safepoint*=info:results/genshen/dacapo-hunt/eclipse.jvm.log::filecount=0,filesize=0 \
    -javaagent:"/home/ubuntu/Devel/kdnilsen/lib/jHiccup-2.0.10/jHiccup.jar=-l,results/genshen/dacapo-hunt/h2.jhiccup.log,-i,1000,-a" \
    -jar ~/Devel/kdnilsen/lib/dacapo-evaluation-git+309e1fa.jar \
    --scratch-directory ~/Devel/kdnilsen/tmp/dacapo \
    --no-validation --converge --variance 5 --no-pre-iteration-gc \
    --iterations 10 --size small eclipse

We experience the intermittent crash:
Run Generational Balanced Remset Without Cancel branch with memory size 8g on dacapo eclipse
Unzip workspace
Initialize workspace
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/shenandoahVerifier.cpp:101
[thread 1062240 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/ubuntu/Devel/kdnilsen/gitfarm/balance-without-cancel/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:101), pid=1062238, tid=1062302
#  Error: Before Mark, Roots; Object start should be within the region

Referenced from:
  interior location: 0x0000001001c5f868
  inside Java heap
    not in collection set
  region: |    6|P  |O|BTE   1001c00000,   1001ffff90,   1002000000|TAMS   1001ffff90|UWM   1001ffff90|U  4095K|T     0B|G     0B|G     0B|S  4095K|L  4095K|CP   0

Object:
  0x0000001004554aa0 - safe print, no details
  region: |   16|R  |Y|BTE   1004400000,   1004406138,   1004800000|TAMS   1004406138|UWM   1004800000|U 24888B|T     0B|G     0B|G     0B|S 24888B|L     0B|CP   0

Raw heap memory:
0x0000001004554a80:   0000000f 0000000f 00002000 00000000
0x0000001004554a90:   00000000 00000000 003c90e7 00000000
0x0000001004554aa0:   0440223b 00000010 00064e10 3ab0eb94
0x0000001004554ab0:   0038b3f0 00389a61 00000000 00000000
0x0000001004554ac0:   0440225b 00000010 00065058 00000100
0x0000001004554ad0:   00000000 00000000 00000000 00000000
0x0000001004554ae0:   000bce33 00000000 00000000 00880505
0x0000001004554af0:   00000000 00000000 0038bec0 00000000
0x0000001004554b00:   00000000 000b873f 003c7b5c 00000000
0x0000001004554b10:   003b8bbb 00000000 00000000 00000000 


#
# JRE version: OpenJDK Runtime Environment (19.0) (slowdebug build 19-internal+0-adhoc.ubuntu.balance-without-cancel)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 19-internal+0-adhoc.ubuntu.balance-without-cancel, mixed mode, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/ubuntu/Devel/kdnilsen/tmp/debug-balance/core.1062238)
#
# An error report file with more information is saved as:
# /home/ubuntu/Devel/kdnilsen/tmp/debug-balance/hs_err_pid1062238.log
[thread 1062301 also had an error]
[thread 1062303 also had an error]
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
1 rr has recorded a crash (134) in /home/ubuntu/Devel/kdnilsen/tmp/rr-hunt-eclipse/1646616374/latest-trace

The last few lines of the GC log consist of the following:
...
[6192.586s][info][gc,start         ] GC(63) Concurrent reset
[6192.590s][info][gc,task          ] GC(63) Using 2 of 4 workers for concurrent reset
[6193.308s][info][gc               ] GC(63) Concurrent reset 721.647ms
[6193.327s][info][safepoint,cleanup] updating inline caches, 0.0004119 secs
[6193.328s][info][safepoint,cleanup] compilation policy safepoint handler, 0.0001318 secs
[6193.329s][info][safepoint,cleanup] safepoint cleanup tasks, 0.0030982 secs
[6193.334s][info][safepoint,stats  ] Cleanup                      [             22               3 ][        178760716    4186710    2014580  184962006 ]               0
[6193.336s][info][safepoint        ] Safepoint "Cleanup", Time since last: 1012785269 ns, Reaching safepoint: 182947426 ns, At safepoint: 2014580 ns, Total: 184962006 ns
[6193.351s][info][safepoint,cleanup] updating inline caches, 0.0001314 secs
[6193.352s][info][safepoint,cleanup] compilation policy safepoint handler, 0.0001325 secs
[6193.355s][info][safepoint,cleanup] safepoint cleanup tasks, 0.0044812 secs
[6193.356s][info][gc,start         ] GC(63) Pause Init Mark (YOUNG)
[6193.358s][info][gc,task          ] GC(63) Using 4 of 4 workers for init marking
[6193.414s][info][gc,start         ] GC(63) Verify Before Mark, Level 4


We suspect that this crash is not specific to the balance-without-cancel branch of github/shenandoah.

We have an rr recording and are investigating.
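
The failing check can be illustrated with the addresses from the error report above. This is a minimal sketch, not the verifier's actual code: the `Region` struct and `object_start_within_region` are hypothetical names, and it assumes that "BTE" in the region line stands for bottom, top, and end, and that an object start is valid only if it lies below the region's top.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a heap region; only the three "BTE" addresses
// (assumed to be bottom, top, end) from the crash report are modeled.
struct Region {
    std::uint64_t bottom;
    std::uint64_t top;   // end of the allocated part of the region
    std::uint64_t end;   // end of the region's reserved address range
};

// Assumed form of the check behind "Object start should be within the region":
// the object must start inside the allocated part, [bottom, top).
bool object_start_within_region(const Region& r, std::uint64_t obj) {
    return obj >= r.bottom && obj < r.top;
}

// Weaker check: the address merely falls inside the region's range, [bottom, end).
bool address_in_region_range(const Region& r, std::uint64_t addr) {
    return addr >= r.bottom && addr < r.end;
}

// Values copied from the report: region 16 and the offending object address.
const Region region16{0x1004400000ULL, 0x1004406138ULL, 0x1004800000ULL};
const std::uint64_t reported_object = 0x1004554aa0ULL;
```

With these values, the reported object address lies inside region 16's address range but above its top, that is, in memory that has not been allocated in the current cycle. The region's used size (top minus bottom) is 24888 bytes, which matches the "U 24888B" field in the report.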
Comments
https://github.com/openjdk/shenandoah/commit/71fe3e547c9da1ca695fb405f7b2b034548f41cc
22-03-2022

This is resolved with commit https://github.com/openjdk/shenandoah/commit/71fe3e547c9da1ca695fb405f7b2b034548f41cc
22-03-2022

The root cause of this problem is the following:
1. During full GC, old-gen regions that are pinned do not get collected.
2. Within these uncollected regions, certain dead objects may hold invalid pointers (because update refs does not update dead objects).
3. The intention is that we fill and coalesce dead objects within pinned old-gen regions during full GC. This code was recently added to https://github.com/openjdk/shenandoah, commit a9a9f138d88a31054db2c4ae169c26f05ebd2da5.
4. Debugging of this crash reveals that the coalesce-and-fill effort for pinned old-gen regions will be aborted before it has completely coalesced and filled all dead objects within a region if the GC cancellation flag is left enabled at the start of Full GC.
5. The planned fix is to turn off the GC cancellation flag at the start of Full GC.
6. The commit for this fix has not yet been merged due to a regression observed during testing of the proposed fix.
18-03-2022
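
The root-cause analysis in the comment dated 18-03-2022 can be sketched as a small simulation. This is an illustrative model only, not the Shenandoah code or the merged fix: `Obj`, `coalesce_and_fill`, `unfilled_dead`, and `full_gc` are hypothetical names, and the model assumes the fill loop polls the cancellation flag before each object and stops as soon as the flag is set.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object model: a dead object that has not been filled may still
// hold stale pointers, which is what the verifier later trips over.
struct Obj {
    bool live;
    bool filled;
};

// Walks a pinned old-gen region and fills dead objects. Assumed behavior:
// the loop gives up as soon as it sees the cancellation flag set.
// Returns true only if the whole region was processed.
bool coalesce_and_fill(std::vector<Obj>& region, const std::atomic<bool>& cancelled_gc) {
    for (Obj& o : region) {
        if (cancelled_gc.load()) {
            return false;  // aborted early; remaining dead objects stay unfilled
        }
        if (!o.live) {
            o.filled = true;
        }
    }
    return true;
}

// Counts dead objects that were left unfilled.
std::size_t unfilled_dead(const std::vector<Obj>& region) {
    std::size_t n = 0;
    for (const Obj& o : region) {
        if (!o.live && !o.filled) {
            ++n;
        }
    }
    return n;
}

// Simulated full GC over one pinned old-gen region. When clear_flag_first is
// true, the cancellation flag is turned off before any work, as in the planned fix.
std::size_t full_gc(std::vector<Obj>& pinned_old_region,
                    std::atomic<bool>& cancelled_gc,
                    bool clear_flag_first) {
    if (clear_flag_first) {
        cancelled_gc.store(false);
    }
    coalesce_and_fill(pinned_old_region, cancelled_gc);
    return unfilled_dead(pinned_old_region);
}
```

In this model, entering `full_gc` with the flag still set and `clear_flag_first == false` leaves every dead object in the region unfilled, while passing `clear_flag_first == true` lets the loop run to completion so that no unfilled dead objects remain.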