JDK-8361380 : ARM32: Atomic stubs should be in pre-universe
Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 26
Priority: P4
Status: Resolved
Resolution: Fixed
OS: linux
CPU: arm
Submitted: 2025-07-03
Updated: 2025-08-22
Resolved: 2025-07-14
Since June 17, the JaCoCo integration tests executed on native arm32 builds have been crashing randomly with SIGSEGV in Klass::restore_unshareable_info. See the hs_err file attached.
Comments
[~shade] Sure I will! I need to improve my monitoring so that I can react more promptly. Thanks for picking this up so quickly!
14-07-2025
> [~shade] Looks good for fastdebug as well as release builds of your branch. I do not see crashes anymore.
Thanks for testing! The fix is in. Nightly/EA builds will eventually catch up. If you see this problem recurring, let us know.
[~shade] Looks good for fastdebug as well as release builds of your branch. I do not see crashes anymore.
14-07-2025
[~shade] Sure, builds are running... I'll post the result.
13-07-2025
Marc, I have a PR up with the fix we are about to integrate (see link above), it should fix the issue. Please test, if you have time.
13-07-2025
[~iklam] With the environment variable JAVA_TOOL_OPTIONS='-XX:+UnlockDiagnosticVMOptions -XX:-AOTCacheParallelRelocation' set, I haven't been able to reproduce the crash so far.
12-07-2025
A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/26270
Date: 2025-07-11 17:02:07 +0000
11-07-2025
Lifecycle: we map and relocate CDS archive during universe init, see universe_init() -> Metaspace::global_initialize() -> MetaspaceShared::initialize_runtime_shared_and_meta_spaces(). So the atomic stubs should be ready before that.
11-07-2025
Sure.
Your patch seems reasonable.
11-07-2025
Can I have it, though? I already spent quite some time reproducing and understanding the issue.
11-07-2025
[~adinn] This is yours.
11-07-2025
Thank you [~shade] for confirming [~adinn]'s theory. I'll assign this to him.
11-07-2025
I know Andrew Dinn wanted to do a patch. Whatever happens, I attached my version as 8361380-arm32-atomics.patch, and I am testing if it fixes the ARM32 reproducer I have :)
11-07-2025
Or, as a JDK-8358690-specific fix, we could move these to preuniverse?
StubRoutines::_atomic_add_entry = generate_atomic_add();
StubRoutines::_atomic_xchg_entry = generate_atomic_xchg();
StubRoutines::_atomic_cmpxchg_entry = generate_atomic_cmpxchg();
StubRoutines::_atomic_cmpxchg_long_entry = generate_atomic_cmpxchg_long();
StubRoutines::Arm::_atomic_load_long_entry = generate_atomic_load_long();
StubRoutines::Arm::_atomic_store_long_entry = generate_atomic_store_long();
It would take a bit of fiddling to move these from initial to preuniverse.
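For illustration, a minimal sketch of what that move could look like, assuming the ARM32 stub generator gains a preuniverse phase; the function name generate_preuniverse_stubs() and its placement are assumptions, not the actual fix:
// Sketch only: generate the ARM32 atomic stubs in the preuniverse phase,
// so they exist before universe_init() maps and relocates the CDS archive.
// generate_preuniverse_stubs() is an assumed hook name for illustration.
void StubGenerator::generate_preuniverse_stubs() {
  // ... other preuniverse stubs ...
  StubRoutines::_atomic_add_entry          = generate_atomic_add();
  StubRoutines::_atomic_xchg_entry         = generate_atomic_xchg();
  StubRoutines::_atomic_cmpxchg_entry      = generate_atomic_cmpxchg();
  StubRoutines::_atomic_cmpxchg_long_entry = generate_atomic_cmpxchg_long();
  StubRoutines::Arm::_atomic_load_long_entry  = generate_atomic_load_long();
  StubRoutines::Arm::_atomic_store_long_entry = generate_atomic_store_long();
}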
11-07-2025
Linux ARM32 is the only platform that does this bootstrap-time-actually-not-atomic oddity. Even Zero has migrated to GCC built-ins for atomics, which avoids these bootstrap circularities. It requires linking with -latomic, though, which the Zero build adds automatically. Maybe we could/should do the same for ARM32.
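As an illustration of the built-in approach (not the actual HotSpot code; the function name is made up), a cmpxchg written against the GCC __atomic built-ins needs no generated stub and no bootstrap fallback at all:
// Sketch: cmpxchg via GCC built-ins. On some targets this pulls in
// libatomic, hence the -latomic link flag mentioned above.
#include <stdint.h>

static int32_t cmpxchg_via_builtin(int32_t compare_value,
                                   int32_t exchange_value,
                                   volatile int32_t* dest) {
  int32_t expected = compare_value;
  // On failure, the built-in stores the observed value into 'expected',
  // so the caller always gets the old value back, as cmpxchg expects.
  __atomic_compare_exchange_n(dest, &expected, exchange_value,
                              /* weak */ false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}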
11-07-2025
One might think it is just the archive workers that are used too early, but I see in the hs_err files that other threads -- notably GC threads -- have also started by the time the archive workers crash. So if any of those also (unlikely, but possibly) need atomics to work in some initialization sequence, they are at risk too.
Java Threads: ( => current thread )
Total: 0
Other Threads:
0xb598fa38 WorkerThread "GC Thread#0" [id=2470, stack(0xb368c000,0xb370c000) (512K)]
0xb5996d38 ConcurrentGCThread "G1 Main Marker" [id=2471, stack(0xb360b000,0xb368b000) (512K)]
0xb5997d50 WorkerThread "G1 Conc#0" [id=2472, stack(0x76180000,0x76200000) (512K)]
0xb59eda88 ConcurrentGCThread "G1 Refine#0" [id=2473, stack(0x75e80000,0x75f00000) (512K)]
0xb59eeaf8 ConcurrentGCThread "G1 Service" [id=2474, stack(0x75c80000,0x75d00000) (512K)]
=>0xb59f3c70 (exited) Archive Worker Thread "ArchiveWorkerThread" [id=2475, stack(0x74fad000,0x7502d000) (512K)]
Total: 6
11-07-2025
This also lines up nicely with Andrew Dinn's suspicion yesterday that we use a broken cmpxchg somehow. The task distribution code in archive workers uses cmpxchg(int):
void ArchiveWorkerTask::run() {
  while (true) {
    int chunk = Atomic::load(&_chunk);
    if (chunk >= _max_chunks) {
      return;
    }
    if (Atomic::cmpxchg(&_chunk, chunk, chunk + 1, memory_order_relaxed) == chunk) {
      assert(0 <= chunk && chunk < _max_chunks, "Sanity");
      work(chunk, _max_chunks);
    }
  }
}
...which ends up calling here for ARM32:
int32_t ARMAtomicFuncs::cmpxchg_bootstrap(int32_t compare_value, int32_t exchange_value, volatile int32_t* dest) {
  // try to use the stub:
  cmpxchg_func_t func = CAST_TO_FN_PTR(cmpxchg_func_t, StubRoutines::atomic_cmpxchg_entry());
  if (func != nullptr) {
    _cmpxchg_func = func;
    return (*func)(compare_value, exchange_value, dest);
  }
  assert(Threads::number_of_threads() == 0, "for bootstrap only");
  int32_t old_value = *dest;
  if (old_value == compare_value)
    *dest = exchange_value;
  return old_value;
}
...which, as you can see, does a *non-atomic update* if `StubRoutines::atomic_cmpxchg_entry()` is not initialized. So there is a chance the task distribution code hands over the same chunk to several threads. Which leads to running relocation over some pointers twice. Which FUBARs them.
This also explains why it started to show up after JDK-8358690 -- that change likely moved the stub initialization to _after_ the archive workers needed it. I guess cmpxchg_bootstrap "believes" that bootstrap is single-threaded until all the stubs have been generated. It even does assert(Threads::number_of_threads() == 0), but that only covers Java threads, not native ones like the archive workers.
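To make the race concrete, here is a standalone toy (deliberately racy, not HotSpot code) that mimics the bootstrap fallback; with two threads it can hand out the same chunk twice, which is the same double-claim pattern visible in the sorted log further down:
// Toy demonstration: a plain load/compare/store "cmpxchg" lets two
// threads claim the same chunk. Build with -pthread.
#include <cstdio>
#include <thread>

static volatile int g_chunk = 0;

// Mimics the bootstrap fallback above: no atomicity whatsoever.
static int broken_cmpxchg(volatile int* dest, int compare, int exchange) {
  int old_value = *dest;
  if (old_value == compare)
    *dest = exchange;            // both threads may "succeed" here
  return old_value;
}

static void worker(int max_chunks, long* wins) {
  while (true) {
    int chunk = g_chunk;
    if (chunk >= max_chunks) return;
    if (broken_cmpxchg(&g_chunk, chunk, chunk + 1) == chunk) {
      ++*wins;                   // counts chunks this thread claimed
    }
  }
}

int main() {
  long wins1 = 0, wins2 = 0;
  std::thread t1(worker, 1000000, &wins1);
  std::thread t2(worker, 1000000, &wins2);
  t1.join(); t2.join();
  // With a real atomic CAS, wins1 + wins2 would be exactly 1000000;
  // any excess means some chunk was handed out more than once.
  std::printf("claimed %ld chunks in total\n", wins1 + wins2);
}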
11-07-2025
My weak hypothesis was that we end up doing the relocation twice somehow. The second relocation obviously FUBARs the pointer, which would assert in fastdebug, and crash somewhere in release. I added this logging:
void work_on(int chunk, int max_chunks, BitMapView* bm, SharedDataRelocator* reloc) {
  BitMap::idx_t size = bm->size();
  BitMap::idx_t start = MIN2(size, size * chunk / max_chunks);
  BitMap::idx_t end = MIN2(size, size * (chunk + 1) / max_chunks);
  assert(end > start, "Sanity: no empty slices");
  if (UseNewCode) {
    tty->print_cr("(%d) Working on " PTR_FORMAT " : %zu %zu %zu", os::current_process_id(), p2i(bm), start, end, size);
  }
  bm->iterate(reloc, start, end);
}
...and after a while it showed me (I sorted the output by hand):
$ grep 18862 out
(18862) Working on 0xb53709e8 : 0 115125 921000
(18862) Working on 0xb53709e8 : 115125 230250 921000
(18862) Working on 0xb53709e8 : 230250 345375 921000
(18862) Working on 0xb53709e8 : 345375 460500 921000
(18862) Working on 0xb53709e8 : 460500 575625 921000
(18862) Working on 0xb53709e8 : 575625 690750 921000
(18862) Working on 0xb53709e8 : 690750 805875 921000
(18862) Working on 0xb53709e8 : 690750 805875 921000 ; <----- DOING IT TWICE
(18862) Working on 0xb53709e8 : 805875 921000 921000
(18862) Working on 0xb53709f0 : 0 161699 1293596
(18862) Working on 0xb53709f0 : 161699 323399 1293596
(18862) Working on 0xb53709f0 : 323399 485098 1293596
(18862) Working on 0xb53709f0 : 485098 646798 1293596
(18862) Working on 0xb53709f0 : 646798 808497 1293596
(18862) Working on 0xb53709f0 : 808497 970197 1293596
# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/cds/archiveUtils.inline.hpp:43), pid=18862, tid=18863
# V [libjvm.so+0x8be3f0](18862) Working on 0xb53709f0 : 970197 1131896 1293596
(18862) Working on 0xb53709f0 : 1131896 1293596 1293596
# /home/pi/hs_err_pid18862.log
11-07-2025
Ah yes, here it is!
$ seq 1 10000 | xargs -P 2 -n 1 jdk-mainline/bin/java -fastdebug -Xmx64m Hello
...
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (src/hotspot/share/cds/archiveUtils.inline.hpp:43), pid=2252, tid=2259
# assert(_valid_old_base <= old_ptr && old_ptr < _valid_old_end) failed: must be
#
# JRE version: (26.0) (fastdebug build )
# Java VM: OpenJDK Server VM (fastdebug 26-testing-builds.shipilev.net-openjdk-jdk-b4814-20250710-1937, mixed mode, sharing, g1 gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0x8be380]Hello world
SharedDataRelocationTask::work(int, int)+0x5d4
11-07-2025
Given that we are crashing at the very beginning, I don't think JaCoCo itself matters much in this story. What probably matters is that we run lots of JVMs, which gives us more chances to observe a low-frequency event. I just ran:
$ seq 1 10000 | xargs -P 2 -n 1 jdk-mainline/bin/java -Xmx64m Hello
...and it crashed as well, in a new way:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb6208504, pid=32705, tid=32706
#
# JRE version: (26.0) (build )
# Java VM: OpenJDK Server VM (26-testing-builds.shipilev.net-openjdk-jdk-b4814-20250710-1937, mixed mode, sharing, g1 gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0x152504] AOTClassLocationConfig::validate(char const*, bool, bool*) const+0x664
#
Current thread (0xb5e19670): JavaThread "Unknown thread" [_thread_in_vm, id=32706, stack(0xb5fdf000,0xb602f000) (320K)]
Stack: [0xb5fdf000,0xb602f000], sp=0xb602b9b8, free space=306k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x152504] AOTClassLocationConfig::validate(char const*, bool, bool*) const+0x664 (aotClassLocation.cpp:976)
V [libjvm.so+0x426250] FileMapInfo::validate_class_location()+0x44 (filemap.cpp:330)
V [libjvm.so+0x890374] MetaspaceShared::map_archive(FileMapInfo*, char*, ReservedSpace)+0x90 (metaspaceShared.cpp:1903)
V [libjvm.so+0x892008] MetaspaceShared::map_archives(FileMapInfo*, FileMapInfo*, bool)+0x11c (metaspaceShared.cpp:1553)
V [libjvm.so+0x892760] MetaspaceShared::initialize_runtime_shared_and_meta_spaces()+0x3ac (metaspaceShared.cpp:1342)
V [libjvm.so+0x88b13c] Metaspace::global_initialize()+0xb4 (metaspace.cpp:743)
V [libjvm.so+0xaf5910] universe_init()+0x150 (universe.cpp:890)
V [libjvm.so+0x5685f0] init_globals()+0x6c (init.cpp:138)
V [libjvm.so+0xaccfd0] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2e8 (threads.cpp:592)
V [libjvm.so+0x66aec4] JNI_CreateJavaVM+0x74 (jni.cpp:3589)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0xdd78e524
11-07-2025
Dang, it is hard to reproduce for me now as well. I remembered a bit of trivia about small boards: they might be hotplugging CPUs. So the ActiveProcessorCount differs depending on system conditions. ArchiveWorkers use that to drive the parallelism of the relocation code, so that might be a confounding factor for reproducibility as well.
EDIT: Nevermind, looks like all 4 CPUs are online in all crash logs.
11-07-2025
[~iklam] I can actually reproduce it with almost every build. If it helps, here is my setup:
https://github.com/marchof/PiCI/blob/master/jdk-jacoco/docker/Dockerfile
I just added JAVA_TOOL_OPTIONS as requested and will let you know.
11-07-2025
[~marchof] I have a hard time reproducing the crash. I am now trying to run this in a loop overnight ...
How often can you reproduce it?
Could you try adding this to your environment and see if the problem goes away?
$ export JAVA_TOOL_OPTIONS='-XX:+UnlockDiagnosticVMOptions -XX:-AOTCacheParallelRelocation'
$ mvn clean install ....
11-07-2025
Relocation should always happen. The requested base address is 0x80000000, but we always patch it to 0x75594000 (at least that's the case on my RPi; the OS doesn't really give us ASLR).
$ for i in {1..10}; do ./jdk/bin/java -Xlog:cds*,aot* --version | grep 'Reserved archive_space_rs'; done
[0.021s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.010s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.014s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.011s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
[0.010s][info][cds] Reserved archive_space_rs [0x75594000 - 0x76000000] (10928128) bytes
Since the classes.jsa file is not changing, the relocation code should do the exact same thing every time. I don't know why the relocation would sometimes assert (or end up patching the archive incorrectly, so we fail very early when trying to load the very first class)
$ ./jdk/bin/java -Xlog:aot+reloc=debug --version
[0.011s][debug][aot,reloc] SharedDataRelocator::_patch_base = 0x75595b6c
[0.011s][debug][aot,reloc] SharedDataRelocator::_patch_end = 0x758a0000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_old_base = 0x80000000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_old_end = 0x80a6c000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_new_base = 0x75594000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_new_end = 0x76000000
[0.011s][debug][aot,reloc] SharedDataRelocator::_patch_base = 0x75b04874
[0.011s][debug][aot,reloc] SharedDataRelocator::_patch_end = 0x76000000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_old_base = 0x80000000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_old_end = 0x80a6c000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_new_base = 0x75594000
[0.011s][debug][aot,reloc] SharedDataRelocator::_valid_new_end = 0x76000000
openjdk 26-testing 2026-03-17
OpenJDK Runtime Environment (build 26-testing-builds.shipilev.net-openjdk-jdk-b4816-20250710-2347)
OpenJDK Server VM (build 26-testing-builds.shipilev.net-openjdk-jdk-b4816-20250710-2347, mixed mode, sharing)
[~shade] could you check if this is related to AOTCacheParallelRelocation?
11-07-2025
I can also reproduce on my RPi (Linux raspberrypi 5.10.63-v7l+ #1459 SMP Wed Oct 6 16:41:57 BST 2021 armv7l GNU/Linux) with https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-arm32-hflt-server.tar.xz (build 26-testing-builds.shipilev.net-openjdk-jdk-b4816-20250710-2347).
But the crash is not consistent. I only got the crash once out of many runs.
$ time env JAVA_HOME=/home/pi/shipilev/jdk ../apache-maven-3.9.10/bin/mvn clean install -Dspotless.check.skip -Dmaven.javadoc.skip
[...]
[INFO]
[INFO] --- surefire:2.19.1:test (default-test) @ org.jacoco.core.test.validation.java5 ---
-------------------------------------------------------
T E S T S
-------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb683f388, pid=739, tid=742
#
Stack: [0xb5fe5000,0xb6035000], sp=0xb6031b08, free space=306k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x783388] Klass::restore_unshareable_info(ClassLoaderData*, Handle, JavaThread*)+0x18c (klass.hpp:686)
V [libjvm.so+0x1743e4] ArrayKlass::restore_unshareable_info(ClassLoaderData*, Handle, JavaThread*)+0x34 (arrayKlass.cpp:235)
V [libjvm.so+0x572914] InstanceKlass::restore_unshareable_info(ClassLoaderData*, Handle, PackageEntry*, JavaThread*)+0x168 (instanceKlass.cpp:2832)
V [libjvm.so+0xb507e8] vmClasses::resolve_shared_class(InstanceKlass*, ClassLoaderData*, Handle, JavaThread*)+0xc0 (vmClasses.cpp:242)
V [libjvm.so+0xb50bb8] vmClasses::resolve_all(JavaThread*)+0x244 (vmClasses.cpp:87)
V [libjvm.so+0xa876d8] SystemDictionary::initialize(JavaThread*)+0xc4 (systemDictionary.cpp:1568)
V [libjvm.so+0xaf2cb4] Universe::genesis(JavaThread*)+0x98 (universe.cpp:443)
V [libjvm.so+0xaf4738] universe2_init()+0x24 (universe.cpp:1079)
V [libjvm.so+0x568660] init_globals2()+0x10 (init.cpp:173)
V [libjvm.so+0xacd070] Threads::create_vm(JavaVMInitArgs*, bool*)+0x388 (threads.cpp:615)
V [libjvm.so+0x66aec4] JNI_CreateJavaVM+0x74 (jni.cpp:3589)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x6ac39428
See hs_err_pid739_iklam.log in attachment
11-07-2025
[~shade] were you able to reproduce the error with a fastdebug build? From the attachment: https://bugs.openjdk.org/secure/attachment/115229/hs_err_pid2468-fastdebug.log
7502e000-75b00000 rw-p 00001000 08:01 62393893 /jdk/lib/server/classes.jsa
7613b000-7617f000 r--p 00ad3000 08:01 62393893 /jdk/lib/server/classes.jsa
The assert happened while relocating the default archive, so in theory it should assert every time /jdk/bin/java is executed. It would be good to find out what the offending pointer is that causes this assert:
# assert(_valid_old_base <= old_ptr && old_ptr < _valid_old_end) failed: must be
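One way to surface that information (a sketch only; HotSpot's assert accepts printf-style arguments and PTR_FORMAT/p2i are the usual helpers, but the surrounding code in archiveUtils.inline.hpp may name things differently):
// Sketch: extend the assert so the hs_err shows the offending pointer
// and the expected range, instead of just "must be".
assert(_valid_old_base <= old_ptr && old_ptr < _valid_old_end,
       "must be: old_ptr=" PTR_FORMAT " valid_old=[" PTR_FORMAT ", " PTR_FORMAT ")",
       p2i(old_ptr), p2i(_valid_old_base), p2i(_valid_old_end));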
11-07-2025
Yes, I was able to reproduce it on my RPi 4 / armhf with https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-arm32-hflt-server.tar.xz, running the https://github.com/jacoco/jacoco build with a blanket "mvn clean install -Dspotless.check.skip -Dmaven.javadoc.skip". It takes a while, though :)
I think Andrew Dinn found an issue in atomic-long-handling stubs that might explain some of this.
10-07-2025
Sure, the project and its build is public: https://github.com/jacoco/jacoco
The specific build setup here is a 32-bit Raspberry Pi 4: first building a 32-bit JDK and then using this JDK to run the JaCoCo build.
I haven't tried to reproduce the problem on amd64 yet.
I can add whatever helps to find the root cause.
10-07-2025
Marc, is that a public project build? Can you put a build/test invocation line here?
10-07-2025
[~kvn] I'll try but can't promise: It's a complex Maven build with many JVMs started as sub-processes during the integration tests.
10-07-2025
[~iklam] Can you look into why this could have happened?
10-07-2025
[~marchof] Can you run a fastdebug VM with -Xlog:cds=debug -Xlog:aot=debug -Xlog:codecache+init=debug and attach the output?
09-07-2025
Are you sure it is JDK-8358690? From the hs_err files I see that AOT is not used.
The only thing I see is that I moved some initial stub generation to after universe init.
09-07-2025
Thanks for narrowing it down! [~kvn] could you please take a look?
09-07-2025
I think I can narrow this down to the following commit:
https://github.com/openjdk/jdk/commit/6e390ef17cf4b6134d5d53ba4e3ae8281fedb3f3
JDK-8358690: Some initialization code asks for AOT cache status way too early
With this commit I can reproduce the crash with every test run. With its parent commit I wasn't able to reproduce it. With fastdebug the error message is:
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/workspace/src/hotspot/share/cds/archiveUtils.inline.hpp:43), pid=3557, tid=3558
# assert(_valid_old_base <= old_ptr && old_ptr < _valid_old_end) failed: must be
08-07-2025
[~thartmann] Sure, I'll bisect -- will take some time.
07-07-2025
I don't see how JDK-8351645 could trigger this, so the issue was most likely introduced by something else. The failing method looks CDS-related. Could you try to narrow it down further to a specific change?
07-07-2025
I was able to reproduce the issue with a fastdebug build. See hs_err_pid2468-fastdebug.log attached:
# Internal Error (/workspace/src/hotspot/share/cds/archiveUtils.inline.hpp:43), pid=2468, tid=2475
# assert(_valid_old_base <= old_ptr && old_ptr < _valid_old_end) failed: must be
#
# JRE version: (26.0) (fastdebug build )
# Java VM: OpenJDK Server VM (fastdebug 26-internal-adhoc.root.workspace, mixed mode, sharing, g1 gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0x604d58] SharedDataRelocationTask::work(int, int)+0x3a3