JDK-8338753 : LotsOfClasses.java fails with fatal error: VM thread could block on lock that may be held by a JavaThread during safepoint: Heap_lock
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 24
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86_64,aarch64
  • Submitted: 2024-08-21
  • Updated: 2024-08-26
  • Resolved: 2024-08-26
Description
The following test failed in the JDK24 CI:

runtime/cds/appcds/LotsOfClasses.java

Here's a snippet from the log file:

#section:driver
----------messages:(7/233)----------
command: driver LotsOfClasses
reason: User specified action: run driver/timeout=500 LotsOfClasses 
started: Wed Aug 21 15:17:16 UTC 2024
Mode: agentvm
Agent id: 6
finished: Wed Aug 21 15:18:16 UTC 2024
elapsed time (seconds): 59.863
----------configuration:(15/2237)----------

<snip>

----------System.err:(14/809)----------
java.lang.RuntimeException: Hotspot crashed
	at jdk.test.lib.cds.CDSTestUtils.executeAndLog(CDSTestUtils.java:699)
	at jdk.test.lib.cds.CDSTestUtils.executeAndLog(CDSTestUtils.java:675)
	at jdk.test.lib.cds.CDSTestUtils.createArchive(CDSTestUtils.java:270)
	at jdk.test.lib.cds.CDSTestUtils.createArchiveAndCheck(CDSTestUtils.java:306)
	at LotsOfClasses.main(LotsOfClasses.java:56)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1575)

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test

result: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Hotspot crashed


Here are snippets from the hs_err_pid file:

#  Internal Error (/opt/mach5/mesos/work_dir/slaves/a4a7850a-7c35-410a-b879-d77fbb2f6087-S151409/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b3b1558e-18e4-48bc-bd55-8015da45ee73/runs/4168c1d6-5238-4f4f-b460-2944f8666f5f/workspace/open/src/hotspot/share/runtime/mutex.cpp:58), pid=2045100, tid=2045110
#  fatal error: VM thread could block on lock that may be held by a JavaThread during safepoint: Heap_lock
#
# JRE version: Java(TM) SE Runtime Environment (24.0+12) (fastdebug build 24-ea+12-1289)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-ea+12-1289, interpreted mode, tiered, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x1259b28]  Mutex::check_safepoint_state(Thread*)+0x198

<snip>

---------------  T H R E A D  ---------------

Current thread (0x0000ffff7831e8f0):  VMThread "VM Thread"          [id=2045110, stack(0x0000ffff4cc26000,0x0000ffff4ce24000) (2040K)]

Stack: [0x0000ffff4cc26000,0x0000ffff4ce24000],  sp=0x0000ffff4ce21f00,  free space=2031k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1259b28]  Mutex::check_safepoint_state(Thread*)+0x198  (mutex.cpp:58)
V  [libjvm.so+0x125cf9c]  Mutex::lock()+0x4c  (mutex.cpp:120)
V  [libjvm.so+0xcf2380]  vm_exit(int)+0x80  (mutexLocker.hpp:199)
V  [libjvm.so+0x568ce8]  ArchiveHeapWriter::copy_roots_to_buffer(GrowableArrayCHeap<oop, (MEMFLAGS)13>*)+0x78  (archiveHeapWriter.cpp:196)
V  [libjvm.so+0x56a3ec]  ArchiveHeapWriter::copy_source_objs_to_buffer(GrowableArrayCHeap<oop, (MEMFLAGS)13>*)+0x2ac  (archiveHeapWriter.cpp:298)
V  [libjvm.so+0x56b118]  ArchiveHeapWriter::write(GrowableArrayCHeap<oop, (MEMFLAGS)13>*, ArchiveHeapInfo*)+0x64  (archiveHeapWriter.cpp:104)
V  [libjvm.so+0x11f91b4]  VM_PopulateDumpSharedSpace::dump_java_heap_objects(GrowableArray<Klass*>*)+0x224  (metaspaceShared.cpp:885)
V  [libjvm.so+0x11f9390]  VM_PopulateDumpSharedSpace::doit()+0xb0  (metaspaceShared.cpp:535)
V  [libjvm.so+0x1687a20]  VM_Operation::evaluate()+0x100  (vmOperations.cpp:75)
V  [libjvm.so+0x16b0d08]  VMThread::evaluate_operation(VM_Operation*)+0xc8  (vmThread.cpp:283)
V  [libjvm.so+0x16b1800]  VMThread::inner_execute(VM_Operation*)+0x3b0  (vmThread.cpp:427)
V  [libjvm.so+0x16b19fc]  VMThread::loop()+0x88  (vmThread.cpp:493)
V  [libjvm.so+0x16b1b34]  VMThread::run()+0xa4  (vmThread.cpp:177)
V  [libjvm.so+0x15a6220]  Thread::call_run()+0xac  (thread.cpp:225)
V  [libjvm.so+0x12fb0c4]  thread_native_entry(Thread*)+0x130  (os_linux.cpp:858)
C  [libc.so.6+0x806b8]  start_thread+0x2d8


So far this failure has been seen on linux-aarch64 and linux-x64.
Comments
Reclosing as a duplicate instead of Resolving as...
26-08-2024

The other option is to convert this bug into a [BACKOUT] and make it a subtask of a [REDO] if you plan to [REDO] the root cause fix. (For future reference...)
22-08-2024

I propose we close this as a duplicate of the backout: JDK-8338856.
22-08-2024

Yeah, let's back out JDK-8337828. I am on it... JDK-8338856.
22-08-2024

[~ccheung] and [~dholmes] - Thanks for the investigations and analysis. I read that changeset and didn't put two and two together. Bumping priority from P3 -> P2 since this is a regression.
22-08-2024

jtreg -va -nr -w tmp -jdk:build/linux-x86_64-server-fastdebug/images/jdk/ -XX:+UseShenandoahGC -XX:-UseCompressedOops test/hotspot/jtreg/runtime/cds/appcds/LotsOfClasses.java

Test passed with `-XX:+UseShenandoahGC -XX:-UseCompressedOops`.
22-08-2024

The fix for JDK-8337828 causes the call to ArchiveHeapWriter::copy_roots_to_buffer to invoke vm_exit:

    if (byte_size >= MIN_GC_REGION_ALIGNMENT) {
      log_error(cds, heap)("roots array is too large. Please reduce the number of classes");
      vm_exit(1);
    }

It is the vm_exit that triggers the crash, because vm_exit tries to take Heap_lock. I don't think we expect to call vm_exit from the VMThread in the context of a VM operation.
21-08-2024
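
For readers unfamiliar with the check that fired, the failure mode described in the analysis above can be sketched as a toy model. This is not HotSpot source: ToyThreadKind, ToyMutex, and toy_vm_exit are hypothetical names, and the rule they encode is only an assumption drawn from the fatal error text and the stack trace in this report (vm_exit -> Mutex::lock -> Mutex::check_safepoint_state).

```cpp
#include <stdexcept>
#include <string>

// Toy model only. None of these types are HotSpot's; the names are
// hypothetical and chosen to echo the stack trace in this report.
enum class ToyThreadKind { Java, VM };

struct ToyMutex {
    std::string name;
    bool java_thread_may_hold_at_safepoint;  // assumed property of the lock

    // Reject the acquisition that the fatal error describes: the VM thread,
    // while at a safepoint, asking for a lock that a stopped JavaThread
    // might still own (and therefore could never release).
    void lock(ToyThreadKind who, bool at_safepoint) const {
        if (who == ToyThreadKind::VM && at_safepoint &&
            java_thread_may_hold_at_safepoint) {
            throw std::logic_error(
                "VM thread could block on lock that may be held by a "
                "JavaThread during safepoint: " + name);
        }
        // Actual locking is omitted; only the check matters here.
    }
};

// Stand-in for an exit path that takes Heap_lock before shutting down.
inline void toy_vm_exit(ToyThreadKind who, bool at_safepoint) {
    static const ToyMutex heap_lock{"Heap_lock", true};
    heap_lock.lock(who, at_safepoint);
}
```

In this model, toy_vm_exit(ToyThreadKind::VM, true) throws, which corresponds to the crash seen when the dump operation called vm_exit on the VM thread. The same call from a Java thread, or outside a safepoint, passes the check.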

[~dcubed] When this bug was first observed with build jdk-24+12-1289, the build contains the fix for JDK-8337828. I tried with the fix for JDK-8337828 locally on linux-x64 and I saw the VM crash with the same call stack. I'd suggest backing out JDK-8337828 for now. It is a very simple change in one file.
21-08-2024

@ccheung - any ideas why this is a solid failure in Tier4? I looked at recent changesets and nothing is jumping out at me...
21-08-2024

This test has failed in three Tier4 job sets in a row. I'm bumping the priority from P4 -> P3.
21-08-2024

The log indicates that the test was run with -XX:-UseCompressedOops. There's another bug (JDK-8338754) regarding a failure of the same test, also run with -XX:-UseCompressedOops. The VM did not crash in the other bug.
21-08-2024