JDK-8327737 : KlassTrainingData is allocated while holding a Mutex
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: repo-leyden
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-03-10
  • Updated: 2024-03-11
Version table (Other): repo-leyden, Unresolved
Description
Now that JDK-8308745 is fixed in the JDK mainline, we can no longer call Metaspace::allocate() while holding a Mutex. Merging the Leyden/premain branch with mainline after that fix triggers the following assert:

#  Internal Error (/jdk3/le3/open/src/hotspot/share/memory/metaspace.cpp:863), pid=894351, tid=894660
#  assert(!__the_thread__->owns_locks()) failed: allocating metaspace while holding mutex
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (slowdebug build 23-internal-adhoc.iklam.le3)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 23-internal-adhoc.iklam.le3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)

---------------  S U M M A R Y ------------

Command Line: -Xlog:cds -XX:+ArchiveInvokeDynamic -XX:+UnlockDiagnosticVMOptions -XX:+CDSManualFinalImage -XX:CacheDataStore=hw.cds HelloWorld

---------------  T H R E A D  ---------------

Current thread (0x00007ffff02a8d40):  JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=894660, stack(0x00007fffbe4fe000,0x00007fffbe5fe000) (1024K)]

Stack: [0x00007fffbe4fe000,0x00007fffbe5fe000],  sp=0x00007fffbe5fc750,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1282542]  Metaspace::allocate(ClassLoaderData*, unsigned long, MetaspaceObj::Type, JavaThread*)+0xb4  (metaspace.cpp:863)
V  [libjvm.so+0x66440f]  MetaspaceObj::operator new(unsigned long, ClassLoaderData*, unsigned long, MetaspaceObj::Type, JavaThread*)+0x32  (allocation.cpp:78)
V  [libjvm.so+0x15d8fa1]  KlassTrainingData::allocate(InstanceKlass*)+0xcb  (trainingData.cpp:932)
V  [libjvm.so+0x15d6b26]  KlassTrainingData::make(InstanceKlass*, bool)+0x156  (trainingData.cpp:443)
V  [libjvm.so+0x15d5803]  MethodTrainingData::make(methodHandle const&, bool)+0xd3  (trainingData.cpp:184)
V  [libjvm.so+0x15d5d6f]  CompileTrainingData::make(CompileTask*)+0x67  (trainingData.cpp:256)
V  [libjvm.so+0x9e8d0a]  CompileQueue::add(CompileTask*)+0x2ce  (compileBroker.cpp:375)
V  [libjvm.so+0x9e907a]  CompileQueue::transfer_pending()+0x1d6  (compileBroker.cpp:420)
V  [libjvm.so+0x9e9313]  CompileQueue::get(CompilerThread*)+0x14f  (compileBroker.cpp:497)
V  [libjvm.so+0x9eefe7]  CompileBroker::compiler_thread_loop()+0x201  (compileBroker.cpp:2215)
V  [libjvm.so+0xa12f2c]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x80  (compilerThread.cpp:68)
V  [libjvm.so+0xe47a5e]  JavaThread::thread_main_inner()+0x156  (javaThread.cpp:724)
V  [libjvm.so+0xe47904]  JavaThread::run()+0x1bc  (javaThread.cpp:709)
V  [libjvm.so+0x15bbdf0]  Thread::call_run()+0x1a4  (thread.cpp:232)
V  [libjvm.so+0x134efe6]  thread_native_entry(Thread*)+0x1d9  (os_linux.cpp:864)

VM state: not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread:  ([mutex/lock_event])
[0x00007ffff7d07388] MethodCompileQueueC1_lock - owner thread: 0x00007ffff02a8d40
[0x00007ffff7d07658] TrainingData_lock - owner thread: 0x00007ffff02a8d40
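
The rule enforced by the failing assert can be illustrated with a small standalone model. This is only a sketch and not HotSpot code: ToyThread, ToyLocker and toy_metaspace_allocate are hypothetical names, and the model throws an exception where the VM would hit an assert, so that the rule can be exercised without aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-in for a VM thread that tracks how many locks it holds.
struct ToyThread {
    int held_locks = 0;
    bool owns_locks() const { return held_locks > 0; }
};

// Hypothetical RAII locker: bumps the owner's lock count for its scope.
class ToyLocker {
public:
    explicit ToyLocker(ToyThread& t) : thread_(t) { ++thread_.held_locks; }
    ~ToyLocker() {
        assert(thread_.held_locks > 0);
        --thread_.held_locks;
    }
    ToyLocker(const ToyLocker&) = delete;
    ToyLocker& operator=(const ToyLocker&) = delete;
private:
    ToyThread& thread_;
};

// Toy allocator that enforces the same rule as the failing assert:
// refuse to allocate while the calling thread holds any lock.
// It throws instead of aborting (an assumption made for this sketch).
inline void* toy_metaspace_allocate(ToyThread& t, std::size_t bytes) {
    if (t.owns_locks()) {
        throw std::logic_error("allocating metaspace while holding mutex");
    }
    return std::malloc(bytes);
}
```

In the crash above, the compiler thread corresponds to a ToyThread with two live ToyLocker scopes (MethodCompileQueueC1_lock and TrainingData_lock) at the point where the allocation is attempted.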

Comments
IIUC this is only an issue because Metaspace::allocate is expected to be called in the context of executing Java code (loading classes, etc.), so it will throw OOME on failure and therefore (conservatively) can't be called while holding a mutex. If this training data usage is somewhat outside that normal execution model, then perhaps a non-throwing metaspace allocation API is needed?
2024-03-10
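
To make the idea of a non-throwing allocation API concrete, here is a minimal standalone sketch. It does not use any real Metaspace interface: ToyArena, try_allocate and record_training_data are hypothetical names, and std::bad_alloc merely stands in for the OOME path of the throwing flavor.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-capacity arena standing in for a metaspace-like allocator.
// Alignment is ignored for brevity.
class ToyArena {
public:
    explicit ToyArena(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Throwing flavor: failure surfaces as an exception (stand-in for OOME).
    void* allocate(std::size_t bytes) {
        void* p = try_allocate(bytes);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    // Non-throwing flavor: failure is reported as nullptr and never unwinds.
    void* try_allocate(std::size_t bytes) noexcept {
        assert(used_ <= storage_.size());
        if (bytes > storage_.size() - used_) return nullptr;
        void* p = storage_.data() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};

// Hypothetical caller that may run under a lock: on failure it simply
// skips recording instead of raising an error.
inline bool record_training_data(ToyArena& arena, std::size_t bytes) {
    return arena.try_allocate(bytes) != nullptr;
}
```

With this shape, a caller such as the training-data code could treat a failed allocation as "no data recorded" rather than as an error that has to propagate to Java code.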

As a temporary workaround, I disabled the assert in the premain branch: https://github.com/openjdk/leyden/blob/7afec36fe6157deb6a1bf2313c601d88649d10f2/src/hotspot/share/memory/metaspace.cpp#L863-L864
Two possible ways to fix this:
[1] Change the two locks to recursive locks. However, changing MethodCompileQueueC1_lock this way might be risky.
[2] Allocate KlassTrainingData using malloc instead of Metaspace::allocate(). This means that the KlassTrainingData may not be freed automatically when the corresponding ClassLoaderData is garbage collected. We may need to free them manually (e.g. inside InstanceKlass::deallocate_contents()).
2024-03-10
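
Option [2] can be sketched in isolation as follows. This is an illustrative model under stated assumptions, not the Leyden implementation: ToyKlass, ToyKlassTrainingData, toy_training_data_lock and the free function deallocate_contents are hypothetical stand-ins for InstanceKlass, KlassTrainingData, TrainingData_lock and InstanceKlass::deallocate_contents().

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <new>

// Hypothetical training-data record; the field is illustrative only.
struct ToyKlassTrainingData {
    int compile_count;
};

// Hypothetical stand-in for a class's metadata, owning its training data.
struct ToyKlass {
    ToyKlassTrainingData* training_data = nullptr;
};

std::mutex toy_training_data_lock;  // stand-in for TrainingData_lock

// Create-or-get under the lock. The record comes from malloc, so no
// metaspace-style allocation happens while the lock is held.
inline ToyKlassTrainingData* make_training_data(ToyKlass& k) {
    std::lock_guard<std::mutex> guard(toy_training_data_lock);
    if (k.training_data == nullptr) {
        void* mem = std::malloc(sizeof(ToyKlassTrainingData));
        if (mem == nullptr) return nullptr;  // caller tolerates failure
        k.training_data = new (mem) ToyKlassTrainingData{0};
        assert(k.training_data != nullptr);
    }
    return k.training_data;
}

// Manual cleanup hook: nothing frees the malloc'ed record automatically,
// so the owner must release it when the class metadata goes away.
inline void deallocate_contents(ToyKlass& k) {
    if (k.training_data != nullptr) {
        k.training_data->~ToyKlassTrainingData();
        std::free(k.training_data);
        k.training_data = nullptr;
    }
}
```

The trade-off is the one noted in the comment: the record's lifetime is no longer tied to the owning loader's storage, so forgetting the cleanup hook leaks memory.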