JDK-8255087 : Compiler thread crashes in ClassLoaderData::is_alive()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 16
  • Priority: P2
  • Status: Resolved
  • Resolution: External
  • Submitted: 2020-10-21
  • Updated: 2020-11-15
  • Resolved: 2020-10-28
Description
Test: runtime/cds/appcds/sharedStrings/IncompatibleOptions.java#id1

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f704cbba360, pid=19812, tid=19829
#
# JRE version: Java(TM) SE Runtime Environment (16.0+21) (fastdebug build 16-ea+21-1188)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-ea+21-1188, mixed mode, sharing, tiered, compressed oops, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x987360]  ClassLoaderData::is_alive() const+0x0

---------------  T H R E A D  ---------------

Current thread (0x00007f704414bf10):  JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=19829, stack(0x00007f7049cf8000,0x00007f7049df9000)]


Current CompileTask:
C1:    375    6       3       java.util.ImmutableCollections$SetN::probe (56 bytes)

Stack: [0x00007f7049cf8000,0x00007f7049df9000],  sp=0x00007f7049df67c8,  free space=1017k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x987360]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0xab483c]  ClassHierarchyWalker::find_witness_anywhere(Klass*, bool, bool)+0xcc
V  [libjvm.so+0xab6dad]  Dependencies::find_unique_concrete_method(Klass*, Method*)+0x2fd
V  [libjvm.so+0x8f004d]  ciMethod::find_monomorphic_target(ciInstanceKlass*, ciInstanceKlass*, ciInstanceKlass*, bool)+0x38d
V  [libjvm.so+0x72f78a]  GraphBuilder::invoke(Bytecodes::Code)+0x11ba
V  [libjvm.so+0x73056b]  GraphBuilder::iterate_bytecodes_for_block(int)+0x92b
V  [libjvm.so+0x7320c9]  GraphBuilder::iterate_all_blocks(bool)+0x89
V  [libjvm.so+0x7333d7]  GraphBuilder::GraphBuilder(Compilation*, IRScope*)+0x587
V  [libjvm.so+0x742ce3]  IR::IR(Compilation*, ciMethod*, int)+0x653
V  [libjvm.so+0x702fb1]  Compilation::build_hir() [clone .part.0]+0x261
V  [libjvm.so+0x7079cc]  Compilation::compile_java_method()+0x1bc
V  [libjvm.so+0x7087e7]  Compilation::compile_method()+0x1d7
V  [libjvm.so+0x7091fb]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x36b
V  [libjvm.so+0x70a5f3]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a3
V  [libjvm.so+0xa104c8]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
V  [libjvm.so+0xa11018]  CompileBroker::compiler_thread_loop()+0x5a8
V  [libjvm.so+0x185da36]  JavaThread::thread_main_inner()+0x256
V  [libjvm.so+0x1865150]  Thread::call_run()+0x100
V  [libjvm.so+0x1552056]  thread_native_entry(Thread*)+0x116
Comments
I wish we had a little C program, but for some reason this bug is the canary in the coal mine for whatever's wrong with these machines.
28-10-2020

This crash has not been reproducible for more than 5 days after we disabled the problematic host. I think we can conclude that it's a hardware problem. Closing this bug as "External" (same as JDK-8246487).
28-10-2020

I have a test case that can reproduce the crash, but only on one very specific host. It cannot be reproduced on another host that has an identical configuration. So this could be a hardware bug, but it's odd that a hardware bug would lead to a crash at precisely the same spot (when C1 compiles java.util.ImmutableCollections$SetN::probe).

Anyway, the reproduction script is run.sh in the attachment of this bug report:

    bash run.sh $MYJDK

It spawns 12 processes that repeatedly run the same Java program. On the "bad" host, all 12 processes quit after a total of about 2500 JVM launches in about 5 minutes (failure rate = 12 / 2500).

If I apply my patch to write to Klass::_class_loader_data while holding Compile_lock, the crash rate is drastically reduced. I still get about 1 crash in 6000 JVM launches (failure rate = 1 / 6000).

https://github.com/iklam/jdk/commit/444cb6e711211213f437f0a427372afb1057af54

However, this remaining crash happens only when the C1 thread is accessing Klass::_class_loader_data without holding Compile_lock:

V  [libjvm.so+0x987360]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0xab0b88]  Dependencies::find_finalizable_subclass(Klass*) [clone .part.0]+0x18
V  [libjvm.so+0xab535b]  Dependencies::find_finalizable_subclass(Klass*)+0x5b
V  [libjvm.so+0x8d2903]  ciInstanceKlass::has_finalizable_subclass()+0x1e3
V  [libjvm.so+0x725cc6]  GraphBuilder::call_register_finalizer()+0x2c6

ciInstanceKlass::has_finalizable_subclass() seems to be buggy. It should hold Compile_lock, because Dependencies::find_finalizable_subclass walks Klass::_next_sibling, which is protected by Compile_lock:

Klass* Dependencies::find_finalizable_subclass(Klass* k) {
  if (k->is_interface()) return NULL;
  if (k->has_finalizer()) return k;
  k = k->subklass();
  while (k != NULL) {
    Klass* result = find_finalizable_subclass(k);
    if (result != NULL) return result;
    k = k->next_sibling();
  }
  return NULL;
}

There are other calls to Dependencies::xxxxxx from the compiler code without holding Compile_lock.
We should see if this is indeed a problem. I filed JDK-8255282.
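To illustrate the locking rule discussed in this comment, here is a minimal, self-contained model, assuming a single std::mutex stands in for Compile_lock: the writer that links a subclass into the hierarchy and the reader that walks subklass/next_sibling both take the same lock, so the reader never sees a half-updated sibling chain. The Klass struct, hierarchy_lock, add_subklass and find_finalizable_subclass_locked are hypothetical simplifications for illustration, not HotSpot's actual types or API.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for Compile_lock: one mutex guards every
// hierarchy link (subklass / next_sibling).
std::mutex hierarchy_lock;

struct Klass {
  bool is_interface = false;
  bool has_finalizer = false;
  Klass* subklass = nullptr;      // first direct subclass (guarded by hierarchy_lock)
  Klass* next_sibling = nullptr;  // next subclass of the same superclass (guarded)
};

// Writer: link `k` under `super` while holding the lock, mirroring the
// rule that hierarchy updates happen under Compile_lock.
void add_subklass(Klass* super, Klass* k) {
  std::lock_guard<std::mutex> guard(hierarchy_lock);
  k->next_sibling = super->subklass;
  super->subklass = k;
}

// Same traversal as Dependencies::find_finalizable_subclass in the
// comment above; the caller must already hold hierarchy_lock.
static Klass* find_finalizable_subclass_locked(Klass* k) {
  if (k->is_interface) return nullptr;
  if (k->has_finalizer) return k;
  for (Klass* sub = k->subklass; sub != nullptr; sub = sub->next_sibling) {
    if (Klass* result = find_finalizable_subclass_locked(sub)) return result;
  }
  return nullptr;
}

// Reader: take the lock once at the top, then walk the whole subtree.
Klass* find_finalizable_subclass(Klass* k) {
  std::lock_guard<std::mutex> guard(hierarchy_lock);
  return find_finalizable_subclass_locked(k);
}
```

Taking the lock once in the public entry point, rather than inside the recursive helper, avoids re-locking a non-recursive mutex and keeps the whole walk consistent with respect to concurrent add_subklass calls.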
23-10-2020

Update: I have a new crash even with my patch. In this case, ciEnv::register_method *IS* holding Compile_lock, so there's no reason why it should ever read a NULL from Klass::_class_loader_data. This definitely looks like a hardware bug now.

V  [libjvm.so+0x987360]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0xabc386]  Dependencies::DepStream::check_klass_dependency(KlassDepChange*)+0xc6
V  [libjvm.so+0xabca9e]  Dependencies::validate_dependencies(CompileTask*, char**)+0x11e
V  [libjvm.so+0x8bfe9b]  ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, bool, bool, RTMState)+0x61b
V  [libjvm.so+0x7059b7]  Compilation::install_code(int)+0xd7
V  [libjvm.so+0x708945]  Compilation::compile_method()+0x335
V  [libjvm.so+0x7091fb]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x36b
V  [libjvm.so+0x70a5f3]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a3
V  [libjvm.so+0xa104c8]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
V  [libjvm.so+0xa11018]  CompileBroker::compiler_thread_loop()+0x5a8

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002c   <<<- Klass::_class_loader_data == NULL
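The "<<<- Klass::_class_loader_data == NULL" annotation rests on a common inference: a tiny fault address such as 0x2c usually means a NULL object pointer plus a field offset. The sketch below illustrates that arithmetic, assuming the faulting instruction at ClassLoaderData::is_alive()+0x0 reads a field at offset 0x2c. FakeClassLoaderData, its padding, the helper functions and the 4 KiB page threshold are all hypothetical; the real ClassLoaderData layout is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: the padding only exists to place the field read
// by is_alive() at offset 0x2c. The real ClassLoaderData differs.
struct FakeClassLoaderData {
  char other_fields[0x2c];
  int keep_alive;
};

static_assert(offsetof(FakeClassLoaderData, keep_alive) == 0x2c,
              "field should sit at offset 0x2c");

// Reading a field at offset `off` through an object pointer `base`
// touches address base + off; with base == 0 (NULL) that is just `off`,
// which is what siginfo reports as si_addr.
std::uintptr_t expected_fault_address(std::uintptr_t base, std::size_t off) {
  return base + off;
}

// Heuristic (assumes a 4 KiB first page that is never mapped): faults
// below this address are typically NULL plus a small field offset.
bool looks_like_null_field_access(std::uintptr_t si_addr) {
  return si_addr < 4096;
}
```

By contrast, a fault address in the middle of the address space (such as the crashing pc in the original report) would not fit the NULL-plus-offset pattern.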
23-10-2020

Three more crashes since yesterday. All of them are in the C1 thread. One of the crashes has a new stack trace:

Stack: [0x00007f3358320000,0x00007f3358421000],  sp=0x00007f335841df28,  free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x566720]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0x5f3197]  Dependencies::find_finalizable_subclass(Klass*) [clone .part.0]+0x57
V  [libjvm.so+0x5f5d4b]  Dependencies::find_finalizable_subclass(Klass*)+0x5b
V  [libjvm.so+0x5257cf]  ciInstanceKlass::has_finalizable_subclass()+0x8f
V  [libjvm.so+0xc0c8a1]  Parse::call_register_finalizer()+0x91
V  [libjvm.so+0xc0e3c5]  Parse::return_current(Node*)+0x7b5
V  [libjvm.so+0xc1e5af]  Parse::do_one_bytecode()+0x590f
V  [libjvm.so+0xc0bed0]  Parse::do_one_block()+0x210
V  [libjvm.so+0xc0c305]  Parse::do_all_blocks()+0xe5
V  [libjvm.so+0xc0f7d9]  Parse::Parse(JVMState*, ciMethod*, float)+0x7e9
V  [libjvm.so+0x4f153b]  ParseGenerator::generate(JVMState*)+0x8b
V  [libjvm.so+0x5a39cd]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0xaad
V  [libjvm.so+0x4f0d0f]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x11f
V  [libjvm.so+0x5acd68]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd18
V  [libjvm.so+0x5ad838]  CompileBroker::compiler_thread_loop()+0x4e8
V  [libjvm.so+0xd8f1fb]  JavaThread::thread_main_inner()+0x11b
V  [libjvm.so+0xd940cd]  Thread::call_run()+0xfd
V  [libjvm.so+0xbe7ee7]  thread_native_entry(Thread*)+0xe7
22-10-2020

Three recent cases (as of this comment: twice on Oct 21, once on Oct 20). Other than these 3, the previous failure (still recorded by Mach5) was on Jun 10, 2020.
21-10-2020

This is a new occurrence of JDK-8246487, which we haven't seen for a long time. In the latest crash, the VM is still in the middle of bootstrapping. Maybe there's a race condition that allows Dependencies::find_unique_concrete_method to see a shared class that is only partially loaded (or that has been loaded by another thread, but whose state cannot be consistently observed by the C1 thread due to memory barrier problems). See the use of Compile_lock in systemDictionary.cpp.
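The hypothesis here is a publication problem: a compiler thread could observe a class pointer before the writes that initialize the class (such as its loader data) are visible. As a generic illustration of safe publication, the sketch below uses C++11 release/acquire atomics; this is not HotSpot's mechanism (which relies on Compile_lock and its own ordering primitives), and ToyKlass, LoaderData, published, load_class and wait_and_check_loader are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins: a "class" whose fields must be fully
// initialized before another thread may use it.
struct LoaderData { bool alive = true; };
struct ToyKlass { LoaderData* loader_data = nullptr; };

std::atomic<ToyKlass*> published{nullptr};

// Loader thread: initialize every field first, then publish the pointer
// with release ordering so those writes happen-before any acquire load
// that observes the pointer.
void load_class(ToyKlass* k, LoaderData* cld) {
  k->loader_data = cld;
  published.store(k, std::memory_order_release);
}

// Compiler thread: spin until the class is published. Because the load
// is acquire, seeing the pointer guarantees seeing loader_data as set by
// load_class, never the initial nullptr.
bool wait_and_check_loader() {
  ToyKlass* k;
  while ((k = published.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  return k->loader_data != nullptr && k->loader_data->alive;
}
```

If the store were relaxed (or the pointer were published before loader_data was written), the reader could legally observe the pointer together with a stale NULL loader_data, which is the kind of inconsistency this comment speculates about.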
21-10-2020

ILW = Crash during C1 compilation, extremely intermittent, no known workaround = HLH = P2
21-10-2020