JDK-8246487 : misc tests SIGSEGV in ClassLoaderData::is_alive()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 15
  • Priority: P2
  • Status: Resolved
  • Resolution: External
  • OS: linux
  • CPU: x86_64
  • Submitted: 2020-06-03
  • Updated: 2020-11-15
  • Resolved: 2020-06-17
Description
The following test failed in the JDK15 CI:

java/foreign/TestLayoutPaths.java

Here's a snippet from the log file:

#section:testng
----------messages:(5/232)----------
command: testng TestLayoutPaths
reason: User specified action: run testng TestLayoutPaths 
Mode: othervm [test needs --add-modules]
Additional options from @modules: --add-modules jdk.incubator.foreign
elapsed time (seconds): 7.611
----------configuration:(3/49)----------
Boot Layer
  add modules: jdk.incubator.foreign

----------System.out:(21/1660)----------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb473e96c20, pid=31425, tid=31441
#
# JRE version: Java(TM) SE Runtime Environment (15.0+26) (build 15-ea+26-1286)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (15-ea+26-1286, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x4eec20]  ClassLoaderData::is_alive() const+0x0
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/805146e6-8fdb-4552-bf9e-385b73cf7129-S259/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3a50beb0-7178-47a0-8372-e05302aebdc1/runs/5ac78805-6dbf-48d6-9b0e-13295c0f3d24/testoutput/test-support/jtreg_open_test_jdk_tier1_part3/scratch/2/core.31425)
#
# An error report file with more information is saved as:
# /opt/mach5/mesos/work_dir/slaves/805146e6-8fdb-4552-bf9e-385b73cf7129-S259/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3a50beb0-7178-47a0-8372-e05302aebdc1/runs/5ac78805-6dbf-48d6-9b0e-13295c0f3d24/testoutput/test-support/jtreg_open_test_jdk_tier1_part3/scratch/2/hs_err_pid31425.log
#
# Compiler replay data is saved as:
# /opt/mach5/mesos/work_dir/slaves/805146e6-8fdb-4552-bf9e-385b73cf7129-S259/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3a50beb0-7178-47a0-8372-e05302aebdc1/runs/5ac78805-6dbf-48d6-9b0e-13295c0f3d24/testoutput/test-support/jtreg_open_test_jdk_tier1_part3/scratch/2/replay_pid31425.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
----------System.err:(0/0)----------
----------rerun:(36/4775)*----------


Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007fb46c20a0d0):  JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=31441, stack(0x00007fb454577000,0x00007fb454678000)]


Current CompileTask:
C1:    125   35   !   3       java.util.concurrent.ConcurrentHashMap::putVal (432 bytes)

Stack: [0x00007fb454577000,0x00007fb454678000],  sp=0x00007fb454675a88,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x4eec20]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0x57a482]  ClassHierarchyWalker::find_witness_anywhere(Klass*, bool, bool)+0x352
V  [libjvm.so+0x57bf6b]  Dependencies::find_unique_concrete_method(Klass*, Method*)+0x6b
V  [libjvm.so+0x4b4d42]  ciMethod::find_monomorphic_target(ciInstanceKlass*, ciInstanceKlass*, ciInstanceKlass*, bool)+0x132
V  [libjvm.so+0x3ea6a6]  GraphBuilder::invoke(Bytecodes::Code)+0x13e6
V  [libjvm.so+0x3eaeab]  GraphBuilder::iterate_bytecodes_for_block(int)+0x5cb
V  [libjvm.so+0x3ed3ed]  GraphBuilder::iterate_all_blocks(bool)+0x6d
V  [libjvm.so+0x3ef4c6]  GraphBuilder::GraphBuilder(Compilation*, IRScope*)+0x316
V  [libjvm.so+0x3f4eca]  IRScope::IRScope(Compilation*, IRScope*, int, ciMethod*, int, bool)+0x1ca
V  [libjvm.so+0x3f4f83]  IR::IR(Compilation*, ciMethod*, int)+0xa3
V  [libjvm.so+0x3cf453]  Compilation::build_hir() [clone .part.0]+0x163
V  [libjvm.so+0x3d03ac]  Compilation::compile_java_method()+0x3bc
V  [libjvm.so+0x3d0518]  Compilation::compile_method()+0x108
V  [libjvm.so+0x3d098c]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x20c
V  [libjvm.so+0x3d14dd]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0xad
V  [libjvm.so+0x5333cb]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x32b
V  [libjvm.so+0x5347a8]  CompileBroker::compiler_thread_loop()+0x4e8
V  [libjvm.so+0xcd49be]  JavaThread::thread_main_inner()+0xde
V  [libjvm.so+0xcd98fd]  Thread::call_run()+0xfd
V  [libjvm.so+0xb30ad7]  thread_native_entry(Thread*)+0xe7


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000024
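
A fault address as small as 0x24 is what a field read through a null object pointer looks like: the CPU computes base + field offset, and with a base of 0 the offset itself becomes the faulting address. The following is a minimal sketch of that arithmetic only. `LoaderDataSketch` and `field_at_0x24` are hypothetical stand-ins; the real ClassLoaderData layout is not shown in this report and depends on the build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a VM-internal class. The padding size is an
// assumption chosen so that the next field lands at offset 0x24.
struct LoaderDataSketch {
    char         leading_fields[0x24];  // whatever precedes the field being read
    std::int32_t field_at_0x24;         // hypothetical 32-bit field
};

static_assert(offsetof(LoaderDataSketch, field_at_0x24) == 0x24,
              "sketch layout must place the field at offset 0x24");

// Address a load of `field_at_0x24` would touch for an object at `base`.
// Pure integer arithmetic: nothing is dereferenced here.
inline std::uintptr_t fault_address(std::uintptr_t base) {
    return base + offsetof(LoaderDataSketch, field_at_0x24);
}

// Small self-check: a null base yields exactly the field offset.
inline void example() {
    assert(fault_address(0) == 0x24);
}
```

With `base == 0` the function returns 0x24, the same value as `si_addr` above; with a valid base the same load would land inside the object.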


This is a Tier1 failure so I'm starting it as a P2.
The crashing stack is in compiler code so I'm putting
it in hotspot/compiler for initial triage.
Comments
Have not seen any complaints about my plan to close this bug as "External" so I'm closing it.
17-06-2020

I've filed a Mach5-specific infra bug for tracking the failures that we've seen on the one test machine since the last redeploy. I'm planning to close this bug as "External" on Wed 2020.06.17 if there are no objections.
16-06-2020

The crash is here:

V  [libjvm.so+0x892010]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0x9a775a]  ClassHierarchyWalker::find_witness_anywhere(Klass*, bool, bool)+0x2ba = +850

>>0x00007ffff67ce47d <+845>: callq 0x7ffff6bd7bf0 <Klass::subklass(bool) const>
  0x00007ffff67ce482 <+850>: test %rax,%rax

Klass* Klass::subklass(bool log) const {
  // Need load_acquire on the _subklass, because it races with inserts that
  // publishes freshly initialized data.
  for (Klass* chain = Atomic::load_acquire(&_subklass);
       chain != NULL;
       // Do not need load_acquire on _next_sibling, because inserts never
       // create _next_sibling edges to dead data.
       chain = Atomic::load(&chain->_next_sibling))
  {
    if (chain->is_loader_alive()) {     <<<<< HERE:
      return chain;
    } else if (log) {
      [...]
      }
    }
  }
  return NULL;
}

%rdi contains klass->_class_loader_data.

(gdb) disass ClassLoaderData::is_alive
   0x00007ffff6742c20 <+0>: mov 0x24(%rdi),%eax

hs_err shows:

Registers:
RDI=0x0000000000000000

However, in GDB, klass->_class_loader_data is not NULL:

(gdb) frame 9
#9 ClassLoaderData::is_alive (this=0x0)
(gdb) up
#10 0x00007fb47432bd08 in Klass::is_loader_alive (this=0x8002bac30)
(gdb) p this->_class_loader_data
$3 = (ClassLoaderData *) 0x7fb46c1ddfc0

Because CDS is used, class_loader_data is set here in Klass::restore_unshareable_info():

  if (class_loader_data() == NULL) {
    // Restore class_loader_data to the null class loader data
    set_class_loader_data(loader_data);
    ...

  void set_class_loader_data(ClassLoaderData* loader_data) { _class_loader_data = loader_data; }

Maybe we need to have proper memory fencing when setting _class_loader_data???
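
For readers unfamiliar with the kind of fencing being asked about, here is a minimal sketch of release/acquire pointer publication. It uses standard C++ `std::atomic` rather than HotSpot's own `Atomic` class, and `KlassSketch`, `LoaderDataSketch`, `publish`, `observe`, and `read_or` are hypothetical names, not HotSpot code. It illustrates the idea only and is not presented as a fix for this crash.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload standing in for loader data.
struct LoaderDataSketch {
    int value = 0;
};

// Hypothetical holder standing in for a class that carries a pointer
// which one thread sets and other threads read.
struct KlassSketch {
    std::atomic<LoaderDataSketch*> loader_data{nullptr};

    // Writer side: a release store makes every write to *d that happened
    // before this call visible to a reader that acquires the pointer.
    void publish(LoaderDataSketch* d) {
        loader_data.store(d, std::memory_order_release);
    }

    // Reader side: an acquire load pairs with the release store above.
    LoaderDataSketch* observe() const {
        return loader_data.load(std::memory_order_acquire);
    }
};

// Load the pointer once into a local, then test and use that local, so the
// null check and the dereference cannot see two different values.
inline int read_or(const KlassSketch& k, int fallback) {
    LoaderDataSketch* d = k.observe();
    return d != nullptr ? d->value : fallback;
}

// Single-threaded walk-through of the two states a reader can observe.
inline void example() {
    static LoaderDataSketch data;
    KlassSketch k;
    assert(read_or(k, -1) == -1);  // not yet published: reader sees null
    data.value = 7;                // initialise first ...
    k.publish(&data);              // ... then publish
    assert(read_or(k, -1) == 7);   // published: reader sees initialised data
}
```

In a real program `publish` and `read_or` would run on different threads; the walk-through above runs them on one thread only to keep the example deterministic.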
08-06-2020

Four similar crashes in three different tests so far.
08-06-2020

Not this again. I had a fence() there at one point, but that didn't solve the problem. Remember we thought it had something to do with mmap?
08-06-2020

This is not the same as JDK-8229250 since that was AMD. This crash:

model name : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
05-06-2020

Same failure mode in a different test:

sun/tools/jstat/jstatSnap1.sh

Here's the crashing thread's stack trace:

---------------  T H R E A D  ---------------

Current thread (0x00007f5de41b88f0):  JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=28323, stack(0x00007f5da8853000,0x00007f5da8954000)]

Current CompileTask:
C1:    188   56   !   3       java.util.concurrent.ConcurrentHashMap::putVal (432 bytes)

Stack: [0x00007f5da8853000,0x00007f5da8954000],  sp=0x00007f5da8951888,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x892010]  ClassLoaderData::is_alive() const+0x0
V  [libjvm.so+0x9a775a]  ClassHierarchyWalker::find_witness_anywhere(Klass*, bool, bool)+0x2ba
V  [libjvm.so+0x9a836d]  Dependencies::find_unique_concrete_method(Klass*, Method*)+0xcd
V  [libjvm.so+0x806493]  ciMethod::find_monomorphic_target(ciInstanceKlass*, ciInstanceKlass*, ciInstanceKlass*, bool)+0x3b3
V  [libjvm.so+0x646aa5]  GraphBuilder::invoke(Bytecodes::Code)+0x1155
V  [libjvm.so+0x6475fb]  GraphBuilder::iterate_bytecodes_for_block(int)+0x73b
V  [libjvm.so+0x649179]  GraphBuilder::iterate_all_blocks(bool)+0x89
V  [libjvm.so+0x64a21f]  GraphBuilder::GraphBuilder(Compilation*, IRScope*)+0x46f
V  [libjvm.so+0x65ac8f]  IR::IR(Compilation*, ciMethod*, int)+0x63f
V  [libjvm.so+0x61be41]  Compilation::build_hir() [clone .part.0]+0x261
V  [libjvm.so+0x6208cc]  Compilation::compile_java_method()+0x1bc
V  [libjvm.so+0x6216d4]  Compilation::compile_method()+0x1d4
V  [libjvm.so+0x62209a]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x32a
V  [libjvm.so+0x622f02]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x142
V  [libjvm.so+0x9119ae]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x45e
V  [libjvm.so+0x913048]  CompileBroker::compiler_thread_loop()+0x6b8
V  [libjvm.so+0x169b03c]  JavaThread::thread_main_inner()+0x21c
V  [libjvm.so+0x16a0e70]  Thread::call_run()+0x100
V  [libjvm.so+0x13a69b6]  thread_native_entry(Thread*)+0x116

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002c
04-06-2020

While the test is related to the memory access incubating API, it has not been updated recently. Moreover, it is just a plain Java test that does no off-heap access, so it seems odd for it to fail with a VM crash.
03-06-2020