We are seeing a random JVM crash that occurs about 2 to 3 times a day in one of our production environments.
The crash log looks like this:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000ffff83e74690, pid=62198, tid=0x0000ffff6dbd51e0
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-Huawei_JDK_V100R001C00SPC150B002-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-aarch64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x669690] oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*)+0xc0
Stack: [0x0000ffff6d9d6000,0x0000ffff6dbd6000], sp=0x0000ffff6dbd46c0, free space=2041k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x669690] oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*)+0xc0
V [libjvm.so+0x9b9db8] PSPromotionManager::drain_stacks_depth(bool)+0x780
V [libjvm.so+0x9bd660] StealTask::do_it(GCTaskManager*, unsigned int)+0x278
V [libjvm.so+0x5d081c] GCTaskThread::run()+0x18c
V [libjvm.so+0x9394ec] java_start(Thread*)+0x11c
C [libpthread.so.0+0x78bc] start_thread+0x19c
Debugging shows that the crash is triggered after bytecode patching.
Normal template interpreter sequence:
TemplateTable::putfield_or_static -> TemplateTable::resolve_cache_and_index -> InterpreterRuntime::resolve_from_cache -> InterpreterRuntime::resolve_get_put
The cpCache entry is updated after that.
Then we do bytecode patching in TemplateTable::putfield_or_static and dispatch to the next bytecode through InterpreterMacroAssembler::dispatch_next.
void InterpreterMacroAssembler::dispatch_next(TosState state, int step, bool generate_poll) {
  // load next bytecode
  ldrb(rscratch1, Address(pre(rbcp, step)));
  dispatch_base(state, Interpreter::dispatch_table(state), generate_poll);
}
After that we switch to the fast path, TemplateTable::fast_storefield, which reads the cpCache entry.
But the fast path may see invalid cpCache entry values because a memory barrier is missing.
The load of the cpCache entry may be scheduled before the load of the bytecode, and even before the cpCache entry has been set.
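To make the ordering problem concrete, here is a minimal portable C++ sketch of the same publish/consume pattern. It is not HotSpot code: Method, CacheEntry, resolve_and_patch, dispatch, run_trials and the BC_* constants are hypothetical names chosen for illustration. The resolving thread writes the cache entry and then patches the bytecode with a release store. The executing thread reads the bytecode with an acquire load before it reads the entry.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical stand-ins for interpreter state; not HotSpot code.
constexpr uint8_t BC_PUTFIELD = 0;       // slow path: entry must be resolved first
constexpr uint8_t BC_FAST_PUTFIELD = 1;  // fast path: entry is assumed valid

struct CacheEntry {
    std::atomic<intptr_t> field_offset{-1};  // -1 means "not yet resolved"
};

struct Method {
    CacheEntry entry;
    std::atomic<uint8_t> bytecode{BC_PUTFIELD};
};

// Resolving thread: fill in the cache entry, then patch the bytecode.
// The release store keeps the entry write ordered before the patch.
void resolve_and_patch(Method& m, intptr_t offset) {
    m.entry.field_offset.store(offset, std::memory_order_relaxed);
    m.bytecode.store(BC_FAST_PUTFIELD, std::memory_order_release);
}

// Executing thread: load the bytecode, branch on it, then read the entry.
// The branch is only a control dependency, so it is the acquire load that
// orders the entry read after the bytecode read.
// Returns the offset on the fast path, or -2 when the slow path is taken.
intptr_t dispatch(Method& m) {
    uint8_t bc = m.bytecode.load(std::memory_order_acquire);
    if (bc == BC_FAST_PUTFIELD) {
        return m.entry.field_offset.load(std::memory_order_relaxed);
    }
    return -2;
}

// Races one resolver against one executor per trial and counts how often
// the fast path observed an entry other than the published one.
int run_trials(int trials) {
    int bad = 0;
    for (int i = 0; i < trials; ++i) {
        Method m;
        std::thread writer(resolve_and_patch, std::ref(m), intptr_t{16});
        intptr_t seen;
        do { seen = dispatch(m); } while (seen == -2);
        if (seen != 16) ++bad;
        writer.join();
    }
    return bad;
}
```

With the acquire/release pair in place, run_trials should always return 0 when called from a small main. If the acquire load in dispatch is weakened to memory_order_relaxed, the C++ memory model no longer rules out the stale read, which corresponds to the missing barrier described above. Whether a given run actually shows the failure depends on the hardware and the timing.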
Reference:
ARMv8 Architecture Reference Manual, K11.6.1:
This restriction applies only when the data value returned by a read is used as a data value to calculate the
address of a subsequent read or write. It does not apply if the data value returned by a read determines the
condition flags values, and the values of the flags are used for condition code evaluation to determine the
address of a subsequent read, either through conditional execution or the evaluation of a branch. This is called
a control dependency.