Bug ID: JDK-8139247 Improper locking of MethodData::_extra_data

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 9

Priority: P3
Status: Resolved
Resolution: Fixed
OS: solaris
CPU: x86_64

Submitted: 2015-10-09
Updated: 2021-09-24
Resolved: 2016-03-14

JDK 8	JDK 9	Other
8u102Resolved	9 b112Fixed	openjdk8uUnresolved

This is intermittent vm crash, observed in JDK9/b83 same binaries run on solaris-amd64.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffff80ffb2095008, pid=21847, tid=68
#
# JRE version: Java(TM) SE Runtime Environment (9.0-b83) (build 1.9.0-ea-b83)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-ea-b83, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64)
# Problematic frame:
# V  [libjvm.so+0xc95008]  void ciObjectFactory::ensure_metadata_alive(ciMetadata*)+0x18
#
# Core dump will be written. Default location: /scratch/export/home/aurora/sandbox_keepme/results/run_414/testoutput/tier1/JTwork/scratch/3/core or core.21847
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -ea -esa -Xmx512m -Djava.security.policy=file:/scratch/export/home/aurora/sandbox_keepme/results/run_414/testoutput/tier1/JTwork/jtreg.policy com.sun.javatest.regtest.agent.AgentServer -allowSetSecurityManager -port 39076

Host: x86 64 bit 2893 MHz, 32 cores, 255G, Oracle Solaris 11.2 X86
Time: Wed Oct  7 06:44:11 2015 PDT elapsed time: 90 seconds (0d 0h 1m 30s)

URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/9c7684975803 User: lana Date: 2016-03-30 18:38:07 +0000
30-03-2016
URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/9c7684975803 User: vlivanov Date: 2016-03-14 11:00:31 +0000
14-03-2016
Thanks, David. http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-March/022009.html
03-03-2016
Bad code introduced by JDK-8057038
03-03-2016
MutexLocker(mdo->extra_data_lock()); needs to be: MutexLocker ml(mdo->extra_data_lock()); There must be a named variable otherwise it is out-of-scope immediately and so the destructor is called and the unlock happens.
03-03-2016
9b95 has the same problem: (dbx) dis -a 0xfffffd7fe870f940 ... 0xfffffd7fe870f960: load_extra_data+0x0020: movq %r13,%rdi 0xfffffd7fe870f963: load_extra_data+0x0023: call lock [ 0xfffffd7fe8e65730, .+0x755dcd ] 0xfffffd7fe870f968: load_extra_data+0x0028: movq %r13,%rdi 0xfffffd7fe870f96b: load_extra_data+0x002b: xorq %rax,%rax 0xfffffd7fe870f96e: load_extra_data+0x002e: call unlock [ 0xfffffd7fe8e65890, .+0x755f22 ] And the same happens in 9b107: 0xfffffd7fe66280fd: load_extra_data+0x001d: movq %r13,%rdi 0xfffffd7fe6628100: load_extra_data+0x0020: call lock [ 0xfffffd7fe6df46c0, .+0x7cc5c0 ] 0xfffffd7fe6628105: load_extra_data+0x0025: movq %r13,%rdi 0xfffffd7fe6628108: load_extra_data+0x0028: call unlock [ 0xfffffd7fe6df47f0, .+0x7cc6e8 ]
02-03-2016
I think I found the problem: MutexLocker(mdo->extra_data_lock()); translates into: 0xffff80ffb2088746: load_extra_data+0x0026: call lock [ 0xffff80ffb277d7c0, .+0x6f507a ] 0xffff80ffb208874b: load_extra_data+0x002b: movq %r13,%rdi 0xffff80ffb208874e: load_extra_data+0x002e: xorq %rax,%rax 0xffff80ffb2088751: load_extra_data+0x0031: call unlock [ 0xffff80ffb277d920, .+0x6f51cf ]
02-03-2016
I see some junk in the debugger: in ciMethodData::load_extra_data() at data_dst->translate_from(data_src); dp_src = 0xffff80ff99c3aae8 (dbx) print -fllx dp_src dp_src = { _header = { _bits = 0x2400d _struct = { _tag = 0xd _flags = 0x40 _bci = 0x2 } } _cells = (0xffff80ff9a1033c0) } dp_dst = 0xffff80ffbf5c87c0 (dbx) print -fllx dp_dst dp_dst = { _header = { _bits = 0x4 _struct = { _tag = 0x4 _flags = 0x0 _bci = 0x0 } } _cells = (0xffff80ffb2a3fc20) } I don’t know how it could happen after: if (tag != DataLayout::arg_info_data_tag) { memcpy(dp_dst, dp_src, ((intptr_t)MethodData::next_extra(dp_src)) - ((intptr_t)dp_src)); }
02-03-2016
Indeed looks like a concurrency issue. Somehow, there's a race between ciMethodData::load_data() and MethodData::allocate_bci_to_data. Problematic profile data: dp_src = 0xffff80ff99c3aae8 ((class DataLayout *) 0xffff80ff99c3aae8)->_header._struct = { _tag = '\015' _flags = '@' _bci = 2 } _cells[0]: 0xffff80ff9a1033c0 So, the reader sees NULL, but the actual value is not.
02-03-2016
It seems there was an interaction between MDO access from compiler and MDO update during deopt. C1: 3837 4231 3 com.sun.tools.javac.code.Types$SameTypeVisitor::visitType (221 bytes) Boolean visitType(com.sun.tools.javac.code.Type, com.sun.tools.javac.code.Type); 124: invokevirtual #11 // Method com/sun/tools/javac/code/Type.hasTag:(Lcom/sun/tools/javac/code/TypeTag;)Z Event: 3.835 Thread 0x0000000000430000 Uncommon trap: reason=speculate_class_check action=maybe_recompile pc=0xffff80ffaba164c4 method=com.sun.tools.javac.code.Type.hasTag(Lcom/sun/tools/javac/code/TypeTag;)Z @ 2 I looked through ciMethodData::load_data() and MethodData::allocate_bci_to_data, but don't see how ciMethodData::load_extra_data() could see partially constructed entry (speculative_trap_data w/ NULL method). Can't reproduce the failure.
02-03-2016
It seems m == NULL, because ensure_medata_alive gets NULL: (dbx) dis -a ciObjectFactory::ensure_metadata_alive 0xfffffd7fed51c250: ensure_metadata_alive : pushq %rbp 0xfffffd7fed51c251: ensure_metadata_alive+0x0001: movq %rsp,%rbp 0xfffffd7fed51c254: ensure_metadata_alive+0x0004: pushq %rbx 0xfffffd7fed51c255: ensure_metadata_alive+0x0005: subq $0x0000000000000008,%rsp 0xfffffd7fed51c259: ensure_metadata_alive+0x0009: movq %rsi,%rbx 0xfffffd7fed51c25c: ensure_metadata_alive+0x000c: movq 0x0000000000e3883d [ 0xe3883d ](%rip),%rax 0xfffffd7fed51c263: ensure_metadata_alive+0x0013: cmpb $0x0000000000000000,(%rax) 0xfffffd7fed51c266: ensure_metadata_alive+0x0016: je ensure_metadata_alive+0x70 [ 0xfffffd7fed51c2c0, .+0x5a ] 0xfffffd7fed51c268: ensure_metadata_alive+0x0018: movq (%rbx),%rcx 0xfffffd7fed51c26b: ensure_metadata_alive+0x001b: movq %rbx,%rdi siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000 RBX=0x0000000000000000 is an unknown value void ciSpeculativeTrapData::translate_from(const ProfileData* data) { Method* m = data->as_SpeculativeTrapData()->method(); ciMethod* ci_m = CURRENT_ENV->get_method(m); CURRENT_ENV->ensure_metadata_alive(ci_m); set_method(ci_m); } ciMethod* get_method(Method* o) { if (o == NULL) return NULL; return get_metadata(o)->as_method(); } SpeculativeTrapData* as_SpeculativeTrapData() const { assert(is_SpeculativeTrapData(), "wrong type"); return is_SpeculativeTrapData() ? (SpeculativeTrapData)this : NULL; } Method method() const { return (Method*)intptr_at(speculative_trap_method); }
01-03-2016
The last event in both cases is uncommon trap due to speculate_class_check: Event: 90.910 Thread 0x0000000000422800 Uncommon trap: reason=speculate_class_check action=maybe_recompile pc=0xffff80ffaa25305c method=com.sun.tools.javac.code.Type.hasTag(Lcom/sun/tools/javac/code/TypeTag;)Z @ 2
01-03-2016
[~amlu], could you please check if it's still happening?
17-02-2016

Relates :	JDK-8066173 - compiler/types/correctness/OffTest.java failed with assert
Relates :	JDK-8057038 - Speculative traps not robust when compilation and class unloading are concurrent