Bug ID: JDK-8231501 VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99

JDK-8231501 : VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 11,14

Priority: P3
Status: Resolved
Resolution: Fixed
OS: linux

Submitted: 2019-09-26
Updated: 2023-11-13
Resolved: 2019-12-16

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 11	JDK 14	JDK 15	Other
11.0.14-oracleFixed	14 b28Fixed	15Fixed	openjdk8u322Resolved

Related Reports

Duplicate :	JDK-8233522 - compiler/types/correctness/CorrectnessTest.java fails with Internal Error (open/src/hotspot/share/oops/methodData.cpp:1406)
Relates :	JDK-8306767 - Concurrent repacking of extra data in MethodData is potentially unsafe
Relates :	JDK-8215889 - assert(!_unloading) failed: This oop is not available to unloading class loader data with ZGC

Description

The following test failed in an Adhoc Mach5 job:

java/net/httpclient/http2/HpackDecoderDriver.java	


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (methodData.cpp:1797), pid=7509, tid=7579
#  fatal error: unexpected tag 99
#
# JRE version: Java(TM) SE Runtime Environment (14.0) (build 14-internal+0-2019-09-25-1727216.daniel.fuchs.jdk-dev)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (14-internal+0-2019-09-25-1727216.daniel.fuchs.jdk-dev, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xb6a44d]  MethodData::clean_extra_data(CleanExtraDataClosure*)+0x28d
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/fcf4c0c4-d73e-4321-860c-6613427db92b-S489/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/d729818e-38b9-4b28-a9f3-4d8d16c7f5f6/runs/386fae6c-3d7b-4119-bd17-b0521452089d/testoutput/test-support/jtreg_open_test_jdk_java_net_httpclient/scratch/1/core.7509)
#
# An error report file with more information is saved as:
# /opt/mach5/mesos/work_dir/slaves/fcf4c0c4-d73e-4321-860c-6613427db92b-S489/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/d729818e-38b9-4b28-a9f3-4d8d16c7f5f6/runs/386fae6c-3d7b-4119-bd17-b0521452089d/testoutput/test-support/jtreg_open_test_jdk_java_net_httpclient/scratch/1/hs_err_pid7509.log

[error occurred during error reporting (), id 0xe0000000, Internal Error (open/src/hotspot/share/ci/ciMethodData.cpp:666)]

#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Comments

Fix Request (11u): I've requested backport of JDK-8215889 which was also backported by Oracle. These changes improve reliability with respect to concurrent method data modifications. Note that we had real problems in SAP JVM without these changes. Applies almost cleanly on top of JDK-8215889. Review: https://github.com/openjdk/jdk11u-dev/pull/435
28-09-2021
If this patch is approved for 8u, I'd like to see it in 11u as well, even if 11u is nominally unaffected. See https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-April/013650.html.
14-04-2021
11u backport: I don't think 11 is affected by this one as AFAICT the code for JDK-8215889 causes it.
03-03-2020
URL: https://hg.openjdk.java.net/jdk/jdk14/rev/bd15714926ec User: thartmann Date: 2019-12-16 10:30:44 +0000
16-12-2019
http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ Before loading and copying the extra data from the MDO to the ciMDO in ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This release of the lock triggers the bug: There can be cases where one thread A is waiting in the whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for which another thread B just released the lock at [4]. If that MDO actually contained SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries to load the now cleaned extra data and immediately finishes at [6] since there are no SpeculativeTrapData entries anymore. It copied a single entry with tag DataLayout::no_tag [7] to the ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry (since a SpeculativeTrapData entry has an additional cell for the method) and possible other remaining SpeculativeTrapData entries: Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry: ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO): 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63 dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68 dp+32: tag = 0 -> end of extra data MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is unchanged): 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 dp: tag = 0 -> end of extra data Returning at [6] when the extra data loading from MDO to ciMDO is finished: MDO extra data: 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 dp: tag = 0 -> end of extra data ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes): 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 dp: tag = 0 -> next entry = dp+8 dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal... The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which causes a crash since there is no tag 99 available. The idea is to first snapshot the data and extra parameter data only and translating those. In a second step, the remaining extra data is prepared as before (possibly giving up the extra data lock). And only afterwards, when the cache is populated and no safepoints can happen anymore, the remaining extra data (trap entries and arg info data) is snapshotted and translated while holding the extra data lock. This ensures that no extra data is cleaned between snapshotting and translation since the lock is not released anymore in the meantime. [1] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176
13-12-2019
I've closed JDK-8233522 as duplicate of this one. Please re-open if it turns out to be a different issue.
05-11-2019
Based on the crashing thread's stack trace I'm moving this bug from hotspot/runtime -> hotspot/compiler for initial triage.
05-10-2019
Here's the crashing thread's stack trace: --------------- T H R E A D --------------- Current thread (0x00007f4cc4003000): JavaThread "C1 CompilerThread3" daemon [_thread_in_vm, id=7579, stack(0x00007f4cedaca000,0x00007f4cedbcb000)] Current CompileTask: C1: 1157 1011 3 jdk.internal.net.http.hpack.BuffersTestingKit::forEachSplit (50 bytes) Stack: [0x00007f4cedaca000,0x00007f4cedbcb000], sp=0x00007f4cedbc9230, free space=1020k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xb6a44d] MethodData::clean_extra_data(CleanExtraDataClosure)+0x28d V [libjvm.so+0x56699a] ciMethodData::prepare_metadata()+0x10a V [libjvm.so+0x566b51] ciMethodData::load_extra_data()+0x31 V [libjvm.so+0x567026] ciMethodData::load_data()+0x346 V [libjvm.so+0x55e6b8] ciMethod::ensure_method_data(methodHandle const&)+0x98 V [libjvm.so+0x563ba7] ciMethod::ensure_method_data()+0x177 V [libjvm.so+0x49038a] Compilation::compile_java_method()+0x3a V [libjvm.so+0x490868] Compilation::compile_method()+0x108 V [libjvm.so+0x490cec] Compilation::Compilation(AbstractCompiler, ciEnv, ciMethod, int, BufferBlob, DirectiveSet)+0x21c V [libjvm.so+0x491659] Compiler::compile_method(ciEnv, ciMethod, int, DirectiveSet)+0xa9 V [libjvm.so+0x5f3bfc] CompileBroker::invoke_compiler_on_method(CompileTask)+0x33c V [libjvm.so+0x5f4fd8] CompileBroker::compiler_thread_loop()+0x4e8 V [libjvm.so+0xd76f92] JavaThread::thread_main_inner()+0xe2 V [libjvm.so+0xd7bbad] Thread::call_run()+0x10d V [libjvm.so+0xbd1197] thread_native_entry(Thread*)+0xe7 Attaching the hs_err_pid and replay files.
05-10-2019