Bug ID: JDK-8268524 nmethod::post_compiled_method_load

JDK-8268524 : nmethod::post_compiled_method_load_event racingly called on zombie

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 17

Priority: P2
Status: Closed
Resolution: Fixed
OS: linux
CPU: aarch64

Submitted: 2021-06-10
Updated: 2025-01-29
Resolved: 2021-06-22

Versions (Unresolved/Resolved/Fixed)

JDK 17 :	17 b28 (Fixed)
JDK 18 :	18 (Fixed)

Related Reports

Duplicate :	JDK-8245877 - assert(_value != __null) failed: resolving NULL _value in JvmtiExport::post_compiled_method_load
Relates :	JDK-8181110 - jvmti/hotswap test crashes in Method::checked_resolve_jmethod_id(_jmethodID*)
Relates :	JDK-8267972 - Inline cache cleaning is not monotonic

Description

This appears to be a crash similar to the one observed with JDK-8267972.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000fffc0edee020, pid=45124, tid=47438
#
# JRE version: Java(TM) SE Runtime Environment (17.0+26) (fastdebug build 17-ea+26-LTS-2407)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-ea+26-LTS-2407, compiled mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x131e020]  Method::is_method_handle_intrinsic() const+0x0


Current thread (0x0000fffc082f7d80):  JavaThread "Thread-0" daemon [_thread_in_vm, id=47438, stack(0x0000fff9ca800000,0x0000fff9caa00000)]

Stack: [0x0000fff9ca800000,0x0000fff9caa00000],  sp=0x0000fff9ca9fe660,  free space=2041k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x131e020]  Method::is_method_handle_intrinsic() const+0x0
V  [libjvm.so+0x61bcf8]  BarrierSetNMethod::is_armed(nmethod*)+0x18
V  [libjvm.so+0x186e154]  ZBarrierSetNMethod::nmethod_entry_barrier(nmethod*)+0x94
V  [libjvm.so+0x9a66f0]  CompiledMethod::run_nmethod_entry_barrier()+0x60
V  [libjvm.so+0x1115f50]  JvmtiDeferredEventQueue::run_nmethod_entry_barriers()+0x40
V  [libjvm.so+0x1076f20]  JvmtiCodeBlobEvents::generate_compiled_method_load_events(JvmtiEnv*)+0x190
V  [libjvm.so+0x107d340]  jvmti_GenerateEvents+0xec
C  [libCompiledZombie.so+0x930]  GenerateEventsThread+0x5c
V  [libjvm.so+0x1112164]  JvmtiAgentThread::call_start_function()+0x1e4
V  [libjvm.so+0x170421c]  JavaThread::thread_main_inner()+0x28c
V  [libjvm.so+0x170e458]  Thread::call_run()+0xf8
V  [libjvm.so+0x142ab34]  thread_native_entry(Thread*)+0x104
C  [libpthread.so.0+0x7738]  start_thread+0x198

Comments

Changeset: 9ec7180f
Author: Erik Österlund <eosterlund@openjdk.org>
Date: 2021-06-22 15:20:10 +0000
URL: https://git.openjdk.java.net/jdk17/commit/9ec7180f1ebf2ff19b0735f7b1c4fc9b97d632be

22-06-2021

== Problem Domain ==

In the code exercised by this test, we grab a code cache iterator with the NMethodIterator::only_alive_and_not_unloading mode, under the CodeCache_lock. The idea is to then call post_compiled_method_load_event() on each of these is_alive() nmethods. Surely none of them will be a zombie. Inside post_compiled_method_load_event() we filter out nmethods that can racingly die, like this:

    if (is_not_entrant() && can_convert_to_zombie()) { return; }

So if the nmethod was dead or is_unloading(), it wouldn't get into the iterator, and here we explicitly filter out nmethods that can become zombies. Now we should have all bases covered; no way we can end up calling the subsequent code on a zombie!

Except... the code called by the sweeper that flips an nmethod to zombie doesn't hold the CodeCache_lock. Instead it holds the CompiledMethod_lock, which this JVMTI code does not hold. So between the nmethod being alive in the iterator and the call to is_not_entrant(), the nmethod could racingly have become a zombie already. When we then check is_not_entrant(), it returns false, because the nmethod is a zombie rather than not-entrant. We are therefore tricked into believing that it is safe to post these events for the nmethod, while in fact it is already dead.

After we have mistakenly grabbed a zombie nmethod, when we use ZGC, we call the nmethod entry barrier on it. It gets indigestion due to being called on a zombie. Ouch.

== Solution Domain ==

It feels like our code cache iterators are too weak. You tell the iterator to give you is_alive() nmethods, but you are only guaranteed that they were is_alive() "at some point", as opposed to throughout the iteration. So bugs like this one can slip in very easily when using the CodeCache iterator, because that guarantee is unintuitive. Having the sweeper hold the CodeCache_lock throughout the entire make_zombie() call would give the stronger guarantee, but moving that around is quite tricky due to the spaghetti nature of that code, and would probably lead to lock rank hell.

A targeted fix for this specific failure mode would hold the CompiledMethod_lock across the safety check in nmethod::post_compiled_method_load_event(), and make sure to check that the nmethod is_alive() under that lock.
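
The interleaving described above can be replayed deterministically in a small stand-alone model. This is only an illustrative sketch, not HotSpot code and not the committed changeset: ToyNMethod, State, unguarded_filter_accepts(), guarded_filter_accepts() and replay_race() are hypothetical names, a per-object std::mutex stands in for CompiledMethod_lock, and can_convert_to_zombie() is assumed to always return true.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical three-state model of an nmethod's life cycle.
enum class State { in_use, not_entrant, zombie };

struct ToyNMethod {
    std::atomic<State> state{State::in_use};
    std::mutex compiled_method_lock;  // stand-in for CompiledMethod_lock

    bool is_alive() const {
        State s = state.load();
        return s == State::in_use || s == State::not_entrant;
    }
    bool is_not_entrant() const { return state.load() == State::not_entrant; }
    // Assumption for this sketch: conversion to zombie is always possible.
    bool can_convert_to_zombie() const { return true; }

    // Sweeper side: the state flip happens under the compiled-method lock.
    void make_zombie() {
        std::lock_guard<std::mutex> guard(compiled_method_lock);
        state.store(State::zombie);
    }

    // Filter as described in the problem statement: no lock and no
    // liveness check, so a zombie passes because it is not "not entrant".
    bool unguarded_filter_accepts() const {
        if (is_not_entrant() && can_convert_to_zombie()) return false;
        return true;
    }

    // Sketch of the targeted fix: take the lock the sweeper uses and
    // re-check is_alive() before applying the original filter.
    bool guarded_filter_accepts() {
        std::lock_guard<std::mutex> guard(compiled_method_lock);
        if (!is_alive()) return false;
        if (is_not_entrant() && can_convert_to_zombie()) return false;
        return true;
    }
};

// Replays the racy order of events on one thread:
// iterator sees the nmethod alive -> sweeper makes it a zombie -> filter runs.
// Returns true if the event would be posted for the (now dead) nmethod.
bool replay_race(bool use_guarded_filter) {
    ToyNMethod nm;
    nm.state.store(State::not_entrant);
    bool seen_alive = nm.is_alive();  // iterator snapshot, "alive at some point"
    nm.make_zombie();                 // sweeper wins the race
    bool accepted = use_guarded_filter ? nm.guarded_filter_accepts()
                                       : nm.unguarded_filter_accepts();
    return seen_alive && accepted;
}
```

In this model, replay_race(false) returns true (the zombie slips through the unguarded filter) and replay_race(true) returns false (the liveness check under the lock rejects it).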

15-06-2021