JDK-8202883 : crash in the JvmtiExport::post_compiled_method_load
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 9,9.0.1,13
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2018-05-09
  • Updated: 2019-12-13
  • Resolved: 2019-12-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 15
15Resolved
Related Reports
Duplicate :  
Duplicate :  
Description
A customer reported a crash in the JvmtiExport::post_compiled_method_load():
     https://gist.github.com/nezihyigitbasi/52a58698cc9acfcab21c69d00bd0cef2

Hi,

First of all, sorry for bringing up a crash issue in this dev mailing list, but the crash report submission page (https://bugreport.java.com/bugreport/crash.jsp) doesn't list Java 9 in the release drop down, so I couldn't report it there.

We recently got a crash with Java 9.0.1+11 with an interesting stack ending at "JvmtiExport::post_compiled_method_load()" (entire error file is https://gist.github.com/nezihyigitbasi/52a58698cc9acfcab21c69d00bd0cef2 ). A google search didn't end up with much info, so I just wanted to check with this mailing list to see whether anyone has any ideas to investigate this further.

Thanks,
Nezih

Full list of jvm args that were used in the crash scenario:
-server
-Xss2048k
-XX:MaxMetaspaceSize=4G
-XX:+PreserveFramePointer
-XX:-UseBiasedLocking
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+ExitOnOutOfMemoryError
-agentpath:/packages/presto.presto/bin/libjvmkill.so
-agentpath:/packages/presto.presto/bin/libperfagent.so
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
-XX:+UnlockDiagnosticVMOptions
-XX:+AlwaysPreTouch
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512m
-Djdk.nio.maxCachedBufferSize=30000000
-Djavax.net.ssl.sessionCacheSize=1024
-Djdk.attach.allowAttachSelf
-Xlog:gc*=info,gc+jni=debug,gc+remset=trace,gc+classhisto=trace,safepoint=info:stdout:time,tid,level,tags
-agentpath:/packages/presto.presto/bin/libastack.so=port=7781
-XX:MarkStackSize=16M
-Xms160G
-Xmx160G
-XX:G1RSetRegionEntries=5120


The beginning of the hs_err dump:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbdf8700365, pid=349, tid=610
#
# JRE version: Java(TM) SE Runtime Environment (9.0+11) (build 9.0.1+11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (9.0.1+11, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x99e365]  JvmtiExport::post_compiled_method_load(nmethod*)+0x275
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/local/bin/coredumper %p|%P|%s|%t|%c|%i|%I|%u|%g|%h|%E|%e||%d" (or dumping to /data/core.349)
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xss2048k -XX:MaxMetaspaceSize=4G -XX:+PreserveFramePointer -XX:-UseBiasedLocking -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:+UseGCOverheadLimit -XX:+ExitOnOutOfMemoryError -agentpath:/packages/presto.presto/bin/libjvmkill.so -agentpath:/packages/presto.presto/bin/libperfagent.so -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+UnlockDiagnosticVMOptions -XX:+AlwaysPreTouch -XX:-OmitStackTraceInFastThrow -XX:ReservedCodeCacheSize=1G -Djdk.nio.maxCachedBufferSize=30000000 -Djavax.net.ssl.sessionCacheSize=1024 -Djdk.attach.allowAttachSelf -Xlog:gc*=info,gc+jni=debug,gc+remset=trace,gc+classhisto=trace,safepoint=info:stdout:time,tid,level,tags -agentpath:/packages/presto.presto/bin/libastack.so=port=7781 -XX:G1MaxNewSizePercent=20 -XX:G1HeapRegionSize=32M -Xms180G -Xmx180G -XX:MarkStackSize=256M -XX:MaxGCPauseMillis=500 -XX:GCLockerRetryAllocationCount=5 -XX:G1SummarizeRSetStatsPeriod=100 -XX:MarkStackSizeMax=1G -XX:G1OldCSetRegionThresholdPercent=20 -XX:InitiatingHeapOccupancyPercent=40 -Dnode.data-dir=/data -Dnode.id=presto0559_09_ftw2 -Dnode.environment=ftw2_batch5 -Dnode.location=/ftw2/09/ll/presto0559 -Dlog.levels-file=/data/etc/log.properties -Djava.net.preferIPv6Addresses=true -Dsmc-service-inventory.tier=presto.ftw2_batch5.coordinator.http -Dnode.ip=2401:db00:1120:912a:face:0:1f:0 -Dservice-inventory.uri=http://localhost:7777/v1/smcServiceInventory -Dconfig=/data/etc/config.properties com.facebook.presto.facebook.PrestoFacebook

Host: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 56 cores, 251G, CentOS Linux release 7.4.1708 (Core) 
Time: Tue May  1 21:34:44 2018 PDT elapsed time: 381347 seconds (4d 9h 55m 47s)

---------------  T H R E A D  ---------------

Current thread (0x00007fbdf3ba55b0):  JavaThread "Service Thread" daemon [_thread_in_vm, id=610, stack(0x00007f8ca3bfe000,0x00007f8ca3dff000)]

Stack: [0x00007f8ca3bfe000,0x00007f8ca3dff000],  sp=0x00007f8ca3dfdbb0,  free space=2046k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x99e365]  JvmtiExport::post_compiled_method_load(nmethod*)+0x275
V  [libjvm.so+0x9a6614]  JvmtiDeferredEvent::post()+0x44
V  [libjvm.so+0xc2c507]  ServiceThread::service_thread_entry(JavaThread*, Thread*)+0x337
V  [libjvm.so+0xceee58]  JavaThread::thread_main_inner()+0xd8
V  [libjvm.so+0xb6de12]  thread_native_entry(Thread*)+0xf2
C [libpthread.so.0+0x7e25] start_thread+0xc5


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000008

Register to memory mapping:

RAX=0x00007f8ca3dfdc10 is pointing into the stack for thread: 0x00007fbdf3ba55b0
RBX=0x00007fbdf3ba55b0 is a thread
RCX=0x0000000000000001 is an unknown value
RDX=0x00007f8c90000bd0 is an unknown value
RSP=0x00007f8ca3dfdbb0 is pointing into the stack for thread: 0x00007fbdf3ba55b0
RBP=0x00007f8ca3dfdd80 is pointing into the stack for thread: 0x00007fbdf3ba55b0
RSI=0x00007f8ca3dfdc00 is pointing into the stack for thread: 0x00007fbdf3ba55b0
RDI=0x00007f8ca3dfdc10 is pointing into the stack for thread: 0x00007fbdf3ba55b0
R8 =0x00007f83a92596b0 is pointing into metadata
R9 =0x0000000000000000 is an unknown value
R10=0x00000000000001fe is an unknown value
R11=0x0000000000000001 is an unknown value
R12=0x00007fbdaf2c0190 is at entry_point+-624 in (nmethod*)0x00007fbdaf2c0190
R13=0x00007f8ca3dfdce0 is pointing into the stack for thread: 0x00007fbdf3ba55b0
R14=0x0000000000000000 is an unknown value
R15=0x00007fbdf0010c50 is an unknown value


Registers:
RAX=0x00007f8ca3dfdc10, RBX=0x00007fbdf3ba55b0, RCX=0x0000000000000001, RDX=0x00007f8c90000bd0
RSP=0x00007f8ca3dfdbb0, RBP=0x00007f8ca3dfdd80, RSI=0x00007f8ca3dfdc00, RDI=0x00007f8ca3dfdc10
R8 =0x00007f83a92596b0, R9 =0x0000000000000000, R10=0x00000000000001fe, R11=0x0000000000000001
R12=0x00007fbdaf2c0190, R13=0x00007f8ca3dfdce0, R14=0x0000000000000000, R15=0x00007fbdf0010c50
RIP=0x00007fbdf8700365, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f8ca3dfdbb0)
0x00007f8ca3dfdbb0:   010000000000000e 00007f8ca3dfdcc8
0x00007f8ca3dfdbc0:   00007f8ca3dfdc90 00007f887849d4b0
0x00007f8ca3dfdbd0:   00007f8ca3dfdc10 00007f8cb0000a60
0x00007f8ca3dfdbe0:   00007f8ca3dfdc00 00007f8ca3dfdc50 

Instructions: (pc=0x00007fbdf8700365)
0x00007fbdf8700345:   fe ff ff 4c 89 b5 78 fe ff ff 48 8b 00 48 89 85
0x00007fbdf8700355:   68 ff ff ff 48 8b b8 60 01 00 00 e8 8b 02 f4 ff
0x00007fbdf8700365:   49 8b 46 08 4c 89 ee 48 8b 40 08 48 8b 78 18 e8
0x00007fbdf8700375:   77 86 e7 ff 4c 89 ef 48 89 85 48 fe ff ff e8 b8 


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x00007f8b1824ee20 JavaThread "async-http-response-241738" daemon [_thread_in_Java, id=377272, stack(0x00007f8bc33fa000,0x00007f8bc35fb000)]
  0x00007f8b182760b0 JavaThread "async-http-response-241737" daemon [_thread_blocked, id=377271, stack(0x00007f8445379000,0x00007f844557a000)]
  0x00007f8b181e7ee0 JavaThread "async-http-response-241736" daemon [_thread_blocked, id=377270, stack(0x00007f836a9f5000,0x00007f836abf6000)]
  0x00007f8b1809eef0 JavaThread "async-http-response-241735" daemon [_thread_blocked, id=377269, stack(0x00007f86afffa000,0x00007f86b01fb000)]
  0x00007f8b183e1a70 JavaThread "async-http-response-241734" daemon [_thread_in_Java, id=377268, stack(0x00007f834a9f5000,0x00007f834abf6000)]
  0x00007f89501a6100 JavaThread "async-http-response-241733" daemon [_thread_in_Java, id=377267, stack(0x00007f86b1e08000,0x00007f86b2009000)]
  0x00007f8b180737e0 JavaThread "async-http-response-241732" daemon [_thread_in_Java, id=377266, stack(0x00007f83811e9000,0x00007f83813ea000)]
  0x00007fbc640d35f0 JavaThread "async-http-response-241729" daemon [_thread_in_Java, id=377265, stack(0x00007f86431f9000,0x00007f86433fa000)]
  0x00007f8ae04e9520 JavaThread "async-http-response-241728" daemon [_thread_in_Java, id=377264, stack(0x00007f836b9fd000,0x00007f836bbfe000)]
  0x00007fbcbc16e030 JavaThread "async-http-response-241726" daemon [_thread_blocked, id=377263, stack(0x00007f8348de7000,0x00007f8348fe8000)]
  0x00007f8a640f3dc0 JavaThread "async-http-response-241725" daemon [_thread_in_Java, id=377262, stack(0x00007f834a5f3000,0x00007f834a7f4000)]
  0x00007fbd20748350 JavaThread "async-http-response-241724" daemon [_thread_blocked, id=377261, stack(0x00007f83bfdcd000,0x00007f83bffce000)]
  0x00007f8ad83c0850 JavaThread "async-http-response-241723" daemon [_thread_in_Java, id=377260, stack(0x00007f83423f2000,0x00007f83425f3000)]
  0x00007f89641e6210 JavaThread "async-http-response-241722" daemon [_thread_in_Java, id=377259, stack(0x00007f864f5ca000,0x00007f864f7cb000)]
  0x00007fbcd012cdb0 JavaThread "async-http-response-241721" daemon [_thread_blocked, id=377258, stack(0x00007f83737fc000,0x00007f83739fd000)]
  0x00007f8874508bf0 JavaThread "async-http-response-241719" daemon [_thread_in_Java, id=377257, stack(0x00007f834aff8000,0x00007f834b1f9000)]
  0x00007fbcd80912d0 JavaThread "async-http-response-241718" daemon [_thread_in_Java, id=377256, stack(0x00007f83427f4000,0x00007f83429f5000)]
0x00007f8b24588f90 JavaThread "async-http-response-241716" daemon [_thread_in_Java, id=377255, stack(0x00007f83880e8000,0x00007f83882e9000)]
. . . .
Many many more threads in the hs_err dump file.
Comments
Only bugs with associated changesets should be closed as "Fixed".
13-12-2019

This is the same as the various crashes in compiled_method_load bug JDK-8173361. The fix was to make the nmethods GC and sweeper roots with oops_do and nmethods_do. There are several related fixes (see that bug and still in progress) so if they're backported, they should be backported together.
13-12-2019

Note that the nmethodLocker prevents the nmethod from being nmethod::flush:ed (deallocated), but not from getting unloaded/zombied. An nmethod will get unloaded if classes it refers is not reachable (gets unloaded). JvmtiExport::post_compiled_method_load can try to post an unloaded nmethod, with references to Methods that were deallocated when the ClassLoaderData was unloaded. The problem is not _only_ that we defer the event and that the nmethod can be unloaded between the deferred event is enqueued, and when the ServiceThread actually posts the event. We can also unload the class before the deferred "load" event gets pushed onto the queue: void nmethod::post_compiled_method_load_event() { ... if (JvmtiExport::should_post_compiled_method_load() || JvmtiExport::should_post_compiled_method_unload()) { get_and_cache_jmethod_id(); <-- Safepoints can happen here and nmethods/classes can get unloaded } if (JvmtiExport::should_post_compiled_method_load()) { // Let the Service thread (which is a real Java thread) post the event MutexLocker ml(Service_lock, Mutex::_no_safepoint_check_flag); JvmtiDeferredEventQueue::enqueue( JvmtiDeferredEvent::compiled_method_load_event(this)); } In one of the crashes for JDK-8217776 we crash in: Method::jmethod_id+0x53 when dereferencing NULL: reading address 0x0000000000000008 The _jmethod_ids entries are cleared when a class loader gets unloaded: ClassLoaderData::~ClassLoaderData() { ... if (_jmethod_ids != NULL) { Method::clear_jmethod_ids(this); } Our thinking is that this could be fixed by: 1) Make nmethods of "load" events strong roots for the GCs and on-stack for the nmethod sweepr untill the event is posted 2) The roots must be registered before the get_and_cache_jmethod_id. If we can move this call to the actual posting in the ServiceThread, then we could probably use the existing queue as the root set. Otherwise, we need to invent a new root set. Some bonuses with this approach: 1) With this we now have proper lifetime tracking of nmethods and wouldn't need the nmethodLocker usage in the JVMTI code. 2) We would always get always get load / unload events for nmethods - Today an "unload" event will be lost if the nmethod gets unloaded before we've enqueued the "load" event.
07-05-2019

We know the reason for this crash. See comments in the duplicated bug: JDK-8217776.
07-05-2019

So I missed that actually we had a fix for exactly the same failure in our OpenJDK. The fix was to not emit the event if nm->method() was null. Now in retrospect, this fix seems wrong because I believe you might get an unload event without the load event, which seems confusing. My best guess of what could be happening is that between the moment the nm is actually installed to the moment it gets added to the deferred list, it could get sweeped and was not locked in that time frame. Therefore the sweep occurs and when the deferred case happens it gets a null pointer. It seems that the best way to solve this would be to lock it at creation perhaps and then unlock it once the deferred event was created. Because this is a rare case, it is hard to diagnose and get a reproducer though, I've tried multiple times.
14-03-2019

The same customer observed this issue again on Java 10.0+46 and provided the details: jvm args: https://gist.github.com/nezihyigitbasi/301716e9bbcb272d0c2cbadc45938a68 hs_err file: https://gist.github.com/nezihyigitbasi/912fc3e7f2aff933f5828f5e913dae64
12-10-2018

It is unknown if this issue is reproducible with 10 and 11 or not. But I agree with targeting to tbd_major as this needs to be checked for 10 and 11 anyway.
12-10-2018

There are no further (Oracle) JDK 9 updates, hence added (a somewhat Oracle centric) 9-wnf and sent to tbd_major. I would expect further work aught to look at if the issue is seen with >= 10.0.1 - I suppose, ideally, see if it is still an issue with the 11 ea release.
10-05-2018

Okay. More details on this. This is the nmethod locking api: / Locks an nmethod so its code will not get removed and it will not // be made into a zombie, even if it is a not_entrant method. After the // nmethod becomes a zombie, if CompiledMethodUnload event processing // needs to be done, then lock_nmethod() is used directly to keep the // generated code from being reused too early. class nmethodLocker : public StackObj { CompiledMethod* _nm; public: // note: nm can be NULL // Only JvmtiDeferredEvent::compiled_method_unload_event() // should pass zombie_ok == true. static void lock_nmethod(CompiledMethod* nm, bool zombie_ok = false); static void unlock_nmethod(CompiledMethod* nm); // (ditto) nmethodLocker(address pc); // derive nm from pc nmethodLocker(nmethod *nm) { _nm = nm; lock_nmethod(_nm); } nmethodLocker(CompiledMethod *nm) { _nm = nm; lock(_nm); } static void lock(CompiledMethod* method) { if (method == NULL) return; lock_nmethod(method); } static void unlock(CompiledMethod* method) { if (method == NULL) return; unlock_nmethod(method); } nmethodLocker() { _nm = NULL; } ~nmethodLocker() { unlock(_nm); } CompiledMethod* code() { return _nm; } void set_code(CompiledMethod* new_nm) { unlock(_nm); // note: This works even if _nm==new_nm. _nm = new_nm; lock(_nm); } }; // QQQ might we make this work from a frame?? nmethodLocker::nmethodLocker(address pc) { CodeBlob* cb = CodeCache::find_blob(pc); guarantee(cb != NULL && cb->is_compiled(), "bad pc for a nmethod found"); _nm = cb->as_compiled_method(); lock_nmethod(_nm); } // Only JvmtiDeferredEvent::compiled_method_unload_event() // should pass zombie_ok == true. void nmethodLocker::lock_nmethod(CompiledMethod* cm, bool zombie_ok) { if (cm == NULL) return; if (cm->is_aot()) return; // FIXME: Revisit once _lock_count is added to aot_method nmethod* nm = cm->as_nmethod(); Atomic::inc(&nm->_lock_count); assert(zombie_ok || !nm->is_zombie(), "cannot lock a zombie method"); } void nmethodLocker::unlock_nmethod(CompiledMethod* cm) { if (cm == NULL) return; if (cm->is_aot()) return; // FIXME: Revisit once _lock_count is added to aot_method nmethod* nm = cm->as_nmethod(); Atomic::dec(&nm->_lock_count); assert(nm->_lock_count >= 0, "unmatched nmethod lock/unlock"); } bool is_locked_by_vm() const { return _lock_count >0; } The function is_locked_by_vm() must be used to prevent unloading an nmethod in use. But it is used only in the NMethodSweeper by: NMethodSweeper::process_compiled_method() called from NMethodSweeper::sweep_code_cache() called from NMethodSweeper::possibly_sweep() called from NMethodSweeper::sweeper_loop() called from sweeper_thread_entry() called from CodeCacheSweeperThread::CodeCacheSweeperThread() called from CompileBroker::make_thread() called from CompileBroker::init_compiler_sweeper_threads() NMethodSweeper::possibly_flush() called from NMethodSweeper::process_compiled_method() From the other hand, it seems, the nmethod::make_unloaded() cleans up the nm->_method value: // If _method is already NULL the Method* is about to be unloaded, // so we don't have to break the cycle. Note that it is possible to // have the Method* live here, in case we unload the nmethod because // it is pointing to some oop (other than the Method*) being unloaded. if (_method != NULL) { // OSR methods point to the Method*, but the Method* does not // point back! if (_method->code() == this) { _method->clear_code(); // Break a cycle } _method = NULL; // Clear the method of this dead nmethod <== !! } but it does not care about the nmethod locking api above. The nmethod::make_unloaded() is called from this method: // If this oop is not live, the nmethod can be unloaded. bool nmethod::can_unload(BoolObjectClosure* is_alive, oop* root, bool unloading_occurred) { assert(root != NULL, "just checking"); oop obj = *root; if (obj == NULL || is_alive->do_object_b(obj)) { return false; } // If ScavengeRootsInCode is true, an nmethod might be unloaded // simply because one of its constant oops has gone dead. // No actual classes need to be unloaded in order for this to occur. assert(unloading_occurred || ScavengeRootsInCode, "Inconsistency in unloading"); make_unloaded(is_alive, obj); <== !! return true; } In its turn the nmethod::make_unloaded() is called from: bool nmethod::unload_if_dead_at(RelocIterator* iter_at_oop, BoolObjectClosure *is_alive, bool unloading_occurred) { assert(iter_at_oop->type() == relocInfo::oop_type, "Wrong relocation type"); oop_Relocation* r = iter_at_oop->oop_reloc(); // Traverse those oops directly embedded in the code. // Other oops (oop_index>0) are seen as part of scopes_oops. assert(1 == (r->oop_is_immediate()) + (r->oop_addr() >= oops_begin() && r->oop_addr() < oops_end()), "oop must be found in exactly one place"); if (r->oop_is_immediate() && r->oop_value() != NULL) { // Unload this nmethod if the oop is dead. if (can_unload(is_alive, r->oop_addr(), unloading_occurred)) { return true;; } } return false; } bool nmethod::do_unloading_scopes(BoolObjectClosure* is_alive, bool unloading_occurred) { // Scopes for (oop* p = oops_begin(); p < oops_end(); p++) { if (*p == Universe::non_oop_word()) continue; // skip non-oops if (can_unload(is_alive, p, unloading_occurred)) { return true; } } return false; } #if INCLUDE_JVMCI bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { bool is_unloaded = false; // Follow JVMCI method BarrierSet* bs = Universe::heap()->barrier_set(); if (_jvmci_installed_code != NULL) { if (_jvmci_installed_code->is_a(HotSpotNmethod::klass()) && HotSpotNmethod::isDefault(_jvmci_installed_code)) { if (!is_alive->do_object_b(_jvmci_installed_code)) { clear_jvmci_installed_code(); } } else { if (can_unload(is_alive, (oop*)&_jvmci_installed_code, unloading_occurred)) { return true; } } } if (_speculation_log != NULL) { if (!is_alive->do_object_b(_speculation_log)) { bs->write_ref_nmethod_pre(&_speculation_log, this); _speculation_log = NULL; bs->write_ref_nmethod_post(&_speculation_log, this); } } return is_unloaded; } #endif The only caller of the the unload_if_dead_at() and do_unloading_scopes() is nmethod::do_unloading_oops(): bool nmethod::do_unloading_oops(address low_boundary, BoolObjectClosure* is_alive, bool unloading_occurred) { // Compiled code { RelocIterator iter(this, low_boundary); while (iter.next()) { if (iter.type() == relocInfo::oop_type) { if (unload_if_dead_at(&iter, is_alive, unloading_occurred)) { return true; } } } } return do_unloading_scopes(is_alive, unloading_occurred); } Let's skip AOT part. Both nmethod::do_unloading_oops() and nmethod::do_unloading_jvmci() called from CompiledMethod::do_unloading() and CompiledMethod::do_unloading_parallel(). CompiledMethod::do_unloading_parallel() called from G1CodeCacheUnloadingTask::clean_nmethod() called from G1CodeCacheUnloadingTask::work_first_pass() and G1CodeCacheUnloadingTask::work_second_pass() called from G1ParallelCleaningTask::G1ParallelCleaningTask() called from G1CollectedHeap::parallel_cleaning() called from G1ConcurrentMark::weakRefsWorkParallelPart() called from G1ConcurrentMark::weakRefsWork() called from G1ConcurrentMark::checkpointRootsFinal() . . . Now, I'm lost... It is hard to find out how all these sub-call-tree should transitively call the NMethodSweeper::process_compiled_method() function which uses the function is_locked_by_vm() to synchronize with post_compiled_method_load().
09-05-2018

Hi Ioi and David, Ioi, thank you for the disassembly. It helps for sure! Below are 3 call sites for the nmethod::post_compiled_method_load_event(): void ciEnv::register_method(ciMethod* target, . . .) { . . . if (nm != NULL) { // JVMTI -- compiled method notification (must be done outside lock) nm->post_compiled_method_load_event(); <== !! } else { . . . } } JVMCIEnv::CodeInstallResult JVMCIEnv::register_method(. . .) { . . . // JVMTI -- compiled method notification (must be done outside lock) if (nm != NULL) { nm->post_compiled_method_load_event(); <== !! . . . } return result; } void AdapterHandlerLibrary::create_native_wrapper(const methodHandle& method) { . . . if (nm != NULL) { . . . nm->post_compiled_method_load_event(); <== !! } } The nmethod::post_compiled_method_load_event() creates a JvmtiDeferredEvent compiled_load_event entry and enqueues it for the ServiceThread processing. void nmethod::post_compiled_method_load_event() { Method* moop = method(); HOTSPOT_COMPILED_METHOD_LOAD( (char *) moop->klass_name()->bytes(), moop->klass_name()->utf8_length(), (char *) moop->name()->bytes(), moop->name()->utf8_length(), (char *) moop->signature()->bytes(), moop->signature()->utf8_length(), insts_begin(), insts_size()); if (JvmtiExport::should_post_compiled_method_load() || JvmtiExport::should_post_compiled_method_unload()) { get_and_cache_jmethod_id(); } if (JvmtiExport::should_post_compiled_method_load()) { // Let the Service thread (which is a real Java thread) post the event MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag); JvmtiDeferredEventQueue::enqueue( JvmtiDeferredEvent::compiled_method_load_event(this)); <== !! } } The vmtiDeferredEvent::compiled_method_load_event() below uses nmethodLocker::lock_nmethod(nm) to protect the nmethod* nm from being updated: JvmtiDeferredEvent JvmtiDeferredEvent::compiled_method_load_event( nmethod* nm) { JvmtiDeferredEvent event = JvmtiDeferredEvent(TYPE_COMPILED_METHOD_LOAD); event._event_data.compiled_method_load = nm; <== !! // Keep the nmethod alive until the ServiceThread can process // this deferred event. nmethodLocker::lock_nmethod(nm); <== !! return event; } void JvmtiDeferredEventQueue::enqueue(const JvmtiDeferredEvent& event) { assert(Service_lock->owned_by_self(), "Must own Service_lock"); process_pending_events(); // Events get added to the end of the queue (and are pulled off the front). QueueNode* node = new QueueNode(event); if (_queue_tail == NULL) { _queue_tail = _queue_head = node; } else { assert(_queue_tail->next() == NULL, "Must be the last element in the list"); _queue_tail->set_next(node); _queue_tail = node; } Service_lock->notify_all(); assert((_queue_head == NULL) == (_queue_tail == NULL), "Inconsistent queue markers"); } JvmtiDeferredEvent JvmtiDeferredEventQueue::dequeue() { assert(Service_lock->owned_by_self(), "Must own Service_lock"); process_pending_events(); assert(_queue_head != NULL, "Nothing to dequeue"); if (_queue_head == NULL) { // Just in case this happens in product; it shouldn't but let's not crash return JvmtiDeferredEvent(); } QueueNode* node = _queue_head; _queue_head = _queue_head->next(); if (_queue_head == NULL) { _queue_tail = NULL; } assert((_queue_head == NULL) == (_queue_tail == NULL), "Inconsistent queue markers"); JvmtiDeferredEvent event = node->event(); delete node; return event; } The ServiceThread (different thread which is a JavaThread) takes the event from the JvmtiDeferredEventQueue queue and posts it. void ServiceThread::service_thread_entry(JavaThread* jt, TRAPS) { . . . if (has_jvmti_events) { jvmti_event = JvmtiDeferredEventQueue::dequeue(); <== !! } } if (has_jvmti_events) { jvmti_event.post(); <== !! } . . . } The JvmtiDeferredEvent::post() below makes the call: JvmtiExport::post_compiled_method_load(nm); and only then unlocks the nmethod nm with nmethodLocker::unlock_nmethod(nm). void JvmtiDeferredEvent::post() { assert(ServiceThread::is_service_thread(Thread::current()), "Service thread must post enqueued events"); switch(_type) { case TYPE_COMPILED_METHOD_LOAD: { nmethod* nm = _event_data.compiled_method_load; JvmtiExport::post_compiled_method_load(nm); <== !! // done with the deferred event so unlock the nmethod nmethodLocker::unlock_nmethod(nm); <== !! break; } . . . default: ShouldNotReachHere(); } } The printed disassembly shows where we observe the crash in the functions below: void JvmtiExport::post_compiled_method_load(nmethod *nm) { if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) { return; } JavaThread* thread = JavaThread::current(); EVT_TRIG_TRACE(JVMTI_EVENT_COMPILED_METHOD_LOAD, ("[%s] method compile load event triggered", JvmtiTrace::safe_get_thread_name(thread))); JvmtiEnvIterator it; for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { if (env->is_enabled(JVMTI_EVENT_COMPILED_METHOD_LOAD)) { if (env->phase() == JVMTI_PHASE_PRIMORDIAL) { continue; } EVT_TRACE(JVMTI_EVENT_COMPILED_METHOD_LOAD, ("[%s] class compile method load event sent %s.%s ", JvmtiTrace::safe_get_thread_name(thread), (nm->method() == NULL) ? "NULL" : nm->method()->klass_name()->as_C_string(), (nm->method() == NULL) ? "NULL" : nm->method()->name()->as_C_string())); ResourceMark rm(thread); HandleMark hm(thread); // Add inlining information jvmtiCompiledMethodLoadInlineRecord* inlinerecord = create_inline_record(nm); // Pass inlining information through the void pointer JvmtiCompiledMethodLoadEventMark jem(thread, nm, inlinerecord); <== !! JvmtiJavaThreadEventTransition jet(thread); jvmtiEventCompiledMethodLoad callback = env->callbacks()->CompiledMethodLoad; if (callback != NULL) { (*callback)(env->jvmti_external(), jem.jni_methodID(), jem.code_size(), jem.code_data(), jem.map_length(), jem.map(), jem.compile_info()); } } } } class JvmtiCompiledMethodLoadEventMark : public JvmtiMethodEventMark { . . . public: JvmtiCompiledMethodLoadEventMark(JavaThread *thread, nmethod *nm, void* compile_info_ptr = NULL) : JvmtiMethodEventMark(thread,methodHandle(thread, nm->method())) { <== !! . . . } . . . } class JvmtiMethodEventMark : public JvmtiThreadEventMark { . . . JvmtiMethodEventMark(JavaThread *thread, methodHandle method) : JvmtiThreadEventMark(thread), _mid(to_jmethodID(method)) {}; <== line 234 !! jmethodID jni_methodID() { return _mid; } }; All the fragments above show the code path correctly operates with nmethod by protecting it with the nmethodLocker::lock_nmethod(nm) until the event has been posted. There can be several guesses about the above: - the above is correct but some other code abuses the convention and updates the nmethod without grabbing the nmethod lock - the above operations with the nmethod is incorrect (miss some steps) - some other reason (I doubt it is the case) One conclusion is that it is unlikely an issue in the JVMTI agent as I suspected initially. It would still be valuable to have more details about a scenario that reproduced this issue. It is probably enough information to file a bug. Any opinions on the above? Any ideas about what can be broken? Thanks, Serguei
09-05-2018

I was wondering the same thing. Vladimir K. pointed me at: http://hg.openjdk.java.net/jdk/jdk/file/ae0ebd3cf949/src/hotspot/share/runtime/sweeper.hpp#l42 for the nmethod lifecycle. Given this is actually a deferred event, I too wonder if the nmethod may have been made not-entrant or even zombie, since the event was posted. David
09-05-2018

Serguei & Nezih, I've looked at the crash log and it seems like nm->method() is NULL, which caused the crash. I am not familiar with the JVMTI internals or nmethods, so I am just guessing here -- is it legal for nm->method() to be NULL? Or, is it possible that someone has cleared nm->method() before it's processed by post_compiled_method_load? Details from the crash log: ================================= Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x99e365] JvmtiExport::post_compiled_method_load(nmethod*)+0x275 V [libjvm.so+0x9a6614] JvmtiDeferredEvent::post()+0x44 V [libjvm.so+0xc2c507] ServiceThread::service_thread_entry(JavaThread*, Thread*)+0x337 V [libjvm.so+0xceee58] JavaThread::thread_main_inner()+0xd8 V [libjvm.so+0xb6de12] thread_native_entry(Thread*)+0xf2 C [libpthread.so.0+0x7e25] start_thread+0xc5 ... R14=0x0000000000000000 is an unknown value ================================= Here's a disassembly of JvmtiExport::post_compiled_method_load from JDK 9.0.1 http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/prims/jvmtiExport.cpp 2051 HandleMark hm(thread); 0x00007ffff67dd239 <+329>: mov %rax,-0x140(%rbp) 0x00007ffff67dd240 <+336>: callq 0x7ffff66149d0 <HandleMark::initialize(Thread*)> 2054 jvmtiCompiledMethodLoadInlineRecord* inlinerecord = create_inline_record(nm); 0x00007ffff67dd245 <+341>: mov %r12,%rdi 0x00007ffff67dd248 <+344>: callq 0x7ffff67d4090 <create_inline_record(nmethod*)> 2055 // Pass inlining information through the void pointer 2056 JvmtiCompiledMethodLoadEventMark jem(thread, nm, inlinerecord); 0x00007ffff67dd24d <+349>: mov %rax,-0x1a8(%rbp) 0x00007ffff67dd254 <+356>: mov 0x70(%r12),%rax 0x00007ffff67dd259 <+361>: mov %rbx,-0x178(%rbp) 0x00007ffff67dd260 <+368>: test %rax,%rax 0x00007ffff67dd263 <+371>: mov %rax,-0x180(%rbp) 0x00007ffff67dd26a <+378>: je 0x7ffff67dd282 <JvmtiExport::post_compiled_method_load(nmethod*)+402> 0x00007ffff67dd26c <+380>: mov %rax,-0xa0(%rbp) 0x00007ffff67dd273 <+387>: mov 0x160(%rbx),%rdi 0x00007ffff67dd27a <+394>: mov %r13,%rsi 0x00007ffff67dd27d <+397>: callq 0x7ffff671d5f0 <GrowableArray<Metadata*>::append(Metadata* const&)> 0x00007ffff67dd282 <+402>: lea 0x218(%rbx),%rax 0x00007ffff67dd289 <+409>: mov %rbx,-0xf0(%rbp) 0x00007ffff67dd290 <+416>: movl $0x0,-0xe0(%rbp) 0x00007ffff67dd29a <+426>: mov %rax,-0xe8(%rbp) 0x00007ffff67dd2a1 <+433>: mov 0x3c8(%rbx),%rax 0x00007ffff67dd2a8 <+440>: test %rax,%rax 0x00007ffff67dd2ab <+443>: je 0x7ffff67dd2b6 <JvmtiExport::post_compiled_method_load(nmethod*)+454> 0x00007ffff67dd2ad <+445>: mov 0x10(%rax),%eax 0x00007ffff67dd2b0 <+448>: mov %eax,-0xe0(%rbp) 0x00007ffff67dd2b6 <+454>: mov 0x38(%rbx),%r14 0x00007ffff67dd2ba <+458>: mov %rbx,%rdi 0x00007ffff67dd2bd <+461>: callq 0x7ffff6707d50 <JNIHandleBlock::allocate_block(Thread*)> 0x00007ffff67dd2c2 <+466>: lea 0x1f8(%rbx),%rdi 0x00007ffff67dd2c9 <+473>: mov %rbx,%rsi 0x00007ffff67dd2cc <+476>: mov %r14,0x118(%rax) 0x00007ffff67dd2d3 <+483>: mov %rax,0x38(%rbx) 0x00007ffff67dd2d7 <+487>: callq 0x7ffff6559410 <JavaFrameAnchor::make_walkable(JavaThread*)> 0x00007ffff67dd2dc <+492>: mov 0x1f0(%rbx),%rsi 0x00007ffff67dd2e3 <+499>: mov -0xf0(%rbp),%rdi 0x00007ffff67dd2ea <+506>: callq 0x7ffff6708520 <JNIHandles::make_local(Thread*, oopDesc*)> 0x00007ffff67dd2ef <+511>: mov -0x1a0(%rbp),%rsi 0x00007ffff67dd2f6 <+518>: mov %rax,-0xd8(%rbp) 0x00007ffff67dd2fd <+525>: lea -0x170(%rbp),%rax 0x00007ffff67dd304 <+532>: mov %rax,%rdi 0x00007ffff67dd307 <+535>: mov %rax,-0x1b0(%rbp) // JvmtiMethodEventMark(JavaThread *thread, methodHandle method) : // JvmtiThreadEventMark(thread), 234 _mid(to_jmethodID(method)) {}; <-- this line 0x00007ffff67dd30e <+542>: callq 0x7ffff66140a0 <methodHandle::methodHandle(methodHandle const&)> 0x00007ffff67dd313 <+547>: mov -0x170(%rbp),%r14 0x00007ffff67dd31a <+554>: movq $0x0,-0x98(%rbp) 0x00007ffff67dd325 <+565>: test %r14,%r14 0x00007ffff67dd328 <+568>: mov %r14,-0xa0(%rbp) 0x00007ffff67dd32f <+575>: je 0x7ffff67dd365 <JvmtiExport::post_compiled_method_load(nmethod*)+629> 0x00007ffff67dd331 <+577>: data16 lea 0x946ca7(%rip),%rdi # 0x7ffff7123fe0 0x00007ffff67dd339 <+585>: data16 data16 callq 0x7ffff60b11d0 <__tls_get_addr@plt> 0x00007ffff67dd341 <+593>: lea -0x188(%rbp),%rsi 0x00007ffff67dd348 <+600>: mov %r14,-0x188(%rbp) 0x00007ffff67dd34f <+607>: mov (%rax),%rax 0x00007ffff67dd352 <+610>: mov %rax,-0x98(%rbp) 0x00007ffff67dd359 <+617>: mov 0x160(%rax),%rdi 0x00007ffff67dd360 <+624>: callq 0x7ffff671d5f0 <GrowableArray<Metadata*>::append(Metadata* const&)> >>> CRASH %r14 == 0x0 198 jmethodID to_jmethodID(methodHandle method) { return method->jmethod_id(); } 0x00007ffff67dd365 <+629>: mov 0x8(%r14),%rax Hope this helps - Ioi
09-05-2018