JDK-8235273 : nmethodLocker not needed for COMPILED_METHOD_UNLOAD events
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-12-03
  • Updated: 2019-12-12
  • Resolved: 2019-12-04
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 14
14 b26Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
The compiled method unload events save the code_start and jmethodID, and due to previous changes to keep nmethod from being unloaded and zombied, the nmethod compiled method load and unload events will be properly ordered.

So the nmethodLocker is unneeded.

Erik ��sterlund 12:47 PM:  Why the nmethodLocker is problematic:

It is an effort of reducing the different flavours of dead nmethods, in order to make it easier to reason about the code. We have unloaded, zombie, and things that are none of the two that are is_unloading(), and then we have zombies that are is_unloading(), and then on top of that nmethod locker adds yet another dimension to the flavours of dead, where all of said states exist with zero and non-zero reference counters. We have had obscure bugs that occur only with JVMTI deferred events because it twists the life cycle slightly in ways that nobody thought about, because it is actually impossible to think about all these different flavours of dead. For example, we almost always prevent nmethod locked nmethods from transitioning from is_alive() to !is_alive(), except if the reason it died is that it became unloaded.

By keeping things alive through tracing (the iterators), then we have fewer flavours of dead nmethods (they become neither zombie nor unloaded), and its easier to know what happens.
Comments
When Graal is used as as the regular JIT the nmethod should always have a Method* at creation time. If the nmethod has been unloaded then the Method* becomes NULL, so it seems like checking for NULL is just a proxy test for is_unloaded(). I think adding a debug flag to mark nmethod which are in the deferred event queue and checking that flag when performing state transitions would prove whether we're properly visiting all the deferred nmethods. And adding checks to ensures the nmethod is alive at the enqueue point wouldn't hurt.
12-12-2019

I'm linking this bug with JDK-8235696 because the fix for that will fix the issue in these comments. There are 3 problems. 1. the nmethods saved for the deferred event don't have their entry barriers run, if present, so the oops in the nmethods might be not fixed up, and worse, ZGC asserts. 2. an nmethod might be zombied before the nmethods_do() runs on the service thread. I guess mark_as_seen_on_stack doesn't help and we probably shouldn't post these events anyway. 3. ServiceThread::nmethods_do(), oops_do() are supposed to keep the nmethod from being zombied and unloaded respectively but graal generates thousands of compiled method load events *before* the service thread starts. Which was the bug we were looking for in these comments. I'm going to fix these three with JDK-8235696 so I can stress test them together. Edit. Never mind. Per fixed JDK-8235696 some other way. I'll open a new bug.
12-12-2019

Theory why this crashes so hard with Graal: In nmethod::post_compiled_method_load_event the nm->method() is used without a NULL check. But Graal nmethods are special an don't always have a method(). If I just add a NULL check and don't even try to post the event on nmethods without a method(), then it stops crashing in the mentioned test.
11-12-2019

Just talked to Coleen. She convinced me that race doesn't exist because the sweeper calls nmethods_do in a handshake with the Service thread, and therefore the race doesn't exist. So I take that comment back.
10-12-2019

I noticed that in ServiceThread::oops_do and ServiceThread::nmethods_do, we read the _jvmti_service_queue under the Service_lock, but notably read the _jvmti_event outside of that lock. Seems like the effect of that is that sometimes these calls will walk random garbage stack memory. In particular, ServiceThread::nmethods_do is executed from a concurrently executing sweeper thread. It might get some indigestion when it walks garbage memory instead. Sorry I didn't catch that in the review thread. :(
10-12-2019

I was running it manually using something like this: BUILDDIR=/Users/tkrodrig/ws/jdk-jdk/build/macosx-x86_64-server-fastdebug jtreg -jdk:$BUILDDIR/images/jdk -v:error "-vmoptions:-XX:+CreateCoredumpOnCrash -ea -esa -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal -XX:+TieredCompilation" -nativepath$BUILDDIR/images/test/hotspot/jtreg/native serviceability/jvmti/CompiledMethodLoad/Zombie.java running that repeatedly I get crashes like so JTwork/scratch/hs_err_pid37195.log:# V [libjvm.dylib+0x80bce5] JvmtiExport::post_compiled_method_load(nmethod*)+0x15 JTwork/scratch/hs_err_pid37078.log:# Internal Error (../../src/hotspot/share/runtime/sweeper.cpp:182), pid=37078, tid=25091 JTwork/scratch/hs_err_pid37078.log:# assert(cb->is_nmethod()) failed: CodeBlob should be nmethod I've seen failures of the cb->is_unloading guarantee as well. Maybe nmethods should be marked when they are part of a deferred event with some assertion checking that they aren't transitioned in illegal ways when that's the case.
10-12-2019

This test and code was added for linked bug JDK-8212160. The nmethod should be kept from being unloaded because the per- thread deferred event queue has an oops_do(). It should be kept from being zombied by the per-thread deferred event queue has nmethods_do() - which the sweeper calls mark_as_seen_on_stack. Are the nmethods zombied or unloaded in these failures? I ran all the tiers and didn't see these failures (they are hard to miss!). How would I run these tests on linux?
10-12-2019

URL: https://hg.openjdk.java.net/jdk/jdk/rev/e71931b1c3b7 User: coleenp Date: 2019-12-04 18:48:52 +0000
04-12-2019