JDK-8216541 : CompiledICHolders of VM locked unloaded nmethods are released too late
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,12,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2019-01-11
  • Updated: 2020-05-25
  • Resolved: 2019-02-05
Fixed versions:
  • JDK 11: 11.0.8-oracle (Fixed)
  • JDK 12: 12 b31 (Fixed)
  • JDK 13: 13 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (src/hotspot/share/code/nmethod.cpp:2839), pid=13948, tid=13963
#  assert(db != __null && !db->is_adapter_blob()) failed: must use stub!
# JRE version: Java(TM) SE Runtime Environment (13.0) (fastdebug build 13-internal+0-jdk13-jdk.113)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 13-internal+0-jdk13-jdk.113, compiled mode, sharing, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x141b8b8]  DirectNativeCallWrapper::verify_resolve_call(unsigned char*) const+0x28

Stack: [0x00007fa260ff8000,0x00007fa2610f9000],  sp=0x00007fa2610f7850,  free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x141b8b8]  DirectNativeCallWrapper::verify_resolve_call(unsigned char*) const+0x28
V  [libjvm.so+0xa1d975]  CompiledIC::is_call_to_interpreted() const+0x85
V  [libjvm.so+0xa229c0]  CompiledIC::verify()+0x110
V  [libjvm.so+0xa2ad2b]  CompiledMethod::cleanup_inline_caches_impl(bool, bool)+0x2db
V  [libjvm.so+0xa2bb82]  CompiledMethod::cleanup_inline_caches(bool)+0x52
V  [libjvm.so+0x171d06a]  NMethodSweeper::process_compiled_method(CompiledMethod*)+0x30a
V  [libjvm.so+0x171e77a]  NMethodSweeper::sweep_code_cache()+0x30a
V  [libjvm.so+0x171f52e]  NMethodSweeper::possibly_sweep()+0xde
V  [libjvm.so+0x171fba6]  NMethodSweeper::sweeper_loop()+0x1f6
V  [libjvm.so+0x17ae92f]  JavaThread::thread_main_inner()+0x2cf
V  [libjvm.so+0x17b87cc]  JavaThread::run()+0x1cc
V  [libjvm.so+0x17b5f40]  Thread::call_run()+0x100
V  [libjvm.so+0x14a03ad]  thread_native_entry(Thread*)+0x10d
Sorry, I think I had mistaken this for another change; the patch needs adjustments when bringing it to JDK 11. RFR thread: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-May/003133.html

I would like to have the patch in OpenJDK 11 as well, because the issue is present there too. The patch applies cleanly.

applications/kitchensink/Kitchensink.java passes in mach5

Fix request approved. [~eosterlund] We still have 7 days until RC. I thought to suggest you push it into jdk 13 first and see how testing goes. But if you run hs-tier1-7 without issues we can skip it and push into JDK 12 first, I think. I will let you decide.

Fix Request

I think it is important that this bug gets fixed in 12. Motivation:

1) Regardless of GC, without this fix, you risk crashing in a race between class loading (and generating c2i adapters), the sweeper (trying to identify CompiledICHolders) and users of the nmethodLocker, such as inline caches (messing up the detection of some CompiledICHolders). While perhaps hard to reproduce on its own (given that this has been broken since forever), it is nasty when it happens.
2) The problem intensifies if users are using JVMTI events for code cache loading and unloading. They make the time window for the crash larger, which makes the problem easier to provoke.
3) It was recently discovered that a similar issue that needs the same fix may be provoked by concurrent class unloading in ZGC, which is being introduced in 12.

The intended fix for these issues is to discard all inline cache metadata of nmethods that die, immediately upon dying (i.e. both on transitions from is_alive() to zombie and is_alive() to unloaded). This includes:
a) releasing IC stubs,
b) identifying and releasing CompiledICHolders, and
c) setting the inline cache to the clean state.

Today, #a is only done on transitions to zombie, meaning it will never happen for unloaded OSR nmethods (problematic for ZGC), and #b is done when deleting nmethods as opposed to when they die (problematic for locked nmethods).

The perceived risk of this fix is small. To always clean CompiledICs immediately when the nmethod dies is much more robust, compared to doing only IC stub reclamation on transitions to zombie (and skipping it for transitions to unloaded), and deferring CompiledICHolder reclamation to later when it is no longer safe to identify them. The risk of not having the fix seems much greater.

As far as testing goes, 400 iterations of kitchensink have been run with the fix (which is where it has popped up the most after it started listening to more JVMTI events). It has also been tested with hs-tier1-3 and hs-precheckin-comp. Currently running another hs-tier1-7 run to be on the safe side.

The fix has already been reviewed by [~kvn] and [~thartmann].
Here is the webrev: http://cr.openjdk.java.net/~eosterlund/8216541/webrev.00/
And here is the review thread: https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036433.html
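The intended behavior described in the fix request can be illustrated with a small toy model. This is only a sketch and not the code in the webrev: ToyInlineCache, ToyNMethod, make_not_alive and holds_ic_metadata are hypothetical names invented for this example, and the boolean fields merely stand in for IC stubs and CompiledICHolders. The point it shows is that steps a), b) and c) run on every transition out of the alive state, whether the target is zombie or unloaded, and whether or not the nmethod is locked.

```cpp
#include <cassert>
#include <vector>

// Toy model only: these types are hypothetical and are not HotSpot classes.
enum class State { Alive, Unloaded, Zombie };

struct ToyInlineCache {
    bool has_ic_stub = false;    // stands in for a pending IC stub
    bool has_ic_holder = false;  // stands in for a CompiledICHolder
    bool clean = false;          // true once the call site is in the clean state

    // Steps a), b) and c) from the fix request, applied to one call site.
    void discard_metadata() {
        has_ic_stub = false;     // a) release the IC stub
        has_ic_holder = false;   // b) release the CompiledICHolder
        clean = true;            // c) set the inline cache to clean
    }
};

struct ToyNMethod {
    State state = State::Alive;
    int lock_count = 0;          // stands in for the nmethod lock counter
    std::vector<ToyInlineCache> caches;

    // Any transition out of Alive discards inline cache metadata at once.
    // The lock counter is deliberately not consulted: a locked nmethod may
    // have to stay in memory, but it should not keep its IC metadata.
    void make_not_alive(State target) {
        assert(target != State::Alive);
        if (state == State::Alive) {
            for (auto& ic : caches) {
                ic.discard_metadata();
            }
        }
        state = target;
    }

    // True if any call site still owns a stub or a holder.
    bool holds_ic_metadata() const {
        for (const auto& ic : caches) {
            if (ic.has_ic_stub || ic.has_ic_holder) return true;
        }
        return false;
    }
};
```

In this model, a locked nmethod that becomes unloaded ends up with no IC metadata even though it cannot be deleted yet, which is the property the fix request asks for.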

Reproduced with jdk-11+28 as well (this is probably an old bug).

Yet another interesting observation is that in JDK-8075805 it was found that it is crucial the sweeper cleans IC caches of unloaded nmethods as well. However, we don't clean IC caches of locked unloaded nmethods. And the symptoms look a bit similar.

Another interesting observation is that when we "load" an nmethod, we call nmethod::post_compiled_method_load_event(). That function first calculates a jmethodID, which takes the JmethodIdCreation_lock (a lock that always performs a safepoint check), and then creates and queues up a JvmtiDeferredEvent::compiled_method_load_event, incrementing the nmethod lock counter. However, after the safepoint check when calculating the jmethodID, the nmethod* could already have been freed and point at arbitrary memory in the code cache, as there is nothing preventing that nmethod from getting nuked (unloaded, deoptimized, made into a zombie and freed) when we cross that safepoint check. This can also cause races where the nmethod lock counter of a seemingly unrelated, newly allocated nmethod gets incremented and/or decremented spuriously.
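The ordering problem in this observation can be sketched with a toy model. Everything here is an assumption made for illustration: ToyNMethod, ToyLocker, try_free and the two post_event_* functions are hypothetical, the callback stands in for whatever the sweeper might do while the posting thread is blocked at the safepoint check, and the freed flag is a convenience that a real stale pointer would not offer. It is not a description of how HotSpot was changed.

```cpp
#include <cassert>

// Toy model only: hypothetical types, not HotSpot classes.
struct ToyNMethod {
    int lock_count = 0;   // stands in for the nmethod lock counter
    bool freed = false;   // toy marker; real freed memory gives no such hint
};

// RAII guard that holds the lock counter above zero for its lifetime.
class ToyLocker {
public:
    explicit ToyLocker(ToyNMethod* nm) : nm_(nm) { ++nm_->lock_count; }
    ~ToyLocker() { --nm_->lock_count; }
    ToyLocker(const ToyLocker&) = delete;
    ToyLocker& operator=(const ToyLocker&) = delete;
private:
    ToyNMethod* nm_;
};

// The toy sweeper only frees nmethods that nobody has locked.
inline bool try_free(ToyNMethod& nm) {
    if (nm.lock_count != 0) return false;
    nm.freed = true;
    return true;
}

// Lock first, then cross the point where other threads may run.
// Returns true if the nmethod is still valid afterwards.
template <class AtSafepoint>
bool post_event_locking_first(ToyNMethod& nm, AtSafepoint at_safepoint) {
    ToyLocker locker(&nm);
    at_safepoint();       // stands in for the safepoint-checking lock
    return !nm.freed;
}

// Cross the safepoint first and lock afterwards: the ordering described
// in the comment above, where the nmethod may already be gone.
template <class AtSafepoint>
bool post_event_locking_late(ToyNMethod& nm, AtSafepoint at_safepoint) {
    at_safepoint();
    if (nm.freed) return false;   // too late, the pointer is stale
    ToyLocker locker(&nm);
    return true;
}
```

With a callback that calls try_free, the lock-first variant keeps the toy nmethod alive across the simulated safepoint, while the lock-late variant finds it already freed.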

As the problems started appearing after kitchensink enabled a bunch of JVMTI features, I started investigating the different ways in which JVMTI events can ruin our day. In particular, the JvmtiDeferredEventQueue support poses a bit of a special case for the nmethod life cycle. These nmethod lifecycle events lock the nmethod (increment a counter in the nmethod) and send an event off to the service thread; the nmethod is unlocked later on (decrementing said counter), when that event has been processed. Naturally, for this to work, it is absolutely critical that we don't throw away nmethods that have a non-zero counter value. Back in JDK-7024970, transitions to zombie were guarded with the appropriate counter value checks, so we do not throw away zombies that are locked.

However, I noticed there is a bit of a corner case for unloaded OSR nmethods. The GC does not check these counter values when making methods unloaded due to having broken oops. And when the sweeper starts running, it "flushes" (deletes) unloaded OSR nmethods without checking the counter value. In other words, if a JVMTI event for the nmethod lifecycle triggers, by the time the service thread decrements the counter keeping it alive, the nmethod could already have been freed, and the memory occupied by another newly compiled nmethod. As the freed nmethod is placed on a freelist, a new nmethod may be allocated at the exact same address, and have its counter value spuriously decremented at any point in time.

Meanwhile, the CompiledICs utilize the nmethodLocker to keep their state sane in the event of deoptimization. As this nmethodLocker might not work when JVMTI code cache events randomly decrement these counters from the service thread, the CompiledICs may get completely messed up due to the nmethodLocker not keeping things around.
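The address-reuse scenario described above can be reproduced in a toy model. This is a sketch under stated assumptions, not HotSpot code: ToySlot and ToyCodeCache are hypothetical, a slot index stands in for an nmethod address, and flush_checked is an assumed guard shown only for contrast, not the fix that was pushed for this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, not HotSpot classes.
struct ToySlot {
    bool in_use = false;
    int lock_count = 0;   // stands in for the nmethod lock counter
    int generation = 0;   // bumped on each allocation to tell occupants apart
};

class ToyCodeCache {
public:
    explicit ToyCodeCache(std::size_t n) : slots_(n) {}

    // Hand out the first free slot, mimicking reuse of a freed address.
    std::size_t allocate() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].in_use) {
                slots_[i].in_use = true;
                slots_[i].lock_count = 0;
                ++slots_[i].generation;
                return i;
            }
        }
        assert(false && "toy cache is full");
        return 0;
    }

    // Flush that ignores the lock counter (the corner case in the comment).
    void flush_unchecked(std::size_t i) { slots_[i].in_use = false; }

    // Flush that refuses to free a locked slot (assumed guard, for contrast).
    bool flush_checked(std::size_t i) {
        if (slots_[i].lock_count != 0) return false;
        slots_[i].in_use = false;
        return true;
    }

    void lock(std::size_t i)   { ++slots_[i].lock_count; }
    void unlock(std::size_t i) { --slots_[i].lock_count; }
    const ToySlot& slot(std::size_t i) const { return slots_[i]; }

private:
    std::vector<ToySlot> slots_;
};
```

A possible walk-through: allocate a slot and lock it on behalf of a deferred event, call flush_unchecked, allocate again (the same index comes back with a new generation), lock it on behalf of an inline cache, then deliver the stale unlock from the first event. The counter of the new occupant drops back to zero although its own locker is still active, which mirrors the spurious decrement described above.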

I was able to reproduce the issue with jdk-12+b18 and a modified Kitchensink. This means that the issue is not related to the changes that Erik did in December but existed before.

Erik, I think it's likely that this was introduced by one of your recent changes (for example, JDK-8215491). Could you please have a look? Thanks.

ILW = Verification assert during inline cache cleaning, intermittent with stress test at tier 4, no workaround = MMH = P3