JDK-8294538 : missing is_unloading() check in SharedRuntime::fixup_callers_callsite()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,18,19,20
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-09-28
  • Updated: 2023-01-04
  • Resolved: 2022-10-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 19 JDK 20
17.0.7-oracleFixed 19.0.2Fixed 20 b21Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
From JDK-8293648, Erik Österlund wrote:

I also noticed that SharedRuntime::fixup_callers_callsite() patches callsites if the c2i adapter's Method's code is is_in_use(). It seems to miss an "&& !is_unloading()" in there. In practice that has not made a big difference until I removed the sweeper. Because nmethods would typically be is_unloading, because the Method is also unloading, in which case the c2i adapter entry barrier will take care of it. But now that an nmethod can become is_unloading also because it's "cold" or such, its Method is way more likely to be is_alive. And then we can get past the c2i adapter entry barrier, and end up calling fixup_callers_callsite, and miss the is_unloading check on the code, and break unlinking monotonicity of the inline cache cleaning. That could have pretty disasterous consequences when using ZGC. 
Comments
Fix request [17u] I backport this for parity with 17.0.7-oracle. Acceptable risk, see also 19u fix request above. Clean backport. SAP nightly testing passed.
04-01-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1021 Date: 2023-01-03 09:26:14 +0000
03-01-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk19u/pull/81 Date: 2022-11-22 15:51:17 +0000
22-11-2022

Fix Request (JDK 19u) Fixes potential crashes, especially with ZGC. The fix is low risk and applies cleanly. Already tested and backported to Oracle JDK 17u.
22-11-2022

Changeset: b5efa2af Author: Dean Long <dlong@openjdk.org> Date: 2022-10-22 02:11:55 +0000 URL: https://git.openjdk.org/jdk/commit/b5efa2afe268e3171f54d8488ef69bf67059bd7f
22-10-2022

Based on our understanding of the bug, I'm going to assume this affects releases with the sweeper, such as 17.
19-10-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10747 Date: 2022-10-18 17:37:30 +0000
18-10-2022

By carefully instrumenting the JVM, I was able to guide it into a crash because of this bug. The result was calling into memory that was filled with 0xcccccccccccccccc.
13-10-2022

Agreed. Would also like to see is_not_used disappear as it's only used by AoT which is no more. And is_not_installed seems to also be a relatively low hanging fruit to remove.
30-09-2022

It would be nice if nmethods could have fewer states, so that we could check fewer conditions to see if everything is fine, or something has changed and we need to do something. Having fewer states would also make reasoning about nmethod states and transitions easier. Here's a list of staleness indicators off the top of my head: 1. nmethod->is_unloading [dead oops or cold] 2. !nmethod->is_in_use or not_entrant 3. nmethod has references to is_old methods 4. nmethod has references to unloading metadata 5. nm->method()->code() != nm [replaced] 6. nmethod is "cold" 7. nmethod references dead oops 8. nmethod references other nmethods that are cold/unloading/not_entrant/is_old/replaced Making sure that is_unloading implies !is_in_use seems like a good start.
30-09-2022

ILW = potential for weird crashes; suspected as possible cause of some rare crashes; no workaround = HLH = P2
30-09-2022

We would still need to check for is_unloading when patching callsites. But you are right that there might be places where we assume an nmethod is !is_in_use after calling make_not_entrant. I’ll look around and see if I can spot anything. Such nmethods are obviously already not entrant, but it isn’t reflected in the is_in_use question. When I prototyped removing inline caches, I had is_in_use check if the state is in_use && !is_unloading. Because that tends to be what you mean. Maybe we should do that.
29-09-2022

[~eosterlund], I also noticed that in make_not_entrant(), we bail out if is_unloading(). I'm wondering if we have code that assumes !is_in_use() after calling make_not_entrant(). If we always had make_not_entrant() change the nmethod state, then would we still need to check is_unloading() in fixup_callers_callsite()?
28-09-2022