JDK-8283849 : AsyncGetCallTrace may crash JVM on guarantee
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc
  • Affected Version: 8,11,17,18,19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-03-29
  • Updated: 2023-07-25
  • Resolved: 2022-05-18
Fix Versions
  • JDK 11: 11.0.17 (Fixed)
  • JDK 17: 17.0.5 (Fixed)
  • JDK 19: 19 b23 (Fixed)
  • Other: openjdk8u352 (Fixed)
Description
In our systems we are getting a non-trivial number of JVM crashes caused by AsyncGetCallTrace.

Here is the excerpt from the crash log:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (codeCache.cpp:639), pid=7, tid=194
#  guarantee(result == NULL || !result->is_zombie() || result->is_locked_by_vm() || VMError::is_error_reported()) failed: unsafe access to zombie method
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.2+8 (17.0.2+8) (build 17.0.2+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.2+8 (17.0.2+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, parallel gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x5b0a58]  CodeCache::find_blob(void*)+0xb8
#
# Core dump will be written. Default location: /usr/local/app/core
#
# JFR recording file will be written. Location: /usr/local/app/hs_err_pid7.jfr
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
```

The full log is attached.

The root cause is AsyncGetCallTrace calling `CodeCache::find_blob()`, which contains a guarantee that fails if the lookup happens to hit a zombie method. This is particularly unpleasant because a failing guarantee takes down the whole JVM.

This crash happens when the method of the last frame has been marked as zombie but its associated resources have not yet been cleaned up by the sweeper.
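
To make the failing condition concrete, the guarantee from the crash log can be modelled as a plain predicate. The sketch below is a simplified, hypothetical C++ model: `BlobFlags`, `find_blob_guarantee_holds` and `check_states` are illustrative names, not HotSpot code, and `VMError::is_error_reported()` is reduced to a boolean parameter. It shows that a live method and a zombie still locked by the VM both pass, while a zombie that is no longer locked but has not yet been swept fails, which is the situation described above.

```cpp
#include <cassert>

// Hypothetical, simplified model of a compiled method's state. This is NOT
// HotSpot's CodeBlob/nmethod class, only the flags the guarantee reads.
struct BlobFlags {
  bool is_zombie;        // the method has been made a zombie
  bool is_locked_by_vm;  // the VM still holds it, e.g. while zombifying it
};

// Mirrors the predicate from the crash log:
//   result == NULL || !result->is_zombie() || result->is_locked_by_vm()
//     || VMError::is_error_reported()
// Returns false exactly when the guarantee would fail and abort the JVM.
bool find_blob_guarantee_holds(const BlobFlags* result, bool error_reported) {
  return result == nullptr || !result->is_zombie || result->is_locked_by_vm ||
         error_reported;
}

// The three states discussed in this report.
void check_states() {
  const BlobFlags live{false, false};
  const BlobFlags zombifying{true, true};  // zombie, still locked by the VM
  const BlobFlags unswept{true, false};    // zombie, unlocked, not yet swept
  assert(find_blob_guarantee_holds(&live, false));
  assert(find_blob_guarantee_holds(&zombifying, false));
  assert(!find_blob_guarantee_holds(&unswept, false));  // the crashing case
}
```
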
Comments
Much of the code added here was rendered moot by JDK-8290025. It was then removed by JDK-8297864 "Dead code elimination".
05-12-2022

[8u] Fix request: Please consider approving the backport to JDK 8u-dev. The fix improves the stability of AsyncGetCallTrace, which is used by tools such as async-profiler. The backport is almost clean, with only minor adjustments (https://git.openjdk.org/jdk8u-dev/pull/73, reviewed). The change is isolated to the AsyncGetCallTrace functionality and is therefore rather low risk.
19-06-2022

[11u] Fix request: Please consider approving the backport to JDK 11u-dev. The fix improves the stability of AsyncGetCallTrace, which is used by tools such as async-profiler. The backport is almost clean, with only minor adjustments (https://git.openjdk.org/jdk11u-dev/pull/1148, reviewed). The change is isolated to the AsyncGetCallTrace functionality and is therefore rather low risk.
19-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk8u-dev/pull/73 Date: 2022-06-15 14:40:52 +0000
15-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1148 Date: 2022-06-14 17:10:24 +0000
14-06-2022

[17u] Fix request: Please consider approving the backport to JDK 17u-dev. The fix improves the stability of AsyncGetCallTrace, which is used by tools such as async-profiler. The backport is almost clean, with only minor adjustments (https://git.openjdk.java.net/jdk17u-dev/pull/402). The change is isolated to the AsyncGetCallTrace functionality and is therefore rather low risk.
18-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/402 Date: 2022-05-18 07:27:58 +0000
18-05-2022

Changeset: 93c88690 Author: Jaroslav Bachorik <jbachorik@openjdk.org> Date: 2022-05-18 06:45:15 +0000 URL: https://git.openjdk.java.net/jdk/commit/93c88690a1c2cbc7ba7fc70ddef9bf5928e4de03
18-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8549 Date: 2022-05-05 11:28:14 +0000
05-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8061 Date: 2022-03-31 15:45:05 +0000
14-04-2022

The problem is turning out to be slightly more complex: the guarantee at https://github.com/openjdk/jdk/blob/f4edb59a6e44d99ba215ee6970ffa6fb26b4798c/src/hotspot/share/code/codeCache.cpp#L655 passes if the code blob is a zombie but is still locked by the VM (e.g. it is currently being 'zombified'). Introducing an artificial delay between the zombie check and the VM-lock check increases the likelihood of a JVM crash significantly. What this means is that the guarantee in fact produces a number of false positives: it passes because the blob is still locked by the VM, but the blob then escapes to the caller and may be used later, when it is no longer locked by the VM and part or all of its structures may have been reclaimed by the concurrently running sweeper. That renders the blob invalid, with the possibility of a SIGSEGV or erratic behaviour.
11-04-2022
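
The window described in the comment above is a time-of-check to time-of-use (TOCTOU) race. The following hypothetical C++ model replays one interleaving deterministically instead of using real threads: the check passes while the blob is still locked by the VM, a sweeper step then unlocks and reclaims it, and the caller's later use is unsafe. `SweepableBlob`, `sweeper_step` and the other names are assumptions for illustration only; they do not correspond to HotSpot's sweeper or `CodeCache` internals.

```cpp
#include <cassert>

// Hypothetical model; names are illustrative and are not HotSpot's.
struct SweepableBlob {
  bool is_zombie;
  bool is_locked_by_vm;
  bool reclaimed;  // the sweeper has freed (part of) the blob's memory
};

// Time of check: the guarantee's predicate (error reporting ignored).
bool check_passes(const SweepableBlob& b) {
  return !b.is_zombie || b.is_locked_by_vm;
}

// One concurrent sweeper step: the VM drops its lock, memory is reclaimed.
void sweeper_step(SweepableBlob& b) {
  assert(b.is_zombie);  // in this model only zombies are reclaimed
  b.is_locked_by_vm = false;
  b.reclaimed = true;
}

// Time of use: dereferencing is only safe if nothing has been reclaimed.
bool use_is_safe(const SweepableBlob& b) { return !b.reclaimed; }

// Replays check -> sweeper -> use for a blob that is being zombified.
// Returns true when the check passed but the later use was unsafe.
bool stale_blob_escapes() {
  SweepableBlob b{true, true, false};  // zombie, still locked by the VM
  const bool checked = check_passes(b);
  sweeper_step(b);  // runs between the check and the use
  return checked && !use_is_safe(b);
}
```

Because the lock is not held across the check and the use, no single check inside the lookup can make the returned pointer safe; the check only describes the blob at one instant.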

An attempt to fix the crash is available at https://github.com/openjdk/jdk/pull/8061. The gist of the fix is to allow relaxed instantiation of a frame for profiling purposes. Currently, frame instantiation fails on the guarantee when it happens to hit a zombie method that is still on the stack. While this would indicate a serious error in the normal execution flow, a profiled thread can be interrupted in any method, so this situation can legitimately occur during profiling, and it should not take the profiled JVM down.
31-03-2022
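
The idea of relaxed frame instantiation can be sketched as two lookup variants. This is a hypothetical model, not the code from the pull request: `strict_lookup` stands in for the current behaviour (a failing guarantee, modelled with `std::abort()`), while `relaxed_lookup` reports a zombie as a skipped sample so the profiler can drop it. `ProfiledBlob`, `LookupResult` and both functions are illustrative names.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical model of the idea only; NOT the code from the pull request.
struct ProfiledBlob {
  bool is_zombie;
  bool is_locked_by_vm;
};

enum class LookupResult { NoBlob, Found, SkippedZombie };

// Strict lookup, as in normal execution: an unlocked zombie is fatal.
const ProfiledBlob* strict_lookup(const ProfiledBlob* b) {
  if (b != nullptr && b->is_zombie && !b->is_locked_by_vm) {
    std::abort();  // stands in for the failing guarantee()
  }
  return b;
}

// Relaxed lookup for profiling: the sampled thread may be stopped in any
// method, so a zombie is reported as a skipped sample instead of aborting.
LookupResult relaxed_lookup(const ProfiledBlob* b, const ProfiledBlob** out) {
  assert(out != nullptr);
  *out = nullptr;
  if (b == nullptr) return LookupResult::NoBlob;
  if (b->is_zombie) return LookupResult::SkippedZombie;  // even if locked
  *out = b;
  return LookupResult::Found;
}
```

In this sketch the relaxed variant skips every zombie, including one still locked by the VM, because, as the 11-04-2022 comment points out, a blob that passes only because it is locked may be reclaimed before it is used. AsyncGetCallTrace already reports unwalkable samples to its caller through a negative `num_frames` value, which would be a natural way to surface such a skip.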

Moving from hotspot/runtime -> hotspot/svc since the Serviceability team maintains AsyncGetCallTrace. [~jbachorik] - Sorry, I don't know if anyone is actively assigned to AsyncGetCallTrace() on the Serviceability team. I've been on the Runtime team for quite a while and haven't kept up with AsyncGetCallTrace() issues.
29-03-2022