Bug ID: JDK-8284289 JEP 435: Asynchronous Stack Trace VM API

Type: JEP
Component: hotspot
Sub-Component: svc

Priority: P4
Status: Closed
Resolution: Withdrawn

Submitted: 2022-04-04
Updated: 2024-09-03
Resolved: 2024-09-03

Related Reports

Relates :

JDK-8170152 - WhiteBox testing of pd_get_top_frame_for_profiling

Description

Summary
-------

Define an efficient and reliable API to collect stack traces asynchronously and include information on both Java and native stack frames.


Goals
-----

- Provide a well-tested API for profilers to obtain information on Java and native frames.

- Support both asynchronous usage (e.g., calling from signal handlers) and synchronous usage.

- Do not affect performance when the API is not in use.

- Do not significantly increase memory requirements compared to the existing `AsyncGetCallTrace` API.

Motivation
----------


The `AsyncGetCallTrace` API is used by almost all available profilers, both open-source and commercial, including, e.g., [async-profiler](https://github.com/jvm-profiling-tools/async-profiler). Yet it has three major disadvantages:

  - It is an internal API, not exported in any header.
  - It only returns information about Java frames, namely their methods and bytecode indices.
  - It cannot be used to collect stack traces from a separate thread, outside a signal handler, to implement JFR-like sampling.

These issues make implementing profilers and related tooling more difficult. Some additional information can be extracted from the HotSpot VM via complex code, but other useful information is hidden and impossible to obtain:

- Whether a compiled Java frame is inlined (currently only obtainable for the topmost compiled frames),
- The compilation level of a Java frame (i.e., compiled by C1 or C2), and
- Information on C/C++ frames that are not at the top of the stack.

Such data can be helpful when profiling and tuning a VM for a given application, and for profiling code that uses JNI heavily.


Description
-----------

We propose a new `AsyncGetStackTrace` API, modeled on the `AsyncGetCallTrace` API:

```
void AsyncGetStackTrace(ASGST_CallTrace *trace, jint depth, void* ucontext, uint32_t options);
```

This API can be called by profilers to obtain the stack trace of a thread. It does not guarantee that all frames are obtained; it works on a best-effort basis. Its implementation will be at least as stable as `AsyncGetCallTrace` or the JFR stack walking code, due to fuzzing and stability tests in the JDK and extensive safety checks in the implementation itself. The VM fills in information about the frames, the number of frames, and the trace kind. The API can be used safely from a separate thread, which is the recommended usage, but it can also be used in a signal handler. To walk the calling thread's own stack, you have to pass the `ASGST_WALK_SAME_THREAD` option explicitly; this assumes that the passed `ucontext` always comes from the calling thread. The caller should allocate the `frames` array of the `ASGST_CallTrace` with sufficient memory for the requested stack depth. Walked threads are required to be halted during stack walking.

Parameters:

- `trace` — buffer for structured data to be filled in by the VM
- `depth` — maximum depth of the call stack trace
- `ucontext` — `ucontext_t` of the thread where the stack walking should start
- `options` — bit set for options
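A minimal sketch of the caller-side setup described above may help. It assumes that `ASGST_Method` is pointer-sized, replaces `jint` and `JNIEnv *` with stand-ins so it compiles without `jni.h`, and uses two hypothetical helpers, `prepare_trace` and `release_trace`, that are not part of the proposed API. The frames buffer is allocated once, outside any signal handler, because `calloc` is not async-signal-safe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so this sketch compiles without jni.h or profile.h.
 * Assumption: ASGST_Method is a pointer-sized opaque id. */
typedef int32_t jint;
typedef void *ASGST_Method;

typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  ASGST_Method method;
} ASGST_JavaFrame;

typedef struct {
  uint8_t type;
  void *pc;
} ASGST_NonJavaFrame;

typedef union {
  uint8_t type;
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;

typedef struct {
  void *env_id;            /* JNIEnv * in the proposal */
  jint num_frames;
  uint8_t kind;
  jint state;
  ASGST_CallFrame *frames;
  void *frame_info;
} ASGST_CallTrace;

/* Hypothetical helper: reserve room for `depth` frames outside any signal
 * handler and start with "accept everything" filters. Returns 1 on success. */
static int prepare_trace(ASGST_CallTrace *trace, jint depth) {
  assert(trace != NULL && depth > 0);
  trace->env_id = NULL;
  trace->num_frames = 0;
  trace->kind = 0;   /* 0: no restriction on the trace kind */
  trace->state = 0;  /* 0: no restriction on the thread state */
  trace->frame_info = NULL;
  trace->frames = calloc((size_t)depth, sizeof(ASGST_CallFrame));
  return trace->frames != NULL;
}

/* Hypothetical helper: free the buffer once sampling has stopped. */
static void release_trace(ASGST_CallTrace *trace) {
  free(trace->frames);
  trace->frames = NULL;
}
```

A sampler would then pass `&trace` and the same `depth` to `AsyncGetStackTrace` for every sample. Because the VM writes the actual kind and state back into `kind` and `state`, which otherwise act as filters, resetting both before each call seems prudent.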

Currently, only the lowest two bits of `options` are considered; all other bits are treated as `0`:

```
enum ASGST_Options {
  ASGST_INCLUDE_NON_JAVA_FRAMES = 1,
  ASGST_WALK_SAME_THREAD        = 2
};
```

`ASGST_INCLUDE_NON_JAVA_FRAMES` enables the inclusion of non-Java frames, which are otherwise skipped.
`ASGST_WALK_SAME_THREAD` lets the profiler walk the stack of the calling thread itself (i.e., directly in a signal handler);
this disables protections that are only enabled in separate-thread mode.
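The following sketch shows how a profiler might assemble the `options` bit set, and how the rule that only the lowest two bits are considered could be expressed. `sampler_options`, `effective_options`, and `ASGST_KNOWN_OPTIONS` are hypothetical names used only for illustration.

```c
#include <stdint.h>

enum ASGST_Options {
  ASGST_INCLUDE_NON_JAVA_FRAMES = 1,
  ASGST_WALK_SAME_THREAD        = 2
};

/* Bits the proposal currently interprets; all others are treated as 0. */
#define ASGST_KNOWN_OPTIONS \
  ((uint32_t)(ASGST_INCLUDE_NON_JAVA_FRAMES | ASGST_WALK_SAME_THREAD))

/* Hypothetical helper: in_signal_handler != 0 means the caller walks
 * its own stack, which requires ASGST_WALK_SAME_THREAD. */
static uint32_t sampler_options(int include_native, int in_signal_handler) {
  uint32_t options = 0;
  if (include_native) {
    options |= ASGST_INCLUDE_NON_JAVA_FRAMES;
  }
  if (in_signal_handler) {
    options |= ASGST_WALK_SAME_THREAD;
  }
  return options;
}

/* What the VM would effectively see under the "lowest two bits" rule. */
static uint32_t effective_options(uint32_t options) {
  return options & ASGST_KNOWN_OPTIONS;
}
```

The recommended separate-thread sampler would typically pass `sampler_options(include_native, 0)`, leaving the same-thread protections active.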

There are different kinds of traces depending on the purpose of the currently running code in the walked thread:

```
enum ASGST_TRACE_KIND {
  ASGST_JAVA_TRACE = 1
};
```

- `ASGST_JAVA_TRACE`: a trace of a fully functioning Java thread (one that runs Java code)

All other kinds (up to 8 in total; values have to be powers of two) are implementation-specific and should
not represent traces that contain Java frames.
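Since kinds are single bits of a `uint8_t`, a profiler can validate a kind value and apply a kind filter with plain bit operations. This is a sketch under the assumption that a zero mask means "accept every kind", as described for the `kind` field of `ASGST_CallTrace`; `is_valid_kind` and `kind_accepted` are hypothetical helpers.

```c
#include <stdint.h>

enum ASGST_TRACE_KIND {
  ASGST_JAVA_TRACE = 1
};

/* A kind is exactly one bit of a uint8_t, so at most 8 kinds exist. */
static int is_valid_kind(uint8_t kind) {
  return kind != 0 && (kind & (kind - 1)) == 0;
}

/* A zero mask accepts every kind; otherwise the kind's bit must be set. */
static int kind_accepted(uint8_t mask, uint8_t kind) {
  return mask == 0 || (mask & kind) != 0;
}
```

For example, a profiler interested only in Java traces would initialize `kind` to `ASGST_JAVA_TRACE` and let implementation-specific kinds be rejected.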

The `trace` struct

```
typedef struct {
  JNIEnv *env_id;                 // Env where the trace was recorded
  jint num_frames;                // number of frames in this trace
                                  // (< 0 indicates that the stack is not walkable; the value is an error code)
  uint8_t kind;                   // kind of the trace; if initialized to non-zero, it is a bit mask of accepted kinds
  jint state;                     // thread state (jvmti->GetThreadState); if initialized to non-zero,
                                  // it is a bit mask of accepted states; non-Java kind traces are always accepted
                                  // and get state -1
  ASGST_CallFrame *frames;        // frames that make up this trace, callee followed by callers
  void* frame_info;               // more information on frames
} ASGST_CallTrace;
```

is filled in by the VM. Its `num_frames` field contains the actual number of frames in the `frames` array or an error code. The `frame_info` field in that structure can later be used to store more information, but is currently `nullptr`.

The `kind` and `state` fields serve a dual purpose. If initialized to non-zero values, they are bit masks of the allowed kinds and states (states as in JVMTI `GetThreadState`), which lets profilers constrain the kinds of obtained traces and the states of walked threads. If walking is aborted because of a mismatching kind or state, the error code `ASGST_WRONG_KIND` or `ASGST_WRONG_STATE`, respectively, is set in `num_frames`. The `kind` field only contains valid information if no error other than `ASGST_WRONG_KIND` occurred. The `state` field only contains valid information if no error other than `ASGST_WRONG_STATE` occurred.

The error codes from 0 to -4 are defined as follows:

```
enum ASGST_Error {
  ASGST_NO_JAVA_FRAME =  0,
  ASGST_THREAD_EXIT   = -1, // dying thread
  ASGST_NO_THREAD     = -2, // related to walking the stack from a separate thread
  ASGST_WRONG_STATE   = -3, // trace not obtained because of a wrong state (not included in the passed allowed states)
  ASGST_WRONG_KIND    = -4  // same, but for the kind
};
```
All other error codes (< -4) are implementation-specific and should be documented by each vendor.
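After a call, `num_frames` either counts frames or carries one of these error codes. A sketch of turning it into a log-friendly name follows; `asgst_error_name` is a hypothetical helper. It only returns pointers to string literals and allocates nothing, so it should also be usable inside a signal handler. Negative codes it does not know are reported generically, since vendors may define their own.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t jint; /* stand-in for the jni.h type */

enum ASGST_Error {
  ASGST_NO_JAVA_FRAME =  0,
  ASGST_THREAD_EXIT   = -1,
  ASGST_NO_THREAD     = -2,
  ASGST_WRONG_STATE   = -3,
  ASGST_WRONG_KIND    = -4
};

/* Returns NULL if num_frames is a positive frame count, otherwise a
 * printable name for the error code it carries. */
static const char *asgst_error_name(jint num_frames) {
  if (num_frames > 0) {
    return NULL;
  }
  switch (num_frames) {
    case ASGST_NO_JAVA_FRAME: return "ASGST_NO_JAVA_FRAME";
    case ASGST_THREAD_EXIT:   return "ASGST_THREAD_EXIT";
    case ASGST_NO_THREAD:     return "ASGST_NO_THREAD";
    case ASGST_WRONG_STATE:   return "ASGST_WRONG_STATE";
    case ASGST_WRONG_KIND:    return "ASGST_WRONG_KIND";
    default:                  return "unknown or vendor-specific error";
  }
}
```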

Every `ASGST_CallFrame` is a union, since the information stored for Java and non-Java frames differs:

```
typedef union {
  uint8_t type;     // to distinguish between JavaFrame and NonJavaFrame
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;
```

There are several distinguishable frame types:

```
enum ASGST_FrameTypeId {
  ASGST_FRAME_JAVA         = 1, // JIT compiled and interpreted
  ASGST_FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  ASGST_FRAME_JAVA_NATIVE  = 3, // barrier frames between Java and C/C++
  ASGST_FRAME_NON_JAVA     = 4  // C/C++/... frames
};
```
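Because every union member begins with the same `uint8_t type` tag, a consumer can read `type` first and then pick the matching member. The sketch below counts frames per type; it re-declares the structs from this section with `ASGST_Method` assumed to be pointer-sized, follows the struct comment that stores `ASGST_FRAME_JAVA_NATIVE` frames as `ASGST_JavaFrame`, and uses a hypothetical `FrameCounts` result type.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t jint;        /* stand-in for the jni.h type */
typedef void *ASGST_Method;  /* assumption: pointer-sized opaque id */

enum ASGST_FrameTypeId {
  ASGST_FRAME_JAVA         = 1,
  ASGST_FRAME_JAVA_INLINED = 2,
  ASGST_FRAME_JAVA_NATIVE  = 3,
  ASGST_FRAME_NON_JAVA     = 4
};

typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  ASGST_Method method;
} ASGST_JavaFrame;

typedef struct {
  uint8_t type;
  void *pc;
} ASGST_NonJavaFrame;

typedef union {
  uint8_t type;
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;

/* Hypothetical result type for this sketch. */
typedef struct {
  size_t java;         /* ASGST_FRAME_JAVA and ASGST_FRAME_JAVA_INLINED */
  size_t inlined;      /* subset of `java` */
  size_t java_native;  /* ASGST_FRAME_JAVA_NATIVE (stored as ASGST_JavaFrame) */
  size_t non_java;     /* ASGST_FRAME_NON_JAVA (stored as ASGST_NonJavaFrame) */
} FrameCounts;

/* A negative num_frames is an error code, so the loop body never runs. */
static FrameCounts count_frames(const ASGST_CallFrame *frames, jint num_frames) {
  FrameCounts counts = {0, 0, 0, 0};
  for (jint i = 0; i < num_frames; i++) {
    switch (frames[i].type) {  /* read the shared tag before any member */
      case ASGST_FRAME_JAVA_INLINED:
        counts.inlined++;
        /* fall through: inlined frames are Java frames too */
      case ASGST_FRAME_JAVA:
        counts.java++;
        break;
      case ASGST_FRAME_JAVA_NATIVE:
        counts.java_native++;
        break;
      case ASGST_FRAME_NON_JAVA:
        counts.non_java++;
        break;
      default:
        break;  /* unknown type: ignore */
    }
  }
  return counts;
}
```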

The first three types are for Java frames, for which we store the following information in a struct of type `ASGST_JavaFrame`:

```
typedef struct {
  uint8_t type;            // frame type
  int8_t comp_level;       // compilation level: 0 is interpreted, -1 is undefined, > 0 is JIT compiled
  uint16_t bci;            // 0 <= bci < 65535; 65535 (= (uint16_t) -1) if the bci is >= 65535 or not available (as in native frames)
  ASGST_Method method;
} ASGST_JavaFrame;         // used for ASGST_FRAME_JAVA, ASGST_FRAME_JAVA_INLINED and ASGST_FRAME_JAVA_NATIVE
```

The `comp_level` indicates the compilation level of the method related to the frame; the exact meaning of this number is implementation-specific.
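A sketch of interpreting `bci` and `comp_level` follows. It assumes any positive level denotes JIT-compiled code and deliberately leaves the exact level uninterpreted, since that meaning is implementation-specific; `ASGST_BCI_UNAVAILABLE`, `CompLevelClass`, `frame_has_bci`, and `classify_comp_level` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef void *ASGST_Method; /* assumption: pointer-sized opaque id */

typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  ASGST_Method method;
} ASGST_JavaFrame;

/* Hypothetical name for the "no bci" sentinel, (uint16_t) -1. */
#define ASGST_BCI_UNAVAILABLE UINT16_C(65535)

typedef enum {
  COMP_LEVEL_INTERPRETED,  /* comp_level == 0 */
  COMP_LEVEL_COMPILED,     /* comp_level > 0; exact meaning is VM-specific */
  COMP_LEVEL_UNDEFINED     /* comp_level == -1 (other negatives treated alike) */
} CompLevelClass;

static int frame_has_bci(const ASGST_JavaFrame *frame) {
  return frame->bci != ASGST_BCI_UNAVAILABLE;
}

static CompLevelClass classify_comp_level(int8_t comp_level) {
  if (comp_level == 0) {
    return COMP_LEVEL_INTERPRETED;
  }
  return comp_level > 0 ? COMP_LEVEL_COMPILED : COMP_LEVEL_UNDEFINED;
}
```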

`ASGST_Method` is an implementation-specific method id that is distinct from `jmethodID`.
The following function for working with method ids is signal-safe:

```
typedef struct ASGST_MethodInfo {
  char* class_name;
  jint class_name_len;
  char* generic_class_name;
  jint generic_class_name_len;
  char* method_name;
  jint method_name_len;
  char* signature;
  jint signature_len;
  char* generic_signature;
  jint generic_signature_len;
  jint modifiers;
} ASGST_MethodInfo;
void ASGST_GetMethodInfo(ASGST_Method method, ASGST_MethodInfo* info);
```
`ASGST_GetMethodInfo` obtains the method information for a given `ASGST_Method` and stores it in the pre-allocated `info` struct.
It stores the actual lengths in the `_len` fields and null-terminated strings in the string fields.
It is safe to call from signal handlers. A string field is set to `\0` if the information is not available.
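The proposal does not spell out who allocates the string buffers, so the sketch below only reads an already filled-in `ASGST_MethodInfo` and renders it for display, treating `NULL` or empty strings as unavailable. `format_method` and `or_unknown` are hypothetical helpers, and `jint` is a stand-in for the `jni.h` type. Because `snprintf` is not async-signal-safe, formatting like this belongs after sampling, not in the handler.

```c
#include <stdint.h>
#include <stdio.h>

typedef int32_t jint; /* stand-in for the jni.h type */

typedef struct ASGST_MethodInfo {
  char *class_name;
  jint class_name_len;
  char *generic_class_name;
  jint generic_class_name_len;
  char *method_name;
  jint method_name_len;
  char *signature;
  jint signature_len;
  char *generic_signature;
  jint generic_signature_len;
  jint modifiers;
} ASGST_MethodInfo;

/* A missing (NULL) or empty ("\0") string means "not available". */
static const char *or_unknown(const char *s) {
  return (s != NULL && s[0] != '\0') ? s : "<unknown>";
}

/* Hypothetical helper: render "<class>.<method><signature>" into out.
 * Not async-signal-safe (snprintf); call it outside the signal handler.
 * Returns snprintf's result. */
static int format_method(const ASGST_MethodInfo *info, char *out, size_t n) {
  return snprintf(out, n, "%s.%s%s", or_unknown(info->class_name),
                  or_unknown(info->method_name), or_unknown(info->signature));
}
```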

Conversions between `ASGST_Method` and `jmethodID` are available via
`jmethodID ASGST_MethodToJMethodID(ASGST_Method method);` and
`ASGST_Method jMethodIDToASGST_Method(jmethodID method);`,
but these functions are not signal-safe.

Obtaining the `jclass` for a given method can be done via
`jclass ASGST_GetClass(ASGST_Method method);`,
but be aware that this function is not signal-safe and
that the returned `jclass` reference has a limited lifetime.

Information on all other frames is stored in `ASGST_NonJavaFrame` structs:

```
typedef struct {
  uint8_t type;      // frame type
  void *pc;          // current program counter inside this frame, might be a nullptr for JVM internal frames like stub frames, …
} ASGST_NonJavaFrame; // used for FRAME_NON_JAVA
```

Although the API provides more information, the amount of space required per frame (e.g., 16 bytes on x86-64) is the same as for the existing `AsyncGetCallTrace` API.
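The size claim can be checked at compile time. The sketch below assumes `ASGST_Method` is pointer-sized and compares the union against a struct with the same two-field layout as `ASGCT_CallFrame` (`jint lineno; jmethodID method_id;`) from HotSpot's `forte.cpp`, using stand-in types. It also checks that the union's `type` tag has the same width as the members' tags, which is the concern about enum-backed frame types raised in the comments.

```c
#include <stddef.h>
#include <stdint.h>

typedef void *ASGST_Method; /* assumption: pointer-sized opaque id */

typedef struct {
  uint8_t type;
  int8_t comp_level;
  uint16_t bci;
  ASGST_Method method;
} ASGST_JavaFrame;

typedef struct {
  uint8_t type;
  void *pc;
} ASGST_NonJavaFrame;

typedef union {
  uint8_t type;
  ASGST_JavaFrame java_frame;
  ASGST_NonJavaFrame non_java_frame;
} ASGST_CallFrame;

/* Same layout as AsyncGetCallTrace's ASGCT_CallFrame, with stand-in types. */
typedef struct {
  int32_t lineno;
  void *method_id;
} ASGCT_CallFrameLayout;

/* The union's tag must be exactly as wide as each member's tag. */
_Static_assert(sizeof(((ASGST_CallFrame *)0)->type) ==
                   sizeof(((ASGST_JavaFrame *)0)->type),
               "type tags differ in width");

/* 1 + 1 + 2 bytes of header, padding up to pointer alignment, then a
 * pointer: the same footprint as the ASGCT frame. */
_Static_assert(sizeof(ASGST_CallFrame) == sizeof(ASGCT_CallFrameLayout),
               "per-frame size differs from AsyncGetCallTrace");
```

On a typical 64-bit ABI both structs come out at 16 bytes; on a 32-bit ABI both shrink to 8, so the equality holds either way.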

We propose to place the above declarations in a new header file, `profile.h`, which will be placed in the `include` directory of the JDK image. The header’s license should include the Classpath Exception so that it is consumable by third-party profiling tools.

The implementation can be found in the [jdk-sandbox](https://github.com/openjdk/jdk-sandbox/tree/asgst) repository, and a demo combining it with a modified async-profiler can be found [here](https://github.com/parttimenerd/asgct2-demo).



Risks and Assumptions
---------------------

Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of `AsyncGetCallTrace` since they leak details of the implementation of standard library files and include native wrapper frames.


Testing
-------

The implementation contains several stress and fuzzing tests to identify stability problems on all supported platforms, sampling the [renaissance](https://renaissance.dev/) benchmark suite repeatedly with small profiling intervals (<= 0.1ms). The fuzzing tests check that AsyncGetStackTrace can be called with modified stack and frame pointers without crashing the VM. We also added several tests which cover the basic usage of the API.

Alternatives
---------------
WIP: Provide an iterator-based API that supports walking at safe points and incremental tracing.

Comments

Regarding the last comment: We implemented a few JTREG stability and fuzzing tests, fixing all the bugs that we found (mainly using SafeFetch). We now consider the implementation safe enough without using any CrashProtection; it does not even crash when we modify the ucontext to contain random frame and stack pointers.
20-12-2022
We are still talking about sigjumping out of a crash frame at arbitrary places in the profiled target stack? If so, I'm still against it. My arguments remain the same [1]. Ironically, it would make AGCT less safe, not safer. You would hide crashes caused by AGCT but risk introducing delayed crashes in other subsystems, corrupting data, or even security problems. To me, that is a lot worse than async-profiler directly crashing out. The errors would not be easily attributable to profiling, and may not be detected until much later, or not at all. It would probably affect other developers or customers. Its use in JFR is no indication that this technique is safe. To my understanding, it was used as a stop-gap measure to get feature parity with JRockit while on a tight deadline [2]. As Markus explained, its use is somewhat safer in JFR since JFR runs the stack walker in a single dedicated thread instead of at arbitrary times in the target thread [3], and because the JFR people very carefully groomed their code to be side-effect free. To put it another way, you can use crash protection if you can prove that the stack walking code is completely side-effect free, i.e., it does not change VM state or process state outside of the thread stack extent starting at the AGCT entry point. I don't think you can realistically promise that, especially not over a long period of code maintenance. It would be less effort to make sure you just don't crash in the first place, e.g., with SafeFetch. More onerous work maybe, but ultimately simpler. It is also easier to test if you don't hide crashes.
[1] https://github.com/openjdk/jdk/pull/8225#issuecomment-1099315336
[2] https://github.com/openjdk/jdk/pull/8225#issuecomment-1099420516
[3] https://github.com/openjdk/jdk/pull/8225#issuecomment-1099391050
04-11-2022
>> Anyway, there was an attempt to generalize the crash handler that is used for JFR in https://github.com/openjdk/jdk/pull/8225 and it was met with quite a fierce resistance
> The resistance was against its use for the existing unsupported AGCT mechanism.

That's why I am more optimistic about its fate in this JEP :)
03-11-2022
If we can get away with using the crash handler, I agree that the user experience would be superior to just being told that the API call can crash the JVM. One might argue that the API callers should take the responsibility for avoiding the crash, but, IMO, if we required virtually every API user to create the crash handler, the usability might be rather poor. Anyway, there was an attempt to generalize the crash handler that is used for JFR in https://github.com/openjdk/jdk/pull/8225 and it was met with quite a fierce resistance, so it is very difficult to predict whether it will be possible to use the crash handler in this implementation or not.
03-11-2022
> Anyway, there was an attempt to generalize the crash handler that is used for JFR in https://github.com/openjdk/jdk/pull/8225 and it was met with quite a fierce resistance

The resistance was against its use for the existing unsupported AGCT mechanism.
03-11-2022
The difference you stress between the specific code and the whole API is not clear to me. How do you intend to be as stable as the JFR stack walking code while avoiding os::ThreadCrashProtection when doing the stack walk in AsyncGetStackTrace? Are you saying that AsyncGetStackTrace is intended to be as stable as JFR/AsyncGetCallTrace unless any signals are raised during the stack walk?
02-11-2022
The JFR stack walking code is wrapped in an os::ThreadCrashProtection wrapper so that segmentation faults in this code do not crash the JVM. We do not propose to wrap the AsyncGetStackTrace stack walking code. I hope this explains why the specific code should be as stable, but not the whole API.
02-11-2022
The following 2 statements look contradictory: "Non-Goals It is not a goal to recommend the new API for production use, since it can crash the VM." vs "Calling this API from a signal handler is safe, and the new implementation will be at least as stable as ... the JFR stack walking code." Can you elaborate on intended use cases and implementation stability expectations?
02-11-2022
Looking at this issue: https://bugs.openjdk.java.net/browse/JDK-8281677 I am wondering whether this proposal could report that the last frame resolution was approximated by a distance (PcDesc - resolved_symbol_address). This information could perhaps be stored in the CallFrame and could give us a confidence index for the accuracy of the resolution. As mentioned in the issue above, there are cases where the last frame may not be related to the actual code executed, because the nearest PcDesc containing debug info is totally unrelated to the actual frame. My assumption here is that if the distance between the current PC and the nearest PcDesc is too high (for some definition of high), the resolved frame may not be correct. A profiler could then use this information to warn the user that the last frame info may be off, or it could avoid showing this frame or work around this with additional/external info.
25-05-2022
Thanks for spotting this mistake. I adopted your `enum FrameTypeId : int8_t { ... }` proposal, but used `uint8_t`, and modified my prototypes accordingly.
11-04-2022
Reading over the definition of `FrameTypeId` and its use in `CallTrace`, `JavaFrame`, and `NonJavaFrame`, I would be careful with the different types used for storing the type of the frame. Given that the default backing type of an enum is `int`, `FrameTypeId` would be stored using 32 bits (the size of `int` on most platforms, at least Linux x86/x86_64). But given that `JavaFrame` defines `type` as `int8_t`, I'm not sure what the impact on the union `CallFrame` is. Is there a risk that, for a JavaFrame, the `CallFrame.type` field overlaps `JavaFrame.type`, `JavaFrame.comp_level`, and `JavaFrame.bci`? And if it doesn't, then it would mean `JavaFrame` isn't packed. A possible solution is to force `FrameTypeId` to be backed by an 8-bit type with `enum FrameTypeId : int8_t { ... }`, or to use `FrameTypeId` everywhere, including for `JavaFrame.type`. A combination of both is also possible.
11-04-2022
I removed the callback, as I found it impossible to create an interface compatible with existing libraries.
07-04-2022
The idea is to use the platform unwinder automatically and to only use the callback if this fails. This allows using libunwind.
05-04-2022
What is the expectation around use of the platform unwinder on GNU/Linux? The _Unwind_Backtrace interface is not really compatible with the next_frame callback. Will OpenJDK use the platform unwinder automatically, so that the caller does not need to supply a next_frame callback on most distributions?
05-04-2022