JDK-8309862 : Unsafe list operations in JfrStringPool
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 17,21,22
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2023-06-12
  • Updated: 2024-01-05
  • Resolved: 2023-06-13
JDK 17: 17.0.9-oracle (Fixed)
JDK 21: 21 (Fixed)
JDK 22: 22 b02 (Fixed)
Description
JDK-8233705 (Let artifact iteration running time be a function of incrementally tagged artifacts) was a follow-up enhancement for improved performance and scalability for JDK-8226511 (Implement JFR Event Streaming).

With JDK-8226511, the JFR Recorder Thread collects artifacts every second instead of only at chunk rotation. As such, it became crucial that the JFR subsystems support concurrent access.

JDK-8233705 extended the concept of a JfrMemorySpace, a JFR memory abstraction area, to make it "epoch-aware". An epoch-aware mspace selects free and full lists as a function of the current epoch. An epoch-aware mspace lets the JFR Recorder Thread have exclusive access to the lists used in the previous epoch while the threads continue to operate on the current epoch lists.
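
The list selection described above can be sketched as follows. This is a minimal single-threaded model, not the HotSpot implementation: the class name `EpochAwareSpace`, its methods, and the use of `std::list` are all assumptions made for illustration, and the atomics and synchronization a real mspace needs are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical model of an epoch-aware memory space: two live lists,
// selected by the parity of an epoch counter. Not the HotSpot API.
class EpochAwareSpace {
 public:
  // List that application threads append to during the current epoch.
  std::list<std::string>& current_list() { return lists_[epoch_ & 1]; }

  // List used during the previous epoch. After a shift, only the recorder
  // thread touches it, so destructive operations are safe there.
  std::list<std::string>& previous_list() { return lists_[(epoch_ & 1) ^ 1]; }

  // Flip the epoch: the current list becomes the previous list.
  void shift_epoch() { ++epoch_; }

 private:
  std::array<std::list<std::string>, 2> lists_;
  std::size_t epoch_ = 0;
};

// Adds two entries in one epoch, shifts, adds one more, then drains the
// previous-epoch list. Returns the number of entries drained.
inline std::size_t drain_previous_after_shift() {
  EpochAwareSpace space;
  space.current_list().push_back("a");
  space.current_list().push_back("b");
  space.shift_epoch();
  space.current_list().push_back("c");  // lands in the new epoch's list
  const std::size_t drained = space.previous_list().size();
  space.previous_list().clear();        // destructive, previous epoch only
  assert(space.current_list().size() == 1);
  return drained;
}
```

In this model the entry added after the shift is untouched by the drain, which is the property that lets the recorder thread work on the previous epoch without disturbing writers.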

JDK-8233705 updated the JfrCheckpointManager's mspaces to become epoch-aware. A flush operation, performed by the JFR Recorder Thread every second, operates on the current live lists concurrently with other threads. As such, it must not make unsafe changes to this list, as other threads are iterating it. It only writes data from the list during flush and postpones destructive changes until after the epoch shift. This invariant allows the JFR Recorder Thread exclusive access to the previous epoch lists, letting it issue destructive operations exclusively.

JDK-8233705 also updated the operation types used in JfrStringPool to mirror JfrCheckpointManager. Unfortunately, those operation types only work correctly with an epoch-aware mspace because they build on the invariant of exclusive list access for the JFR Recorder Thread, and the JfrStringPool mspace was not made epoch-aware.

Typically, this is not a problem, because triggering it requires string pool pressure high enough to fill more than 1 MB of strings within a 1-second window. With each string having a maximum size of 128 bytes, that is approximately 8192 unique event strings. Should that occur, the system dynamically allocates new memory for the mspace, in what is called a "transient" buffer.
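
The threshold quoted above follows from a simple division, which can be checked at compile time. The constant and function names here are hypothetical; only the two figures (1 MB and 128 bytes) come from the description.

```cpp
#include <cstddef>

// Figures taken from the description; the names are hypothetical.
constexpr std::size_t kPreallocatedBytes = 1024 * 1024;  // 1 MB
constexpr std::size_t kMaxStringBytes = 128;

// Number of maximum-size strings that fit before a transient buffer is needed.
constexpr std::size_t strings_before_overflow() {
  return kPreallocatedBytes / kMaxStringBytes;
}

static_assert(strings_before_overflow() == 8192, "1 MB / 128 B = 8192");
```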

The bug is that the JFR Recorder Thread excises and deletes transient buffers in the JfrStringPool from the current epoch live lists, where this is only an allowed operation on previous epoch lists (for mutual exclusion). Another thread could be iterating the live list while the JFR Recorder Thread removes and deletes a node.

This fix fills in the parts that were done for JfrCheckpointManager as part of JDK-8233705 but omitted for JfrStringPool. Most importantly, it makes the underlying mspace epoch-aware and splits the write operation, adding a flush operation for the current epoch lists (i.e. flush performs no destructive operations on the lists). Write and clear now operate exclusively on the correct, previous epoch lists, where they can issue destructive operations.
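
The split between a non-destructive flush and a destructive write might look roughly like the sketch below. It is a single-threaded model under stated assumptions: `Node`, `StringPoolModel`, and every method name are hypothetical, the `output` vector stands in for the chunk writer, and the real code's concurrency control is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical buffer node: holds strings plus a cursor marking what has
// already been written out. Not the HotSpot string pool buffer type.
struct Node {
  std::vector<std::string> strings;
  std::size_t written = 0;  // strings[0..written) are already emitted
};

struct StringPoolModel {
  std::list<Node> lists[2];
  unsigned epoch = 0;
  std::vector<std::string> output;  // stands in for the chunk writer

  std::list<Node>& current() { return lists[epoch & 1]; }
  std::list<Node>& previous() { return lists[(epoch & 1) ^ 1]; }

  // Emit everything not yet written from one node and advance its cursor.
  void emit(Node& n) {
    for (; n.written < n.strings.size(); ++n.written) {
      output.push_back(n.strings[n.written]);
    }
  }

  // Flush: runs on the current-epoch list, so it only reads data and
  // advances cursors. It never unlinks or frees a node.
  void flush() {
    for (Node& n : current()) emit(n);
  }

  // Write: runs after the epoch shift on the previous-epoch list, where the
  // caller is assumed to have exclusive access, so unlinking is allowed.
  void shift_and_write() {
    ++epoch;
    std::list<Node>& prev = previous();
    for (Node& n : prev) emit(n);
    prev.clear();  // destructive step, previous epoch only
  }
};

// Flushes once, adds more data, then shifts and writes. Returns true if the
// node survived the flush and all strings were emitted exactly once.
inline bool flush_then_write_demo() {
  StringPoolModel pool;
  Node n;
  n.strings = {"a", "b"};
  pool.current().push_back(n);
  pool.flush();
  const bool kept_after_flush = (pool.current().size() == 1);
  pool.current().front().strings.push_back("c");
  pool.shift_and_write();
  const std::vector<std::string> expected = {"a", "b", "c"};
  return kept_after_flush && pool.previous().empty() && pool.output == expected;
}
```

The cursor in each node is what keeps the later write from emitting data that a flush has already written, while the list itself stays structurally unchanged until the epoch has shifted.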

In addition, it includes better memory reuse of JfrStringPool transient buffers, because they always accommodate at least 512 KB. Previously, a transient buffer was retired immediately, even though it might have a lot of free space left. Now the transient buffers are not retired immediately but only when full, in the same manner as for "regular" preallocated buffers.
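
The effect of the retirement policy change can be illustrated with a small model. Everything here is hypothetical except the 512 KB and 128-byte figures from the description; in particular, "retired immediately" is modeled as "retired after the first store", which is an assumption about the old behavior rather than a reading of the source code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical transient buffer; the 512 KB figure comes from the text.
struct TransientBuffer {
  std::size_t capacity = 512 * 1024;
  std::size_t used = 0;
  bool retired = false;

  std::size_t free_bytes() const { return capacity - used; }
};

// Old policy as modeled here (an assumption): retire right after first use.
inline bool store_retire_immediately(TransientBuffer& b, std::size_t n) {
  if (b.retired || n > b.free_bytes()) return false;
  b.used += n;
  b.retired = true;
  return true;
}

// New policy as described: retire only when a request no longer fits.
inline bool store_retire_when_full(TransientBuffer& b, std::size_t n) {
  if (b.retired) return false;
  if (n > b.free_bytes()) {
    b.retired = true;
    return false;
  }
  b.used += n;
  return true;
}

// Counts how many 128-byte strings one transient buffer accepts under the
// given store policy before it stops accepting data.
template <typename Store>
std::size_t count_accepted(Store store) {
  TransientBuffer b;
  std::size_t accepted = 0;
  while (store(b, 128)) ++accepted;
  return accepted;
}
```

Under these assumptions, `count_accepted(store_retire_immediately)` yields 1 while `count_accepted(store_retire_when_full)` yields 4096 (512 KB / 128 bytes), which shows how much capacity the old policy could leave unused.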
Comments
True. The testing looks clean. I'll wait a bit to see if any follow-up issues emerge.
15-06-2023

[17u] It's brand new in head, but let's hope it proves itself until 17.0.9 is delivered.
15-06-2023

Fix Request (17u) Fixes the important bug in JFR, and matches 17.0.9-oracle. Does not apply cleanly due to contextual differences, but bots think the backport is clean after the merge. JFR tests pass.
14-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1439 Date: 2023-06-14 14:50:46 +0000
14-06-2023

Changeset: 05f896a1 Author: Markus Grönlund <mgronlun@openjdk.org> Date: 2023-06-13 11:47:47 +0000 URL: https://git.openjdk.org/jdk/commit/05f896a153ee950b21bae251d2870a8adfe4f04a
13-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21/pull/12 Date: 2023-06-13 11:52:29 +0000
13-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14426 Date: 2023-06-12 19:07:09 +0000
12-06-2023