Bug ID: JDK-8248485 Poor scalability in JfrCheckpointManager when using many threads after JDK-8242088

Type: Bug
Component: hotspot
Sub-Component: jfr
Affected Version: 15,16

Priority: P3
Status: Resolved
Resolution: Fixed

Submitted: 2020-06-29
Updated: 2022-02-19
Resolved: 2020-06-30

JDK 15: Fixed in 15 b30
JDK 16: Fixed in 16

Although JDK-8242088 improved scalability and performance for most subsystems, it had an unfortunate, overlooked side effect on JfrCheckpointManager:

JDK-8242088 consolidated the mspaces to aggregate a 'free_list' and a 'live_list' (previously called 'free_list' and 'full_list'). Usages were streamlined so that the 'free_list' is actually a free list, and the 'live_list' holds the active, in-use (live) buffers, replacing the poorly named 'full_list'.

Before JDK-8242088, JfrCheckpointManager used the free_list for the two statically allocated buffers (512 KB each) and the full_list to hold transiently allocated buffers (also 512 KB each). A fetch attempt to lease a statically allocated buffer therefore inspected at most two buffers, i.e. it took constant time.

With JDK-8242088, the statically allocated buffers are now stored in the live_list, but so are the transiently allocated buffers.
As a result, the time to lease a statically allocated buffer grows with the number of buffers in the live_list and therefore, indirectly, with the number of concurrent threads.
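The lookup cost can be illustrated with a minimal C++ sketch. It is a simplified model, not HotSpot code: `Buffer`, `BufferList`, `make_list` and `lease_static` are hypothetical, the list is a plain `std::forward_list`, and the sketch assumes new (transient) buffers are added at the head of the list. Leasing a static buffer must skip every transient buffer in front of it, whereas a list holding only the two static buffers never needs more than two inspections.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <memory>

// Hypothetical, simplified stand-in for a checkpoint buffer (not HotSpot's JfrBuffer).
struct Buffer {
  bool transient;                    // true if allocated on demand, false if static
  std::atomic<bool> leased{false};   // set while some thread holds the buffer
  explicit Buffer(bool is_transient) : transient(is_transient) {}
};

using BufferList = std::forward_list<std::unique_ptr<Buffer>>;

// Build a list with two static buffers and `transient_count` transient buffers
// pushed in front of them (assumption: new nodes are added at the head).
BufferList make_list(std::size_t transient_count) {
  BufferList list;
  list.push_front(std::make_unique<Buffer>(false));
  list.push_front(std::make_unique<Buffer>(false));
  for (std::size_t i = 0; i < transient_count; ++i) {
    list.push_front(std::make_unique<Buffer>(true));
  }
  return list;
}

// Lease the first free static buffer by scanning the list from the head.
// `inspected` reports how many nodes were visited; nullptr means none was free.
Buffer* lease_static(BufferList& list, std::size_t& inspected) {
  inspected = 0;
  for (auto& node : list) {
    ++inspected;
    if (node->transient) continue;   // transient buffers are never candidates
    bool expected = false;
    if (node->leased.compare_exchange_strong(expected, true)) {
      return node.get();
    }
  }
  return nullptr;
}
```

In this model, `make_list(0)` needs at most two inspections per lease, while `make_list(1000)` needs 1001 before the first static buffer is found. Even with tail insertion, a lease attempt made while both static buffers are taken would still walk the entire list.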

On systems with many parallel threads, this becomes a scalability problem.

JDK-8242088 also revealed that JfrCheckpointManager heavily over-provisions memory for transient buffers: the size of a transient buffer is the minimum element size of the JfrCheckpointMspace, which is 512 KB by default. On release, however, a transient buffer is retired, leaving most of its allocated space unavailable until the next chunk rotation. (A flushpoint involving checkpoint data currently only writes the contents; it does not deallocate transient buffers, which is postponed until chunk rotation.)
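The over-provisioning can be quantified with a small arithmetic model. This is a hypothetical C++ sketch, not HotSpot code: it assumes each transient lease allocates `max(requested, min_element_size)` bytes and that every such buffer stays retired until the next chunk rotation. The function names and the 4 KiB request size used below are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t KiB = 1024;

// Assumed sizing rule: a transient buffer is never smaller than the
// mspace's minimum element size, however little the lease asked for.
std::size_t transient_buffer_size(std::size_t requested, std::size_t min_element_size) {
  return std::max(requested, min_element_size);
}

// Memory pinned by retired transient buffers until the next chunk rotation,
// assuming `leases` transient leases of `requested` bytes since the last rotation.
std::size_t retained_until_rotation(std::size_t leases, std::size_t requested,
                                    std::size_t min_element_size) {
  return leases * transient_buffer_size(requested, min_element_size);
}

// Total bytes requested by those leases.
std::size_t bytes_requested(std::size_t leases, std::size_t requested) {
  return leases * requested;
}
```

Under these assumptions, 1000 transient leases of 4 KiB each pin 1000 × 512 KiB = 500 MiB until rotation, 128 times the 4000 KiB that was requested.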

We should address both aspects by using two mspaces instead of a single global JfrCheckpointMspace: one specialized for threads and one specialized for global access. This is similar to the layout of JfrStorage.
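A minimal sketch of the proposed two-mspace layout, in C++. It is a hypothetical, single-threaded model, not the actual changeset: `Mspace`, `CheckpointManager`, the reuse policy and both element sizes (512 KiB global, 4 KiB per thread) are assumptions chosen for illustration. It shows the two properties the proposal aims for: leasing a global buffer only searches the global mspace's own free list, so its cost no longer depends on how many thread buffers exist, and a small per-thread element size keeps a small request from reserving 512 KiB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical buffer: only its capacity matters for this model.
struct Buffer {
  std::size_t capacity;
  explicit Buffer(std::size_t c) : capacity(c) {}
};

// Hypothetical mspace with its own element size and free list (not thread-safe).
class Mspace {
 public:
  Mspace(std::size_t element_size, std::size_t preallocated)
      : element_size_(element_size) {
    for (std::size_t i = 0; i < preallocated; ++i) {
      free_.push_back(std::make_unique<Buffer>(element_size_));
    }
  }

  // Reuse the most recently released buffer if it is large enough; otherwise
  // allocate a new one rounded up to a multiple of the element size.
  std::unique_ptr<Buffer> acquire(std::size_t requested) {
    if (!free_.empty() && free_.back()->capacity >= requested) {
      std::unique_ptr<Buffer> buffer = std::move(free_.back());
      free_.pop_back();
      return buffer;
    }
    std::size_t elements =
        std::max<std::size_t>(1, (requested + element_size_ - 1) / element_size_);
    return std::make_unique<Buffer>(elements * element_size_);
  }

  void release(std::unique_ptr<Buffer> buffer) { free_.push_back(std::move(buffer)); }
  std::size_t free_count() const { return free_.size(); }

 private:
  std::size_t element_size_;
  std::vector<std::unique_ptr<Buffer>> free_;
};

// Two mspaces instead of one: a global mspace holding two large preallocated
// buffers, and a thread mspace whose small element size bounds per-thread waste.
class CheckpointManager {
 public:
  CheckpointManager() : global_(512 * 1024, 2), thread_(4 * 1024, 0) {}

  std::unique_ptr<Buffer> lease_global(std::size_t n) { return global_.acquire(n); }
  std::unique_ptr<Buffer> lease_thread(std::size_t n) { return thread_.acquire(n); }
  void release_global(std::unique_ptr<Buffer> b) { global_.release(std::move(b)); }
  void release_thread(std::unique_ptr<Buffer> b) { thread_.release(std::move(b)); }
  std::size_t global_free() const { return global_.free_count(); }
  std::size_t thread_free() const { return thread_.free_count(); }

 private:
  Mspace global_;  // illustrative: 512 KiB elements, two preallocated
  Mspace thread_;  // illustrative: 4 KiB elements, allocated on demand
};
```

In this model, a 100-byte thread lease receives a 4 KiB buffer and a 10000-byte thread lease receives 12 KiB, while the two 512 KiB global buffers stay on their own free list. A real implementation would need the concurrent lists introduced by JDK-8242088 instead of `std::vector`.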

Synopsis references the wrong bug: 8242088, not 8242008.

14-07-2020

Changeset: abc55dea Author: Markus Grönlund <mgronlun@openjdk.org> Date: 2020-06-30 19:00:14 +0000 URL: https://git.openjdk.java.net/mobile/commit/abc55dea

02-07-2020

URL: https://hg.openjdk.java.net/jdk/jdk15/rev/578b4bec06e7 User: mgronlun Date: 2020-06-30 17:06:12 +0000

30-06-2020

Relates :	JDK-8242088 - Replace mutually exclusive lists with concurrent alternatives
Relates :	JDK-8234595 - JfrBuffer::reinitialize failed "assert(!lease()) failed: invariant"
Relates :	JDK-8247965 - Two JFR tests failing in Loom repo