Bug ID: JDK-8364258 ThreadGroup constant pool serialization is not normalized

Type: Bug
Component: hotspot
Sub-Component: jfr
Affected Version: 17,21,25,26

Priority: P2
Status: Resolved
Resolution: Fixed

Submitted: 2025-07-29
Updated: 2025-12-01
Resolved: 2025-08-04

Versions (Unresolved/Resolved/Fixed)

Release	Version	Status
JDK 17	17.0.17-oracle	Fixed
JDK 21	21.0.10-oracle	Fixed
JDK 25	25	Fixed
JDK 26	26 b10	Fixed

Related Reports

Relates :	JDK-8226511 - Implement JFR Event Streaming
Relates :	JDK-8369692 - JFR: Don't record thread metadata in case jdk.ThreadStart is disabled
Relates :	JDK-8372445 - JFR files are generated in /tmp filling up space despite configured limitations

Description

Here is the technical history related to this issue. As suspected, the problem was introduced with JFR Event Streaming (JDK 14), where the serialization of thread groups was moved outside of a safepoint. As part of that change, we stopped deleting the instance holding all registered thread groups. Previously, deleting that instance preemptively cleared out all registered TG entries for an epoch, including the live ones, which were then recreated during the next chunk/epoch. I suspect the thinking was to improve overall performance by not having threads rebuild the TG entries anew for every chunk. However, a side effect of this, and the bug, is that no TG entries are removed any more, not even stale (unloaded) ones. This is therefore also a memory leak.

We also introduced a means for the threads themselves to register their metadata, first on thread start, but later also on thread end. Since no normalization scheme was put in place, those threads keep writing duplicated (albeit valid) information to the .jfr binary. In addition, the JFR Recorder Thread, which writes the initial checkpoint for a chunk, including static constants and threads (with their thread groups), writes ALL registered and accumulated TG entries, even dead ones.

For applications with a high churn rate of threads starting and stopping, this can lead to a large number of duplicated jdk.types.ThreadGroup entries, taking up unnecessary space in the .jfr binary file. In the extreme, the constant pool section of the .jfr binary can come to dominate the entire file, leading to back-to-back file rotations with intensive disk I/O.
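
The growth pattern can be illustrated with a toy model. This is only a sketch and not HotSpot code: the function name entries_written_per_chunk and its parameters are hypothetical, and it assumes that every chunk registers the same number of new, short-lived thread groups and that a rotation writes every entry still registered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not HotSpot code): returns how many thread group
// entries the initial checkpoint of each chunk would contain.
//   clear_on_rotation  true  -> registry is emptied after every rotation
//                               (the behavior described for the original JDK 11 code)
//                      false -> registry is never emptied
//                               (the behavior described for JDK 14 and later)
std::vector<std::size_t> entries_written_per_chunk(bool clear_on_rotation,
                                                   std::size_t chunks,
                                                   std::size_t new_groups_per_chunk) {
  std::vector<std::size_t> written;
  std::size_t registered = 0;
  for (std::size_t c = 0; c < chunks; ++c) {
    registered += new_groups_per_chunk;  // groups registered while this chunk is active
    written.push_back(registered);       // rotation writes every registered entry
    if (clear_on_rotation) {
      registered = 0;                    // all entries dropped, live ones are re-registered later
    }
  }
  assert(written.size() == chunks);
  return written;
}
```

Under these assumptions the per-chunk count stays flat when the registry is cleared on rotation and grows linearly with the number of chunks when it is not, which is the accumulation that lets the constant pool section dominate the file.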

A scavenging scheme for clearing dead TG entries and an "is_serialized" scheme should be implemented. Such an "is_serialized" scheme could easily be extended to also cover JavaThreads ("is_serialized(JavaThread) -> is_serialized(ThreadGroups)").
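
The two proposed schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not the fix that was integrated: the types TGEntry and TGRegistry, their members, and the use of a per-entry epoch stamp to implement "is_serialized" are all hypothetical, and locking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registered thread group entry (not JfrThreadGroupEntry).
struct TGEntry {
  std::string name;
  bool alive;                      // false once the underlying thread group is gone
  std::uint64_t serialized_epoch;  // epoch (chunk) in which the entry was last written; 0 = never

  explicit TGEntry(std::string n)
      : name(std::move(n)), alive(true), serialized_epoch(0) {}
};

// Hypothetical registry illustrating both schemes (no synchronization shown).
class TGRegistry {
 public:
  void add(std::uint64_t id, const std::string& name) {
    _entries.emplace(id, TGEntry(name));
  }

  void mark_dead(std::uint64_t id) {
    auto it = _entries.find(id);
    if (it != _entries.end()) {
      it->second.alive = false;
    }
  }

  // "is_serialized" scheme: an entry is written at most once per epoch.
  // Returns true if this call appended the id to 'out'.
  bool write_if_needed(std::uint64_t id, std::uint64_t epoch,
                       std::vector<std::uint64_t>& out) {
    assert(epoch != 0);  // epoch 0 is reserved for "never serialized"
    auto it = _entries.find(id);
    if (it == _entries.end()) {
      return false;  // unknown or already scavenged
    }
    if (it->second.serialized_epoch == epoch) {
      return false;  // already written in this epoch, skip the duplicate
    }
    it->second.serialized_epoch = epoch;
    out.push_back(id);
    return true;
  }

  // Scavenging scheme: drop dead entries; returns the number removed.
  std::size_t scavenge() {
    std::size_t removed = 0;
    for (auto it = _entries.begin(); it != _entries.end();) {
      if (!it->second.alive) {
        it = _entries.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t size() const { return _entries.size(); }

 private:
  std::unordered_map<std::uint64_t, TGEntry> _entries;
};
```

In this sketch a thread that starts or exits would call write_if_needed() with the current epoch, so repeated registrations within one chunk write nothing, while scavenge() would run at rotation so that dead entries neither leak memory nor reappear in the next chunk's initial checkpoint.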

Comments

[jdk21u-fix-request] Approval Request from Johannes Bechberger "This fixes issues with ThreadPools in JFR"
28-10-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/2373 Date: 2025-10-22 07:27:40 +0000
22-10-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u/pull/475 Date: 2025-10-21 12:25:44 +0000
21-10-2025
Fix request approved for JDK 25.
04-08-2025
A pull request was submitted for review. Branch: jdk25 URL: https://git.openjdk.org/jdk/pull/26618 Date: 2025-08-04 09:52:42 +0000
04-08-2025
Fix Request (25) I want to integrate this fix in JDK 25 under the RDP2 rules. The bug involves a critical regression leading to situations where JFR can cause a system to degrade significantly because of back-to-back file rotations, inducing high file I/O activity. Significant testing, including long-running stress testing, has been run. Overall risk is considered low.
04-08-2025
Changeset: 3bc44979 Branch: master Author: Markus Grönlund <mgronlun@openjdk.org> Date: 2025-08-04 09:42:05 +0000 URL: https://git.openjdk.org/jdk/commit/3bc449797eb59f9770d2a06d260b23b6efd5ff0f
04-08-2025
Here is the technical history related to this issue. As suspected, the problem was introduced with JFR Event Streaming (JDK 14), where the serialization of thread groups was moved outside of a safepoint. As part of that change, we stopped deleting the instance holding all registered thread groups. Previously, deleting that instance preemptively cleared out all registered TG entries for an epoch, including the live ones, which were then recreated during the next chunk/epoch. I suspect the thinking was to improve overall performance by not having threads rebuild the TG entries anew for every chunk. However, a side effect of this, and the bug, is that no TG entries are removed any more, not even stale (unloaded) ones. This is therefore also a memory leak.

We also introduced a means for the threads themselves to register their metadata, first on thread start, but later also on thread end. Since no normalization scheme was put in place, those threads keep writing duplicated (albeit valid) information to the .jfr binary. In addition, the JFR Recorder Thread, which writes the initial checkpoint for a chunk, including static constants and threads (with their thread groups), would write ALL registered and accumulated TG entries, even dead ones.

// Original Flight Recorder change set added in 11 had the correct scavenging of thread groups.
// All thread groups were deleted as part of rotation/serialization (even live ones), which happened during a safepoint.

commit a060be188df894ed5c26fc12fc9e902f9af32bd3
Author: Erik Gahlin <egahlin@openjdk.org>
Date: Tue May 15 20:24:34 2018 +0200

    8199712: Flight Recorder
    Co-authored-by: Markus Gronlund <markus.gronlund@oracle.com>
    Reviewed-by: coleenp, ihse, erikj, dsamersoff, mseledtsov, egahlin, mgronlun

+// Write out JfrThreadGroup instance and then delete it
+void JfrThreadGroup::serialize(JfrCheckpointWriter& writer) {
+  ThreadGroupExclusiveAccess lock;
+  JfrThreadGroup* tg_instance = instance();
+  assert(tg_instance != NULL, "invariant");
+  ResourceManager<JfrThreadGroup> tg_handle(tg_instance); <<-- destructor
+  set_instance(NULL);
+  tg_handle->write_thread_group_entries(writer);
+}

// Two types to be evaluated during a safepoint (writing threads and thread groups)

+  // register safepointing type serialization
+  for (size_t i = 0; i < 2; ++i) {
+    switch (i) {
+    case 0: register_serializer(TYPE_THREADGROUP, true, false, new JfrThreadGroupConstant()); break;
+    case 1: register_serializer(TYPE_THREAD, true, false, new JfrThreadConstantSet()); break;
+    default:
+      guarantee(false, "invariant");
+    }
+  }
+  return true;
+}

// JFR Event Streaming

commit 8addc1418acf6d0cbba7c56429a12be2e1ebf521 (tag: jdk-14+21)
Author: Markus Grönlund <mgronlun@openjdk.org>
Date: Wed Oct 30 19:43:52 2019 +0100

    8226511: Implement JFR Event Streaming
    Co-authored-by: Erik Gahlin <erik.gahlin@oracle.com>
    Co-authored-by: Mikhailo Seledtsov <mikhailo.seledtsov@oracle.com>
    Reviewed-by: egahlin, mseledtsov, mgronlun

-#include "jfr/utilities/jfrResourceManager.hpp"
@@ -396,9 +392,7 @@ void JfrThreadGroup::serialize(JfrCheckpointWriter& writer) {
   ThreadGroupExclusiveAccess lock;
   JfrThreadGroup* tg_instance = instance();
   assert(tg_instance != NULL, "invariant");
-  ResourceManager<JfrThreadGroup> tg_handle(tg_instance); <<-- no longer destroys the instance on serialize, which means that stale ThreadGroups are also included over time.
-  set_instance(NULL);
-  tg_handle->write_thread_group_entries(writer);
+  tg_instance->write_thread_group_entries(writer);
 }

 JfrThreadGroup::~JfrThreadGroup() {
-  assert(SafepointSynchronize::is_at_safepoint(), "invariant"); <<--- this existed before JDK 14; much of JFR Event Streaming is redesigning everything to increase the level of concurrency.
   if (_list != NULL) {
     for (int i = 0; i < _list->length(); i++) {
       JfrThreadGroupEntry* e = _list->at(i);
@@ -281,14 +280,11 @@ void JfrThreadGroup::set_instance(JfrThreadGroup* new_instance) {
 }

 // Write out JfrThreadGroup instance and then delete it <<----------- the comment is still there :-(
 void JfrThreadGroup::serialize(JfrCheckpointWriter& writer) {
   ThreadGroupExclusiveAccess lock;
   JfrThreadGroup* tg_instance = instance();
   assert(tg_instance != nullptr, "invariant");
   tg_instance->write_thread_group_entries(writer);
 }

+  JfrJavaSupport::on_thread_start(t);
   if (JfrRecorder::is_recording()) {
-    if (t->is_Java_thread()) {
-      send_java_thread_start_event((JavaThread*)t);
+    if (!t->jfr_thread_local()->is_excluded()) {
+      JfrCheckpointManager::write_thread_checkpoint(t); <<--- JFR Event Streaming added a means to have individual threads write a checkpoint on start
+      if (t->is_Java_thread()) {
+        send_java_thread_start_event((JavaThread*)t);
+      }
     }
   }
 }

+void JfrTypeManager::write_thread_checkpoint(Thread* t) {
+  assert(t != NULL, "invariant");
+  ResourceMark rm(t);
+  HandleMark hm(t);
+  JfrThreadConstant type_thread(t);
+  JfrCheckpointWriter writer(t, true, THREADS);
+  writer.write_type(TYPE_THREAD);
+  type_thread.serialize(writer);
 }

diff --git a/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeManager.hpp b/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeManager.hpp
index 75d073be145..b37471727ca 100644
--- a/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeManager.hpp
+++ b/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeManager.hpp
@@ -33,10 +33,12 @@ class JfrTypeManager : public AllStatic {
  public:
   static bool initialize();
   static void destroy();
-  static void write_types(JfrCheckpointWriter& writer);
-  static void write_safepoint_types(JfrCheckpointWriter& writer);
-  static void create_thread_blob(JavaThread* jt);
-  static void write_thread_checkpoint(JavaThread* jt);
+  static void on_rotation();
+  static void write_threads(JfrCheckpointWriter& writer);
+  static void create_thread_blob(Thread* t);
+  static void write_thread_checkpoint(Thread* t); <<<----------------- for thread checkpointing
+  static bool has_new_static_type();
+  static void write_static_types(JfrCheckpointWriter& writer);
 };

// No thread checkpoint call added for thread on_exit() as part of JFR Event Streaming

 void JfrThreadLocal::on_exit(Thread* t) {
   assert(t != NULL, "invariant");
   JfrThreadLocal * const tl = t->jfr_thread_local();
   assert(!tl->is_dead(), "invariant");
-  if (t->is_Java_thread()) {
-    JavaThread* const jt = (JavaThread*)t;
-    ObjectSampleCheckpoint::on_thread_exit(jt);
-    send_java_thread_end_events(tl->thread_id(), jt);
+  if (JfrRecorder::is_recording()) {
+    if (t->is_Java_thread()) {
+      JavaThread* const jt = (JavaThread*)t;
+      ObjectSampleCheckpoint::on_thread_exit(jt);
+      send_java_thread_end_events(tl->thread_id(), jt);
+    }
   }
   release(tl, Thread::current()); // because it could be that Thread::current() != t
 }

// Yet another write_checkpoint() call on thread exit was added after JFR Event Streaming (for 20 and 21), as it is needed to resolve thread entries for events issued before the bulk thread serialization.

commit 0ba473489151d74c8a15b75ff4964ac480fecb28
Author: Markus Grönlund <mgronlun@openjdk.org>
Date: Fri Dec 16 10:46:37 2022 +0000

    8287699: jdk/jfr/api/consumer/TestRecordingFileWrite.java fails with exception: java.lang.Exception: Found event that should not be there.
    Reviewed-by: egahlin

diff --git a/src/hotspot/share/jfr/support/jfrThreadLocal.cpp b/src/hotspot/share/jfr/support/jfrThreadLocal.cpp
index 19bbe25798c..ff4d255fc98 100644
--- a/src/hotspot/share/jfr/support/jfrThreadLocal.cpp
+++ b/src/hotspot/share/jfr/support/jfrThreadLocal.cpp
@@ -208,6 +208,9 @@ void JfrThreadLocal::on_exit(Thread* t) {
   assert(t != NULL, "invariant");
   JfrThreadLocal * const tl = t->jfr_thread_local();
   assert(!tl->is_dead(), "invariant");
+  if (JfrRecorder::is_recording()) {
+    JfrCheckpointManager::write_checkpoint(t); <<-- Now a thread also writes a checkpoint on thread exit
+  }
31-07-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/26558 Date: 2025-07-30 16:08:59 +0000
30-07-2025