JDK-8277919 : OldObjectSample event causing bloat in the class constant pool in JFR recording
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 17,18,19
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2021-11-29
  • Updated: 2023-11-28
  • Resolved: 2021-12-14
The Version table provides details on the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

  • JDK 17: 17.0.11-oracle (Fixed)
  • JDK 18: 18 b28 (Fixed)
  • JDK 19: 19 (Fixed)
Description
Allocation-heavy applications with OldObjectSample enabled end up storing many duplicates of the same klass artifact, causing huge bloat in the class area of the recording's constant pool (see the akka_6.jfr.zip attachment).

While this is merely annoying when the recording has no maximum size set, it becomes a real problem when a maximum size is set: the class constant pool eventually pushes out the useful data (again, see the akka_6.jfr.zip attachment).
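As a hedged illustration of the setup described above, the following Java sketch uses the public jdk.jfr.Recording API to enable the jdk.OldObjectSample event with stack traces (which, per the root-cause comment in this report, are needed to trigger the bloat) and to cap the recording size. The class name BoundedOldObjectRecording, the 64 MB limit, and the byte-array workload are illustrative assumptions, not taken from the report or its attachment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Recording;

public class BoundedOldObjectRecording {

    // Configures (but does not start) a recording with OldObjectSample and its
    // stack traces enabled, plus a maximum size: the combination in which the
    // class constant pool bloat can push older, useful data out.
    public static Recording configure(long maxSizeBytes) {
        Recording recording = new Recording();
        recording.enable("jdk.OldObjectSample").withStackTrace();
        recording.setToDisk(true);          // the size limit applies to the disk repository
        recording.setMaxSize(maxSizeBytes); // older data is discarded beyond this size
        return recording;
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> retained = new ArrayList<>();
        try (Recording recording = configure(64L * 1024 * 1024)) {
            recording.start();
            // Placeholder allocation-heavy workload; every 1000th array is kept
            // alive so that some old object samples survive.
            for (int i = 0; i < 1_000_000; i++) {
                byte[] chunk = new byte[128];
                if (i % 1000 == 0) {
                    retained.add(chunk);
                }
            }
            recording.stop();
            Path out = Files.createTempFile("old-object-sample", ".jfr");
            recording.dump(out);
            System.out.println("Wrote " + out + " (" + retained.size() + " retained arrays)");
        }
    }
}
```

Whether the bloat actually appears depends on the JDK build: the report lists 17, 18 and 19 as affected before the fix.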

Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/40 Date: 2021-12-29 09:47:12 +0000
29-12-2021

The previous PR was against the incorrect repo (jdk17u instead of jdk17u-dev). Here is the correct PR: https://github.com/openjdk/jdk17u-dev/pull/40
29-12-2021

Goetz, why did you remove the 'jdk17u-fix-request' label? Is the process for backports different from what is stated in http://openjdk.java.net/projects/jdk-updates/approval.html? If it is different, where can I find the up-to-date instructions? Cheers.
29-12-2021

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u/pull/329 Date: 2021-12-27 10:21:48 +0000
27-12-2021

[17u] Fix Request: I would like to ask for approval to backport this fix to JDK 17u. It repairs the OldObjectSample behaviour for allocation-heavy applications and prevents the class constant pool bloat from pushing out the real information when the recording size is limited (which it usually is, to avoid exhausting the file system on production systems). The fix applies cleanly. The jdk_jfr tests were run and all pass. [Edit 1]: Unfortunately, this change causes intermittent test failures, which are addressed in https://bugs.openjdk.java.net/browse/JDK-8278987; therefore, this backport must be followed immediately by a backport of that fix.
27-12-2021

My understanding, according to http://openjdk.java.net/projects/jdk-updates/approval.html, was that if the backport does not require any changes, the PR would be created after getting the 'yes' label, since there is really nothing to review and the PR is required only for the workflow. Anyway, here is the PR created by the Skara /backport command: https://github.com/openjdk/jdk17u/pull/329
27-12-2021

Jaroslav, you need to do a backport PR in jdk17u-dev.
20-12-2021

This is caused by JDK-8233705, which can lead to multiple klass entries being enqueued via the load barrier, in combination with an insufficient filtering mechanism for the leak profiler artefacts. The situation worsened with JDK-8249245, which opened a longer window during which artefacts can be enqueued. Leak profiler artefacts range over the entire set enqueued in the previous epoch, so a filtering mechanism similar to the one used for regular artefacts becomes necessary to avoid duplicates. Note that stack traces must be turned on for the OldObjectSample event; of the configurations shipped with the JDK, only profile.jfc does this.
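To check which of the running JDK's predefined configurations turn on stack traces for OldObjectSample, a small sketch can read their settings through jdk.jfr.Configuration. It assumes the settings map uses the "eventName#settingName" key form; the class name OldObjectStackTraceCheck is hypothetical, and the printed values depend on the .jfc files bundled with the JDK in use.

```java
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Configuration;

public class OldObjectStackTraceCheck {

    // Maps each predefined configuration name (for example "default" or
    // "profile") to its jdk.OldObjectSample stackTrace setting.
    public static Map<String, String> stackTraceSettings() {
        Map<String, String> result = new TreeMap<>();
        for (Configuration config : Configuration.getConfigurations()) {
            String value = config.getSettings()
                    .getOrDefault("jdk.OldObjectSample#stackTrace", "<not set>");
            result.put(config.getName(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        stackTraceSettings().forEach((name, value) ->
                System.out.println(name + ".jfc: stackTrace=" + value));
    }
}
```

On a JDK where the note above holds, only the profile entry should report true.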
14-12-2021

Changeset: 475ec8e6 Author: Markus Grönlund <mgronlun@openjdk.org> Date: 2021-12-14 13:00:39 +0000 URL: https://git.openjdk.java.net/jdk18/commit/475ec8e6c5abc3431344d69bd46395e8c4b46e4c
14-12-2021

Here are some details of where my investigation took me.

The issue appears after https://bugs.openjdk.java.net/browse/JDK-8249245, when artifact tagging (and adding klasses to the queue to be persisted) became affected by the epoch clear bit (before that change, only the epoch bit was taken into account). Following the change, it became obvious that the epoch clear bit was not unset for LEAKP-tagged artifacts. The bit is supposed to be unset in a 'write' method, but the LEAKP-tagged artifacts are written out by a specific write method which does not unset the epoch clear bit. This means that once the bit is set, it is never unset for the LEAKP-tagged artifacts, so the `should_tag()` function always returns true for such an artifact, re-tagging it and re-adding the klass to the queue.

In addition to the duplicates stored in the internal queue, the write method for the LEAKP-tagged artifacts neither checks nor sets the SERIALIZED bit, so it happily writes out all the duplicates every time. This is because the `CompositeKlassWriter` for the LEAKP-tagged artifacts first executes the `LeakKlassWriter`, which matches all artifacts tagged with LEAKP regardless of the SERIALIZED bit, and only then executes the `KlassWriter`, which writes out anything with the LEAKP and SERIALIZED bits unset.

A naive fix would be to have the `write__klass__leakp` function (the one writing out LEAKP-tagged klasses) check and set the SERIALIZED bit to prevent writing out duplicates. This would still keep the duplicates on the internal queue, though. My first attempt to 'fix' the unsetting of the epoch clear bit resulted in an infinite loop when iterating over LEAKP-tagged klasses :(
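The tag-bit interplay in the investigation comment can be made concrete with a deliberately simplified Java model. Only the bit names (epoch clear bit, SERIALIZED, LEAKP) and the role of `should_tag()` come from the comment; the class LeakpTagBitsSketch, its bit values, and the methods tag, writeRegular, writeLeakp and flush are hypothetical and do not reproduce HotSpot's C++ code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: bit names follow the investigation comment, but the
// values and logic are a simplification, not HotSpot's implementation.
public class LeakpTagBitsSketch {
    static final int EPOCH_CLEARED = 1; // the "epoch clear bit"
    static final int SERIALIZED = 2;
    static final int LEAKP = 4;

    static final class Klass {
        final String name;
        int bits;
        Klass(String name, int bits) { this.name = name; this.bits = bits; }
        boolean has(int bit) { return (bits & bit) != 0; }
    }

    final List<Klass> queue = new ArrayList<>(); // klasses waiting to be persisted
    final List<String> pool = new ArrayList<>(); // class constant pool entries written
    final boolean naiveFix;

    LeakpTagBitsSketch(boolean naiveFix) { this.naiveFix = naiveFix; }

    // Modelled should_tag(): true for as long as the epoch clear bit stays set.
    static boolean shouldTag(Klass k) { return k.has(EPOCH_CLEARED); }

    // A tagging attempt (e.g. from the load barrier) enqueues the klass again
    // whenever shouldTag() says so, which is how queue duplicates arise.
    void tag(Klass k) {
        if (shouldTag(k)) {
            queue.add(k);
        }
    }

    // Regular write: skips serialized klasses; otherwise writes the klass,
    // sets SERIALIZED and clears the epoch clear bit.
    void writeRegular(Klass k) {
        if (k.has(SERIALIZED)) {
            return;
        }
        pool.add(k.name);
        k.bits = (k.bits | SERIALIZED) & ~EPOCH_CLEARED;
    }

    // LEAKP write as described: never clears the epoch clear bit. Without the
    // naive fix it also ignores SERIALIZED, so every queued duplicate is written.
    void writeLeakp(Klass k) {
        if (naiveFix) {
            if (k.has(SERIALIZED)) {
                return;
            }
            k.bits |= SERIALIZED;
        }
        pool.add(k.name);
    }

    // Drains the queue, standing in for writing the class constant pool.
    void flush() {
        for (Klass k : queue) {
            if (k.has(LEAKP)) {
                writeLeakp(k);
            } else {
                writeRegular(k);
            }
        }
        queue.clear();
    }

    public static void main(String[] args) {
        for (boolean naive : new boolean[] {false, true}) {
            LeakpTagBitsSketch model = new LeakpTagBitsSketch(naive);
            Klass regular = new Klass("Regular", EPOCH_CLEARED);
            Klass sampled = new Klass("Sampled", EPOCH_CLEARED | LEAKP);
            for (int i = 0; i < 3; i++) {
                model.tag(regular);
                model.tag(sampled);
            }
            int queued = model.queue.size();
            model.flush();
            System.out.println("naiveFix=" + naive + " queued=" + queued
                    + " pool=" + model.pool
                    + " sampled re-taggable=" + shouldTag(sampled)
                    + " regular re-taggable=" + shouldTag(regular));
        }
    }
}
```

In this model, Regular is queued three times but written once and then stops being re-taggable. Without the naive fix, Sampled is written three times; with it, Sampled is written once, yet it was still queued three times and remains re-taggable, mirroring the remark that the naive fix leaves the duplicates on the internal queue.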
29-11-2021