JDK-8258414 : OldObjectSample events too expensive
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 16,17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-12-15
  • Updated: 2021-06-08
  • Resolved: 2021-03-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 15 JDK 16 JDK 17
11.0.12Fixed 15.0.4Resolved 16.0.2Fixed 17 b14Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
We've noticed that we can, in certain cases, get too much overhead from just having the oldobjectsample events enabled (without calculating reference chains). It may be due to the hashcode bug that will be fixed in the next release, but if it isn't, we may want to use the new allocation profiler to decide when to sample an allocation instead.
Comments
Fix request (15u) Requesting backport to 15u for parity with 11u. The patch applies almost cleanly except for some copyright years differences and context changes in jfrAllocationTracer.hpp/jfrAllocationTracer.cpp (JDK-8258094 and JDK-8258404 are not in 15u), reapplied manually. Tested with tier1 and jdk/jfr tests. RFR: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-June/006473.html
07-06-2021

[~sgehwolf] Sorry about that, there were several tickets about the issue and I mixed up them. Yes the right ticket is JDK-8265750 . I'm also updating the latest Fix request because some feedback was made on the mailing list. [11u] Fix Request Mailing list link: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-May/006379.html Bug: https://bugs.openjdk.java.net/browse/JDK-8258414 Webrev: https://cr.openjdk.java.net/~fdavid/8258414/webrev.04/ Original PR: https://github.com/openjdk/jdk/pull/2780
28-05-2021

[~fdavid] I'm confused. Did you mean JDK-8265750, ' Fatal error in safepoint.cpp after backport of 8258414' over JDK-8265718 in your fix request comment perhaps? AFAIK, the backout bug was JDK-8265750.
25-05-2021

[11u] Fix Request Following the ticket 'Build failure after JDK-8258414 11u backport' (https://bugs.openjdk.java.net/browse/JDK-8265718), please find a new version of the backport. Mailing list link: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-May/006091.html Bug: https://bugs.openjdk.java.net/browse/JDK-8258414 webrev: http://cr.openjdk.java.net/~fdavid/8258414/webrev.03/ Original PR: https://github.com/openjdk/jdk/pull/2780
19-05-2021

This seems to be causing problems in JDK 11.0.12 (jdk11u-dev) on Windows. See JDK-8265750. Is JDK 16 and 17 affected too? Edit: From the review of the backout for 11 it seems 16 and 17 aren't affected. JDK-8265750 marked as 11 specific with 16-na, 17-na labels.
23-04-2021

[11u] Fix Request The backport for JDK11u is different from the original PR and thus has been reviewed on jdk-updates-dev mailing list (https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-April/005693.html) Testing: - Linux x86_64 tier1 tests - SPECvm 2008 on a 96 cores/192 Gib server. JFR recording size is 75.12 MB before the patch and 3.08 MB after the patch. Webrev: http://cr.openjdk.java.net/~fdavid/8258414/webrev.02/
12-04-2021

[16u] Fix Request The fix corrects the way the old object sample event stacktraces are persisted and reduces the recording size >20x. The patch applies cleanly and jdk_jfr tests are all passing.
09-04-2021

Changeset: a9b156d3 Author: Markus Gr��nlund <mgronlun@openjdk.org> Date: 2021-03-12 10:05:22 +0000 URL: https://git.openjdk.java.net/jdk/commit/a9b156d3
12-03-2021

Our benchmarks run on c5.metal instance (96 cores, 192GB memory) are showing that the hashcode fix is bringing significant improvements: If either TLAB events or OldObjectSample with stacktrace is enabled then - on 11.0.9 the benchmark runs for >900s - on 11.0.10 candidate the benchmark runs for ~50s When either TLAB events or OldObjectSample /w stacktrace are enabled the recording is ~100MB and when we disable them the recording size drops to ~5MB - which corresponds to having all the extra stacktrace constants in the pool.
06-01-2021

Jaroslav mentioned the following finding: "The reason for this behavior is the fact that OldObjectSample will capture stacktrace at each TLAB retirement or Outside-of-TLAB allocation (exactly as the TLAB events do) and all those stacktraces are kept in the constant pool even though the associated events are discarded in the process." Which agrees with Markus comment from 2020-12-16. Capturing the stacktrace only for selected candidates seems like a good idea.
06-01-2021

I think it might make sense to switch to always using the new allocation sampling for deciding the objects to sample for the OldObjectSample event, not only when the new allocation event is enabled.
16-12-2020

Also, are the ILW definitions documented somewhere publicly? I'd like to use them for the JMC project, but every time I ask there are crickets. I did find the priority mapping by looking at JBS issues, and put it here: https://wiki.openjdk.java.net/display/jmc/ILW+Prioritization I could take a shot at documenting the semantics for I, L and W more comprehensively, but I worry that it would differ slightly from Oracle's.
16-12-2020

If the overhead is associated with stacktraces, an option to consider is to move the stacktrace capture to inside the tryLock, to only perform stacktrace capture for a selected candidate. Currently, threads capture their stacktrace before attempting to acquire the lock, this design driven by the fact that TLAB/Outside TLAB events were enabled, with stacktraces, in the profile configuration, hence a stacktrace would be taken eventually. With JDK-8257602, which disabled the TLAB/Outside TLAB events in the profile configuration, this assumption no longer holds. Moving the stacktrace capture to inside the tryLock wlll indirectly subsample stacktrace capture for the selected candidates only, the trade-off is that the tryLock will be held for a longer period, excluding more threads from being sampled.
16-12-2020

In our experience the overhead is caused by the number of stacktraces with the combination with insufficient hash, possibly degenerating a map lookup to sequential search (JDK-8250928).
16-12-2020

By default, stack trace is disabled for the Old Object Sample event. That said, plan is to enable it. Primarily with cheaper stack trace using the stack watermark barrier mechanism introduced in JEP 376: ZGC: Concurrent Thread-Stack Processing. It may also make sense to make Old Object Sample event throttable and if Object Allocation Sample is also enabled, which is by default, reuse that sample point / stack trace.
16-12-2020

[~hirt] Can you clarify the meaning of "overhead" here? Performance overhead for the end user? Too large recording files? How are the OldObjectSamples configured? With stacktraces?
15-12-2020

Thanks for your input Marcus. I have reset the state; we will re-visit this issue during our next triage meeting.
15-12-2020

L-for when using the feature is at least M, the same customers that cannot currently use TLAB/Outside TLAB events also need to disable the OldObjectSample events.
15-12-2020

I'd argue W=H - we can't use them, even without calculating reference chains.
15-12-2020

ILW = MLM = P4
15-12-2020