JDK-8314440 : Performance Testing Plan for Implementation of JEP 450: Compact Object Headers
  • Type: Sub-task
  • Component: performance
  • Sub-Component: hotspot
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2023-08-16
  • Updated: 2024-11-05
Description
This subtask is to specify what the performance tests and results are for the implementation of Compact Object Headers. The description can be replaced with this information.

It may be that we should split out the performance results for the two GC forwarding mechanisms from the performance numbers as a whole.
Comments
[~skuksenko] Thank you.
05-11-2024

[~kvn] I did a set of checks. BacktraceBuilder::push() is not inlined in the mach5 build, nor in local builds. Created an issue: https://bugs.openjdk.org/browse/JDK-8343648
05-11-2024

[~skuksenko] please link JBS issue filed for BacktraceBuilder::push() to this task.
31-10-2024

[~rkennke]
1. No more regressions for -UCOH except pmd.
2. The "fast-klass" modification fixed the ARM regression (from -10% to -2%).
3. The remaining pmd regressions are related to the non-inlined BacktraceBuilder::push(). That has nothing to do with Lilliput itself; it's an issue for the build system.
Given this, I think we could give the green light to Lilliput. The issue with BacktraceBuilder::push() would be resolved independently.
31-10-2024

[~skuksenko] I tried on an Ubuntu LTS box with gcc 13.2.0 and saw no regression between baseline (jdk-24+20) and the PR branch, with UseCompactObjectHeaders turned off.
30-10-2024

[~rkennke] GCC 13.2.0
30-10-2024

[~skuksenko] yeah, in my profiles I see a ton of fill_in_stack_trace(), which calls BacktraceBuilder::push(). What toolchain are you using on those builds that show the regression? Maybe it's only particular gcc versions that have this problem? Are you using tag jdk-24+20 (7a64fbbb) as the baseline? Or have you picked up some unrelated changes by any chance?
30-10-2024

[~rkennke] Yes. It's definitely an inlining issue. I am trying to understand why push() wasn't inlined.
30-10-2024

[~skuksenko] No, we have not changed anything there. I also don't see much that is called in the implementation of push() that touches anything that we've changed. *Maybe* it ends up calling oopDesc::klass(), but you already said that fixing that doesn't improve the regression. I see that push() is marked as inline. Maybe it's not showing up in the baseline because there it gets inlined, and something in the new build causes it to not get inlined?
30-10-2024

[~rkennke] Do you have any changes in the BacktraceBuilder? What I see is "BacktraceBuilder::push" appearing in the profile, which is absent in the baseline.
30-10-2024

I am now trying this on a few other machines. [~skuksenko] could you profile this and report here any findings, please?
30-10-2024

Yes please. More information would be helpful especially if this threatens the release.
30-10-2024

[~skuksenko] I also tried dacapo-old:pmd on x86_64:
baseline: 1148.59
-UCOH: 1152.39
+UCOH: 1146.94
Which looks like no regression at all to me. Could you share which platforms and configuration you see those regressions on? Otherwise it would be hard to figure out what's going on there.
29-10-2024

[~rkennke] I've seen platforms where there is no regression at all. But on my HW, the regression is what I reported.
29-10-2024

Ok, I just ran dacapo-bach pmd on arm64 (graviton3):
baseline: 938.27
-UCOH: 956.69
+UCOH: 949.18
Looks like a slight ~2% regression, but nothing as bad as 10%.
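
The "~2%" figure can be reproduced from the three scores quoted in this comment. The following is a minimal sketch, assuming the scores are run times (so a higher number is slower); the helper name percent_change is made up for illustration and is not part of any benchmark harness.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Scores quoted in the comment (dacapo-bach pmd, arm64 graviton3).
# Assumption: these are run times, so a positive change is a slowdown.
baseline, minus_ucoh, plus_ucoh = 938.27, 956.69, 949.18

print(f"-UCOH vs baseline: {percent_change(baseline, minus_ucoh):+.2f}%")
print(f"+UCOH vs baseline: {percent_change(baseline, plus_ucoh):+.2f}%")
```

Under that assumption, -UCOH comes out close to 2% slower than the baseline and +UCOH a little over 1% slower.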
29-10-2024

[~rkennke] There is no regression on Dacapo23. I meant old Dacapo:pmd
29-10-2024

[~skuksenko] good about lusearch and jython. How are you running pmd, though? Because I am running pmd from dacapo-23.11-chopin, 100x in each configuration (baseline, -UCOH, +UCOH), with common options -Xmx1g -Xms1g -XX:+AlwaysPreTouch. I see no regression, like none at all. I will double-check today, and run everything again. But maybe you are doing something different?
29-10-2024

[~rkennke] This (https://github.com/rkennke/jdk/tree/JDK-8305895-v4-fast-klass) helped with lusearch and jython. No more regressions. But it didn't help with pmd.
29-10-2024

[~skuksenko] It would be interesting to check if that regression would go away with this improvement: https://github.com/rkennke/jdk/tree/JDK-8305895-v4-fast-klass
28-10-2024

I've run a set of benchmarks to check for performance regressions when compact headers are turned off (-XX:-UseCompactObjectHeaders). Almost all benchmarks behave well. Only 3 regressions were found:
Dacapo:jython: -1% on x64
Dacapo:lusearch: -1.5% on x64
Dacapo:pmd: -10% on Arm and -5% on x64
Only Dacapo:pmd could be interesting. It's under investigation now. To be continued.
28-10-2024

I have measured some GC statistics and added the numbers to: https://wiki.openjdk.org/display/lilliput/Performance+Testing+for+JEP+450%3A+Compact+Object+Headers
To me it looks like no significant change vs baseline. It's interesting to note that with +UseCompactObjectHeaders, we have noticeably fewer GC cycles, which makes the *total* GC time go down, but that is expected because of the smaller objects.
28-10-2024

I have run SPECjvm, SPECjbb and Renaissance so far (and internal benchmarks and services that I cannot share). This is what I have so far. The doc also has some numbers by [~stuefe]. https://wiki.openjdk.org/display/lilliput/Performance+Testing+for+JEP+450%3A+Compact+Object+Headers Many benchmarks have quite some variance, I wouldn't read too much into <1% regressions. I confirmed the RSA regression with -UCOH, though. It looks like it's caused by oopDesc::klass() checking two flags instead of one (I have a fix, if that is a blocker). Also the Philosophers and ScalaKmeans benchmarks with +UCOH seem to be legit regressions, but I have not yet investigated in detail. I also have GC logs for everything, I will try to extract some GC metrics out of them and add to the wiki when ready.
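
One rough way to act on the remark about variance is to compare the mean difference between two configurations with the run-to-run spread. The sketch below is only an illustration of that idea, not the method used for the numbers on the wiki: the function names are hypothetical, the rule (a delta smaller than the coefficient of variation counts as noise) is a crude heuristic, and the sample scores are made up.

```python
from statistics import mean, stdev


def relative_delta_percent(baseline, candidate):
    """Difference of the means, as a percentage of the baseline mean."""
    return (mean(candidate) - mean(baseline)) / mean(baseline) * 100.0


def noise_percent(samples):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(samples) / mean(samples) * 100.0


def within_noise(baseline, candidate):
    """Crude heuristic: the delta is smaller than the larger run-to-run spread."""
    noise = max(noise_percent(baseline), noise_percent(candidate))
    return abs(relative_delta_percent(baseline, candidate)) < noise


# Made-up scores for illustration only; not measurements from this issue.
baseline_runs = [100.0, 102.0, 98.0, 101.0, 99.0]
candidate_runs = [100.5, 102.5, 98.5, 101.5, 99.5]

print(f"delta: {relative_delta_percent(baseline_runs, candidate_runs):+.2f}%")
print(f"noise: {noise_percent(baseline_runs):.2f}%")
print("within noise:", within_noise(baseline_runs, candidate_runs))
```

A proper comparison would use a statistical test over many runs; this only shows why a sub-1% delta is hard to interpret when individual runs differ by more than that.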
21-10-2024

[~coleenp] I'm not done with all my testing. I'll post the results here once I'm done. The potential problem with the more complex oopDesc::klass(): it is not clear if this is an actual regression or just a noisy test. Which in itself probably means a noisy test... but I'm still investigating. If it turns out to be a real problem, then I'll create a corresponding new issue to track it.
16-10-2024

[~rkennke] Can you post your performance results here? I heard something about GC pause times because of testing the UseCompactObjectHeaders && UseCompressedClassPointers combination, which I think should be a follow-up bug. Can you put that information in a new issue, and link it to your implementation task issue (JDK-8305895)?
16-10-2024

I was sure that I commented here earlier today, in reply to Coleen's question about testing readiness. However, I can't find my comment, nor can I find Coleen's question anymore, so let me write it again. I think the PR (https://github.com/openjdk/jdk/pull/20677) is now in shape for performance testing. I would only expect cosmetic or obviously-not-perf-impacting changes at this point (unless, of course, perf-testing finds issues that need to be fixed before integration). Is this good enough for conducting performance testing? Or does the PR need to be in its absolute final state and ready to be integrated?
08-10-2024

It is important that the performance validation includes comparisons of GC metrics like pause times, marking times, cycle times, to make sure that we don't have a significant regression that isn't apparent from whole-benchmark results.
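
As an illustration of extracting such metrics, the sketch below sums pause times from GC log lines. It assumes G1 logs written with unified logging at the default gc level (for example -Xlog:gc), where pause lines end in a duration such as "3.500ms"; the logging options used for the runs in this issue are not stated here, and the sample lines are invented. The names PAUSE_RE and summarize_pauses are hypothetical.

```python
import re

# Assumption: pause lines contain "Pause <Kind> ..." and end with "<number>ms".
PAUSE_RE = re.compile(r"Pause (?P<kind>\w+).*? (?P<ms>\d+(?:\.\d+)?)ms\s*$")


def summarize_pauses(lines):
    """Return count, total and maximum pause time (ms) over the given log lines."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group("ms")))
    return {
        "count": len(pauses),
        "total_ms": sum(pauses),
        "max_ms": max(pauses) if pauses else 0.0,
    }


# Invented sample lines for illustration; not taken from the runs in this issue.
SAMPLE_LOG = """\
[0.512s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.500ms
[1.204s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 52M->6M(256M) 4.250ms
[1.900s][info][gc] GC(2) Concurrent Mark Cycle 12.000ms
[2.310s][info][gc] GC(2) Pause Remark 30M->30M(256M) 1.250ms
"""

print(summarize_pauses(SAMPLE_LOG.splitlines()))
```

Running the same summary over the baseline, -UCOH and +UCOH logs would give pause counts and totals that can be compared directly. Marking and cycle times would need additional patterns for the concurrent phases.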
03-10-2024