JDK-8343047 : G1: FullGC marking time of large object array unstable between builds
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 24
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-10-25
  • Updated: 2025-07-08
JDK 26: Unresolved
Description
When looking at the trend-graph for the SystemGC micro OneBigObject I noticed that the results differed a lot between builds, but for a given build the variance was pretty low. At first I thought we had some regression, but seeing how the performance recovered and then went down again suggests that it might be something else.

In the past we've seen that small changes to how the C++ compiler inlines calls in the marking code can lead to big differences in performance, and I wonder if this is what's going on here, because there has been no obvious change to the code that is executed, as far as I can tell.

This benchmark creates a very large object array of 128M references (all null) and then calls System.gc(). Looking at detailed GC logs shows that it is the marking time that differs between runs.
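For reference, a minimal sketch of what such a JMH micro might look like (assumption: the actual micro source may differ in naming and setup details; the 128M array size is taken from the description above):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.SingleShotTime)   // corresponds to the "ss" mode in the results below
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class OneBigObject {

    // One very large object array with 128M reference slots, all null.
    // Scanning this single array dominates the Full GC marking work.
    private final Object[] bigArray = new Object[128 * 1024 * 1024];

    @Benchmark
    public void gc() {
        // Force a Full GC; the score is the wall-clock time of this call.
        System.gc();
    }
}

Running it with something like -Xlog:gc+phases=debug (and a heap large enough for the array, roughly 0.5-1 GB depending on compressed oops) should show the per-phase Full GC times where the marking difference is visible.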

I can reproduce the issue locally; on my machine I get:
24b14:
Benchmark        Mode  Cnt    Score   Error  Units
OneBigObject.gc    ss   25  109.380 ± 1.016  ms/op
24b15:
Benchmark        Mode  Cnt   Score   Error  Units
OneBigObject.gc    ss   25  87.812 ± 1.192  ms/op
24b16:
Benchmark        Mode  Cnt   Score   Error  Units
OneBigObject.gc    ss   25  88.357 ± 1.404  ms/op
24b17:
Benchmark        Mode  Cnt    Score   Error  Units
OneBigObject.gc    ss   25  132.062 ± 1.260  ms/op

My investigations have mainly been on linux-x64.
Comments
I’ve spent some more time looking into the details of this. Eric Caspole did some work in a related bug (JDK-8355554) showing that between two builds separated by only one change (unrelated to GC) we saw this behavior. He saw the problem with Parallel GC, but when trying to reproduce that I saw it with G1. One interesting thing about the results with these two builds (jdk-25+13-1379 and jdk-25+13-1380) was that this time it was the adjust phase that showed the difference in time. Both the adjust phase and the marking phase use the oop-iteration code, so the suspicion that there could be some problem around inlining in those code paths was still not unreasonable.

But now, being able to build and run the code locally, I was able to use Linux perf to dig deeper into the issue, and doing so I could see that the assembly for the hot methods was identical. So no inlining problems, it seems. The difference that I could spot was the alignment of the methods. Different alignment can change how well branch prediction and some other things work, and especially in a targeted “micro” like this, where we spend “all” the time in one particular path, we can really pay a large price if we don’t manage to predict equally well in both cases. The strange thing is that perf stat doesn’t really show a difference in branch misses, but I wonder if we could be getting more expensive misses in one case compared to the other.

I don’t plan to spend more time on this right now; it doesn’t look like we can (in an easy way) force the functions to be aligned a certain way (I tried using #pragma GCC optimize ("-falign-functions=...")), and I’m not sure it is desirable to do that for a specific micro benchmark either.
16-05-2025