JDK-8320252 : Regression > 3% in SPECjvm2008-Serial-ParGC on Mac aarch64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 22
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2023-11-16
  • Updated: 2023-12-18
  • Resolved: 2023-12-18
JDK 22
22: Resolved
Related Reports
Relates :  
Description
This appeared in retriage for 22-b21.
I think there is a related regression on Mac ARM with SPECjvm2008-FFT.large-ParGC, but that run is still in progress; I will update later.

I did CI build-by-build runs; this problem is related to JDK-8310031.
Comments
Based on the preceding analysis, this regression is expected (due to reduced footprint). One can get back the original performance by setting NewSize explicitly.
18-12-2023

Took a deeper look at the AdaptiveSizePolicy, which has latency, throughput, and footprint goals. In this benchmark, almost all young-gc pauses are extremely short (~1ms), so only the latter two goals are relevant. At startup, the default young-capacity is rather small, so the mutator-vs-gc time ratio is too low. To satisfy the throughput goal, young-capacity increases to 1360M (old-code) and 1160M (new-code), respectively. Later on, the footprint goal kicks in to reduce young-capacity. Since the starting points differ, the final young-capacity differs as well: 322M and 218M, respectively.
I ran with a fixed young-capacity (NewSize = MaxNewSize = 400M) to compare young-pause, and saw a ~40% reduction in young-pause with the new-code on macosx-aarch64. Since the new-code has shorter young-pause, the old-code needs a larger young-capacity to maintain the same mutator-vs-gc time, which explains the "extra" 200M required by the old-code. (Therefore, my original suspicion that the new-code has longer/slower young-pause was wrong...)
(I also ran the fixed young-capacity setup on linux-x64, and there is only a ~6% reduction. Therefore, it's not surprising that the regression is not visible on linux-x64: similar young-pause results in similar final young-capacity.)
18-12-2023
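
The reasoning in the comment above (a shorter young-pause needs less young-capacity for the same mutator-vs-gc time) can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not the actual AdaptiveSizePolicy logic: every young collection is assumed to cost one fixed pause, and one collection is assumed to happen each time the young generation fills up. The class name, the allocation rate, the target fraction, and the pause values are all hypothetical. The model shows the direction of the effect only; it is not meant to reproduce the 1360M/1160M or 322M/218M figures reported above.

```java
// Toy model (hypothetical, not HotSpot code): relates young-pause length to the
// young capacity needed to keep a given fraction of time in the mutator.
class YoungSizingSketch {

    // Fraction of time spent in the mutator when one young collection with a
    // fixed pause happens every time the young generation fills up.
    static double mutatorFraction(double youngCapacityMb, double allocRateMbPerSec, double pauseSec) {
        double mutatorSecPerCycle = youngCapacityMb / allocRateMbPerSec;
        return mutatorSecPerCycle / (mutatorSecPerCycle + pauseSec);
    }

    // Young capacity at which the model reaches the target mutator fraction.
    // From f = t / (t + p) it follows that t = p * f / (1 - f), and
    // capacity = t * allocation rate.
    static double requiredCapacityMb(double targetFraction, double allocRateMbPerSec, double pauseSec) {
        double mutatorSecPerCycle = pauseSec * targetFraction / (1.0 - targetFraction);
        return mutatorSecPerCycle * allocRateMbPerSec;
    }

    public static void main(String[] args) {
        double allocRateMbPerSec = 2000.0; // hypothetical allocation rate
        double targetFraction = 0.99;      // hypothetical throughput goal
        double oldPauseSec = 0.0010;       // hypothetical old-code young-pause
        double newPauseSec = 0.0006;       // hypothetical pause, 40% shorter

        double oldCapacity = requiredCapacityMb(targetFraction, allocRateMbPerSec, oldPauseSec);
        double newCapacity = requiredCapacityMb(targetFraction, allocRateMbPerSec, newPauseSec);

        System.out.printf("old-code needs %.1f MB, new-code needs %.1f MB%n", oldCapacity, newCapacity);
    }
}
```

In this model the required capacity scales linearly with the pause, so the code with the shorter pause reaches the same throughput goal with a smaller young generation and therefore starts shrinking from a lower point once the footprint goal takes over.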

Tried relaxing the memory ordering of the newly introduced `Atomic::dec` to `release`, but the perf result stayed the same.
14-12-2023

According to the logs, the regression case has more gc-cycles, due to a slightly smaller young-gen size. Young-gc pauses are ~0.5ms (extremely short). I suspect the atomic ops introduced by JDK-8310031 are rather expensive on this platform, so AdaptiveSizePolicy decides to shrink the young-gen size a bit. I did a few runs before and after JDK-8310031:
  • config 1: before (baseline)
  • config 2: after (3% regression)
  • config 3: before + 400M NewSize (3% improvement)
  • config 4: after + 400M NewSize (3% improvement)
Therefore, this benchmark is quite sensitive to young-gen size. One easy workaround for this regression is to set NewSize explicitly.
12-12-2023
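
To check that an explicit young-gen setting took effect, one option is to read the heap memory pools from inside the JVM. The following is a minimal sketch, assuming the process is started with something like `java -XX:+UseParallelGC -XX:NewSize=400m -XX:MaxNewSize=400m YoungGenReport.java`. The class name is hypothetical, and matching pools by the substrings "Eden" and "Survivor" is an assumption about how the collector names its pools. The reported total may also be somewhat below the configured NewSize, since the survivor pool may not account for both survivor spaces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how much memory the young-generation pools
// currently have committed.
class YoungGenReport {

    // Assumption: young-generation pools contain "Eden" or "Survivor" in their
    // names (for example "PS Eden Space"); other collectors may name them differently.
    static boolean isYoungPool(String poolName) {
        return poolName.contains("Eden") || poolName.contains("Survivor");
    }

    // Sums the committed bytes of all heap pools that look like young-gen pools.
    static long youngCommittedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !isYoungPool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null) {
                total += usage.getCommitted();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long committedMb = youngCommittedBytes() / (1024 * 1024);
        System.out.printf("young-gen committed: %d MB%n", committedMb);
    }
}
```

Running this with and without the NewSize flags gives a quick sanity check of the sizing before a full benchmark run is compared.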