Bug ID: JDK-8332485 SPECjbb2005-ParGC regression after JDK-8328744

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 23

Priority: P3
Status: Closed
Resolution: Won't Fix
OS: os_x
CPU: aarch64

Submitted: 2024-05-17
Updated: 2025-04-24
Resolved: 2024-08-26

With the introduction of JDK-8328744, SPECjbb2005-ParGC shows a regression that persists into later builds (graph attached):

-1.3% SPECjbb2005-ParGC on macOS-aarch64

It also appears to impact the benchmark on macOS-x64 and linux-aarch64 -- though the statistical measurements are less clear.

The regression was isolated to jdk-23+16-1254 (which only contains JDK-8328744) - (graph attached).

While testing the prototype of JDK-8338977, I noticed that the regression has already been fixed in JDK 24. The GC log shows that heap/old-gen capacity is back to its level before JDK-8328744. Therefore, the regression tracked by this ticket has been resolved.
24-04-2025
JDK-8338977 aims to restore the out-of-the-box performance.
26-08-2024
Expected due to trade-off in JDK-8328744. Can be worked around by tuning young-gen/heap sizes.
31-05-2024
Thanks for the explanation, it makes sense.
31-05-2024
> should it reduce young GC counts?

Since young-gen capacity is almost the same, #young-gc is also the same. The total #gc is reduced, but #young-gc accounts for ~99% of all gcs.

> large heap should reduce total GC time, no?

That depends on the heap content. In this case, the increased old-generation usage is primarily for accommodating floating garbage, as the revised heuristic prefers expansion (young GC) over full GC. While a single young GC is indeed shorter than a single full GC, the revised heuristic in this benchmark results in longer young GCs. Therefore, it is unclear whether the benefit of avoiding full GCs outweighs the accumulated slightly longer young GCs.

In other words, JDK-8328744 introduces a trade-off, which can have different impacts on different benchmarks -- it is not an optimization across the board.
29-05-2024
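The trade-off described in the comment above can be made concrete with a small back-of-the-envelope model. This is only a sketch: the ~5% young-GC slowdown and the 9 vs 2 full GCs come from this ticket, while the young-GC count and the per-pause durations are hypothetical placeholders that would have to be read from the actual GC logs.

```python
def extra_young_gc_time_ms(young_gc_count, young_pause_ms, slowdown):
    """Extra pause time accumulated when every young GC is `slowdown` (e.g. 0.05) longer."""
    return young_gc_count * young_pause_ms * slowdown


def saved_full_gc_time_ms(full_gcs_avoided, full_pause_ms):
    """Pause time saved by the full GCs that no longer happen."""
    return full_gcs_avoided * full_pause_ms


def net_pause_change_ms(young_gc_count, young_pause_ms, slowdown,
                        full_gcs_avoided, full_pause_ms):
    """Positive result: total pause time went up. Negative: it went down."""
    extra = extra_young_gc_time_ms(young_gc_count, young_pause_ms, slowdown)
    saved = saved_full_gc_time_ms(full_gcs_avoided, full_pause_ms)
    return extra - saved


if __name__ == "__main__":
    # From the ticket: ~5% longer young GCs, 9 - 2 = 7 full GCs avoided.
    # Hypothetical: 900 young GCs of 10 ms each, full GCs of 50 ms each.
    print(net_pause_change_ms(900, 10.0, 0.05, 7, 50.0))
```

With these placeholder numbers the longer young GCs cost more than the avoided full GCs save; with a sufficiently long full-GC pause the sign flips, which is why the change is a trade-off rather than a uniform win.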
It is a bit counterintuitive. A larger heap capacity actually causes the SPECjbb score to drop? I definitely can see that each young GC now takes longer, but should it reduce young GC counts? In other words, large heap should reduce total GC time, no?
29-05-2024
I can reproduce this ~1.3% (fast vs slow) perf regression on master. After taking a closer look at the GC logs, I noticed a difference in heap capacity: 1.7G (fast) vs 2.3G (slow), which is caused by a difference in old-gen size. Therefore, the regression is probably not due to overly eager heap shrinking (tracked by JDK-8332531); instead, the larger old-gen increases young-GC pause time by ~5% (since the cost of scanning dirty cards is proportional to old-gen size).

JDK-8328744 makes full GCs less likely: 9 (fast) vs 2 (slow). Old-gen resizing, which occurs only after a full GC, therefore becomes less likely as well, resulting in a larger old-gen. Tuning young/old/heap sizes should get the original perf back.

(According to the RAM specs from Robert, the mac-aarch64 boxes have less RAM, resulting in smaller heaps, so even a slight increase in old-gen/heap capacity can affect the benchmark score. This explains why the regression is less visible on other kinds of boxes.)

Based on the above analysis, I tend to think this ticket can be closed as not-an-issue.
27-05-2024
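The fast-vs-slow comparison in the comment above (full-GC counts and young-GC pause times) can be approximated from two GC logs with a short script. This is a minimal sketch, not the procedure used in the analysis: it assumes unified-logging pause lines of the form `GC(n) Pause Young (...) 24M->1M(98M) 1.500ms`, and the function name and sample lines are illustrative.

```python
import re
from collections import defaultdict

# Matches the pause kind and the trailing pause duration in milliseconds.
PAUSE_RE = re.compile(r"Pause (Young|Full)\b.*?\s([\d.]+)ms")


def summarize_pauses(lines):
    """Return {kind: (count, mean_pause_ms)} for Young and Full pauses."""
    pauses = defaultdict(list)
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses[match.group(1)].append(float(match.group(2)))
    return {kind: (len(ms), sum(ms) / len(ms)) for kind, ms in pauses.items()}


# Illustrative sample lines (not taken from the attached logs).
SAMPLE = [
    "[1.000s][info][gc] GC(0) Pause Young (Allocation Failure) 512M->128M(1740M) 10.000ms",
    "[2.000s][info][gc] GC(1) Pause Young (Allocation Failure) 640M->130M(1740M) 12.000ms",
    "[3.000s][info][gc] GC(2) Pause Full (Ergonomics) 900M->300M(2300M) 80.000ms",
]

if __name__ == "__main__":
    for kind, (count, mean_ms) in summarize_pauses(SAMPLE).items():
        print(kind, count, mean_ms)
```

Running it once per log (for example with `summarize_pauses(open(path))`) gives the per-kind counts and mean pauses to compare side by side.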
re: SPECjbb2005 download... Unfortunately, it's a licensed benchmark and can't be downloaded without a license.

The Java command line was:

    java -server -XX:+UseParallelGC -XX:+PerfDataSaveToFile -Xlog:gc* -classpath ./jbb.jar:./check.jar spec.jbb.JBBmain -propfile "/Users/aurora/sandbox/refworkload/benchscripts/specjbb2005/SPECjbb.props"

I'll try to attach the GC log for that run to this report.

Without SPECjbb2005, the 'SPECjbb.props' is likely meaningless, but for the record, it contained:

    input.expected_peak_warehouse=16
    input.jvm_instances=1
    input.per_jvm_warehouse_rampup=3
    input.per_jvm_warehouse_rampdown=20
    input.sequence_of_number_of_warehouses=1 4 6 8 10 12 14 16
    input.show_warehouse_detail=false
    input.include_file=SPECjbb_config.props
    input.output_directory=results
    input.suite=SPECjbb
    input.log_level=INFO
    input.deterministic_random_seed=false
    input.ramp_up_seconds=30
    input.measurement_seconds=240

Since the heap in these runs is ergonomically sized based on physical memory, here are the machine configs:

linux-x64:
    OCI BM.Optimized3.36 (Oracle X9-2)
    Oracle Linux 8.4
    2x Ice Lake-SP processors [3.0 GHz Xeon Gold 6354, Max Turbo 3.6 GHz, 60 MiB L3]
    18 cores/processor, Hyperthreaded
    72 total processor threads
    512 GB Memory

linux-aarch64:
    OCI BM.Standard.A1.160
    Oracle Linux 8.4
    2x Ampere Altra Quicksilver processors [3.0 GHz Neoverse-N1, 32 MiB System Cache, 1024 KiB L2]
    80 cores/processor
    160 total processor threads
    1024 GB Memory

macOS-x64:
    Mac mini (2018)
    macOS Monterey (12.6)
    1x Coffee Lake-B processor [3.0 GHz Intel Core i5-8500B]
    6 cores/processor, 1 thread/core. 9 MB L3
    6 total processor threads
    32 GB 2666 MHz DDR4 memory

macOS-aarch64:
    Mac mini (M1, 2020)
    macOS Monterey (12.6)
    1x Apple M1 [3.2 GHz]
    4 performance cores
    4 efficiency cores
    16 GB 4,266 MT/s LPDDR4 memory
21-05-2024
Experimenting with

    java -Xms256m -Xmx1G -XX:+UseParallelGC -Xlog:gc -Xlog:os+map=trace -cp . TestGCOld 40 1 20 10 200000

the build with JDK-8328744 shows many more heap resizings: 1145, vs 67 after backing out JDK-8328744.
21-05-2024
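One rough way to count heap resizings like those in the comment above is to watch the total heap capacity that each `-Xlog:gc` pause line reports in parentheses, and count how often it changes between consecutive pauses. This is a sketch under that assumption only; the ticket does not say how the 1145 and 67 figures were obtained, and this approach misses resizes that cancel out between two pauses or that are smaller than the logged unit.

```python
import re

# Captures the capacity token in parentheses, e.g. "(256M)" in "24M->1M(256M)".
CAPACITY_RE = re.compile(r"Pause \w+.*?\d+[KMG]->\d+[KMG]\((\d+[KMG])\)")


def count_capacity_changes(lines):
    """Count how often the reported heap capacity differs from the previous pause line."""
    changes = 0
    previous = None
    for line in lines:
        match = CAPACITY_RE.search(line)
        if not match:
            continue  # not a pause line with heap usage
        capacity = match.group(1)
        if previous is not None and capacity != previous:
            changes += 1
        previous = capacity
    return changes


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python count_resizes.py gc.log
    with open(sys.argv[1]) as log:
        print(count_capacity_changes(log))
```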
Unfortunately, I don't have SPECjbb2005. Any pointer to where I can download it? I suspect it may relate to JDK-8332531. With JDK-8328744, we expand the heap more aggressively. In our case, we saw fewer compact GCs after JDK-8328744, but we turned off resizing by setting -Xms = -Xmx anyway. What GC logging do you have for the runs, and is it possible to share it?
21-05-2024
No additional flags other than specifying Parallel GC, according to the logs.
21-05-2024
Could you share the GC parameters for the runs? Thanks.
21-05-2024

Relates :	JDK-8328744 - Parallel: Parallel GC throws OOM before heap is fully expanded
Relates :	JDK-8332531 - Parallel: Parellel GC resizes heap too aggressive