JDK-8332485 : SPECjbb2005-ParGC regression after JDK-8328744
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 23
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2024-05-17
  • Updated: 2025-04-24
  • Resolved: 2024-08-26
Description
With the introduction of JDK-8328744... SPECjbb2005-ParGC is showing a regression that persists in later builds (graph attached):

-1.3% SPECjbb2005-ParGC on macOS-aarch64

It also appears to impact the benchmark on macOS-x64 and linux-aarch64 -- though the statistical measurements are less clear.

The regression was isolated to jdk-23+16-1254 (which only contains JDK-8328744) - (graph attached).

Comments
While testing the prototype of JDK-8338977, I noticed that the regression has already been fixed in JDK 24. The GC log shows that heap/old-gen capacity is back to the level seen before JDK-8328744. Therefore, the regression tracked by this ticket has been resolved.
24-04-2025

JDK-8338977 aims to restore the out-of-the-box performance.
26-08-2024

Expected due to trade-off in JDK-8328744. Can be worked around by tuning young-gen/heap sizes.
31-05-2024

Thanks for the explanation, it makes sense.
31-05-2024

> should it reduce young GC counts?
Since young-gen capacity is almost the same, #young-gc is also the same. The total #gc is reduced, but #young-gc accounts for ~99% of all gcs.
> large heap should reduce total GC time, no?
That depends on the heap content. In this case, the increased old-generation usage is primarily for accommodating floating garbage, as the revised heuristic prefers expansion (young GC) over full GC. While a single young GC is indeed shorter than a single full GC, the revised heuristic in this benchmark results in longer young GCs. Therefore, it is unclear whether the benefit of avoiding full GCs outweighs the accumulated cost of slightly longer young GCs. In other words, JDK-8328744 introduces a trade-off, which can have different impacts on different benchmarks -- it is not an optimization across the board.
29-05-2024
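The trade-off described above can be made concrete with a little arithmetic. The sketch below is only an illustration: the full-GC counts (9 vs 2) and the ~5% longer young pause come from the analysis elsewhere in this thread, while the young-GC count and every pause duration are hypothetical values picked so that young GCs make up about 99% of all collections. The class and method names are likewise invented for this example.

```java
public class GcTradeoff {
    // Total pause time is the sum of all young pauses and all full pauses (milliseconds).
    static double totalPauseMs(long youngCount, double youngMs, long fullCount, double fullMs) {
        return youngCount * youngMs + fullCount * fullMs;
    }

    public static void main(String[] args) {
        long youngCount = 1000;       // hypothetical; young-gen capacity, and so the count, is unchanged
        double youngMsBefore = 10.0;  // hypothetical young pause before JDK-8328744
        double youngMsAfter = 10.5;   // ~5% longer, because the old-gen is larger
        double fullMs = 50.0;         // hypothetical full-GC pause

        double before = totalPauseMs(youngCount, youngMsBefore, 9, fullMs); // 9 full GCs
        double after = totalPauseMs(youngCount, youngMsAfter, 2, fullMs);   // 2 full GCs

        System.out.println("total pause before: " + before + " ms");
        System.out.println("total pause after:  " + after + " ms");
    }
}
```

With these made-up inputs, the time saved by skipping seven full GCs is smaller than the time added by a thousand slightly longer young GCs. Different pause durations would tip the balance the other way, which is why the change behaves as a trade-off rather than a uniform win.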

It is a bit counterintuitive. A larger heap capacity actually causes the SPECjbb score to drop? I can definitely see that each young GC now takes longer, but shouldn't it reduce young GC counts? In other words, a larger heap should reduce total GC time, no?
29-05-2024

I can reproduce this ~1.3% (fast vs slow) perf regression on master. After taking a closer look at the GC logs, I noticed a difference in heap capacity: 1.7G (fast) vs 2.3G (slow), which is caused by a difference in old-gen size. Therefore, the regression is probably not due to overly eager heap shrinking (tracked by JDK-8332531); instead, the larger old-gen increases young-GC pause time by ~5% (since the cost of scanning dirty cards is proportional to old-gen size). JDK-8328744 makes full GCs less likely, 9 (fast) vs 2 (slow), so old-gen resizing, which occurs only after a full GC, becomes less likely as well, leaving a larger old-gen. Tuning young/old/heap sizes should get the original perf back.
(According to the RAM specs from Robert, the mac-aarch64 boxes have less RAM, resulting in smaller heaps, so even a slight increase in old-gen/heap capacity can affect the benchmark score, which explains why the regression is less visible on other kinds of boxes.)
Based on the above analysis, I tend to think this ticket can be closed as not-an-issue.
27-05-2024
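The heap-capacity and GC-count figures in the analysis above were read from GC logs. As a rough complement, the same kind of numbers can be sampled from inside a running JVM through the standard `java.lang.management` API. The sketch below is a minimal illustration, not the method used in this ticket; the class name `HeapSnapshot` and its helper methods are invented here. With `-XX:+UseParallelGC` the pools are typically reported under names such as "PS Old Gen", but the code does not rely on any particular name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    // Sum of committed bytes over all heap memory pools (young and old generations).
    static long committedHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                total += usage.getCommitted();
            }
        }
        return total;
    }

    // Sum of collection counts over all collectors; -1 means "undefined" and is skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                System.out.println(pool.getName() + " committed=" + usage.getCommitted());
            }
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("heap committed=" + committedHeapBytes() + " gcCount=" + totalGcCount());
    }
}
```

Sampling these values at the end of a run on builds before and after JDK-8328744 would be one way to check whether old-gen capacity and full-GC counts differ as described, assuming the workload can be run with such a hook.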

re: SPECjbb2005 download... Unfortunately, it's a licensed benchmark and can't be downloaded without a license.
The Java command line was:
  java -server -XX:+UseParallelGC -XX:+PerfDataSaveToFile -Xlog:gc* -classpath ./jbb.jar:./check.jar spec.jbb.JBBmain -propfile "/Users/aurora/sandbox/refworkload/benchscripts/specjbb2005/SPECjbb.props"
I'll try to attach the GC log for that run to this report.
Without SPECjbb2005, the 'SPECjbb.props' is likely meaningless, but for the record, it contained:
  input.expected_peak_warehouse=16
  input.jvm_instances=1
  input.per_jvm_warehouse_rampup=3
  input.per_jvm_warehouse_rampdown=20
  input.sequence_of_number_of_warehouses=1 4 6 8 10 12 14 16
  input.show_warehouse_detail=false
  input.include_file=SPECjbb_config.props
  input.output_directory=results
  input.suite=SPECjbb
  input.log_level=INFO
  input.deterministic_random_seed=false
  input.ramp_up_seconds=30
  input.measurement_seconds=240
Since the heap in these runs is sized ergonomically based on physical memory, here are the machine configs:
  • linux-x64: OCI BM.Optimized3.36 (Oracle X9-2); Oracle Linux 8.4; 2x Ice Lake-SP processors [3.0 GHz Xeon Gold 6354, Max Turbo 3.6 GHz, 60 MiB L3]; 18 cores/processor, Hyperthreaded; 72 total processor threads; 512 GB Memory
  • linux-aarch64: OCI BM.Standard.A1.160; Oracle Linux 8.4; 2x Ampere Altra Quicksilver processors [3.0 GHz Neoverse-N1, 32 MiB System Cache, 1024 KiB L2]; 80 cores/processor; 160 total processor threads; 1024 GB Memory
  • macOS-x64: Mac mini (2018); macOS Monterey (12.6); 1x Coffee Lake-B processor [3.0 GHz Intel Core i5-8500B]; 6 cores/processor, 1 thread/core; 9 MB L3; 6 total processor threads; 32 GB 2666 MHz DDR4 memory
  • macOS-aarch64: Mac mini (M1, 2020); macOS Monterey (12.6); 1x Apple M1 [3.2 GHz]; 4 performance cores, 4 efficiency cores; 16 GB 4,266 MT/s LPDDR4 memory
21-05-2024

Experimenting with:
  java -Xms256m -Xmx1G -XX:+UseParallelGC -Xlog:gc -Xlog:os+map=trace -cp . TestGCOld 40 1 20 10 200000
Post JDK-8328744, this does show many more heap resizings: 1145, vs 67 after backing out JDK-8328744.
21-05-2024
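One rough way to compare resizing activity between two runs is to count how often the heap capacity reported on `-Xlog:gc` pause lines changes from one GC to the next. The sketch below assumes the usual `used->used(capacity)` shape of those lines; it is not necessarily how the 1145 vs 67 figures above were obtained, and the sample log lines in `main` are made up for illustration. Because the log rounds capacity to whole units, small resizes can be missed, so the result is only a lower bound. The class name `HeapResizeCounter` is invented for this example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapResizeCounter {
    // Matches the "before->after(capacity)" part of a GC pause line, e.g. 64M->10M(245M).
    private static final Pattern HEAP =
            Pattern.compile("\\d+[KMG]->\\d+[KMG]\\((\\d+[KMG])\\)");

    // Counts how many times the reported capacity differs from the previous GC line.
    static int countCapacityChanges(List<String> lines) {
        String previous = null;
        int changes = 0;
        for (String line : lines) {
            Matcher m = HEAP.matcher(line);
            if (!m.find()) {
                continue; // not a pause line with heap sizes
            }
            String capacity = m.group(1);
            if (previous != null && !capacity.equals(previous)) {
                changes++;
            }
            previous = capacity;
        }
        return changes;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, shaped like -Xlog:gc output.
        List<String> sample = List.of(
                "[0.512s][info][gc] GC(0) Pause Young (Allocation Failure) 64M->10M(245M) 5.123ms",
                "[0.801s][info][gc] GC(1) Pause Young (Allocation Failure) 74M->12M(245M) 4.870ms",
                "[1.130s][info][gc] GC(2) Pause Young (Allocation Failure) 76M->15M(300M) 5.342ms",
                "[1.467s][info][gc] GC(3) Pause Full (Ergonomics) 90M->14M(280M) 21.015ms");
        System.out.println("capacity changes: " + countCapacityChanges(sample));
    }
}
```

To apply it to a real run, the lines could be read with `java.nio.file.Files.readAllLines` from a log captured via `-Xlog:gc:file=...`.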

Unfortunately, I don't have SPECjbb2005. Is there a pointer to where I can download it? I suspect this may be related to JDK-8332531. With JDK-8328744, we expand the heap more aggressively. In our case, we saw fewer compact GCs after JDK-8328744, but we turned off resizing by setting -Xms = -Xmx anyway. What GC logging do you have for the runs, and would it be possible to share it?
21-05-2024

No additional flags other than specifying Parallel GC, according to the logs.
21-05-2024

Could you share the GC parameters for the runs? Thanks.
21-05-2024