Bug ID: JDK-8253230 G1 20% slower than Parallel in JRuby rubykon benchmark

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 16

Priority: P3
Status: Open
Resolution: Unresolved

Submitted: 2020-09-16
Updated: 2025-05-20

Fix Version: 26

On the JRuby bug tracker there is a bug report about newer JDKs (e.g. JDK 14) being about 20% slower than older ones (https://github.com/jruby/jruby/issues/5789, via https://twitter.com/headius/status/1297992914832769024).

The main reason is the change of the default GC from Parallel to G1 in JDK 9; however, the difference is abnormally high, hence this report. The typical observed difference for known outliers is around 10%.

After some tuning, i.e. setting -Xms equal to -Xmx and using 32M regions, the difference can be reduced somewhat, to ~13-15%.

One suspect is the barriers, as reported by [~shade] (in that bug report):

"Tested with recent JDK 13 EA and multiple collectors. Judging from GC logs, it is heavily-allocating, but fairly young-gc workload. Both Parallel and G1 run very short Young GCs during the run, taking about 1% of total time, which means allocation pressure itself is not the issue here."

Local results:
 #  collector  score  % of baseline  options
 1  parallel   17.26  100.0%         -Xmx1500m (oob)
 2  g1         13.64   79.0%         -Xmx1500m (oob)

 3  parallel   17.16  100.0%         -Xmx1500m -Xms1500m -Xmn1000m
 4  g1         13.99   81.5%         -Xmx1500m -Xms1500m -Xmn1000m
 5  g1         14.36   83.7%         -Xmx1500m -Xms1500m -Xmn1000m (rerun)
 6  g1         15.13   88.2%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
 7  g1         14.90   86.8%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m (rerun)

 8  parallel   13.81  100.0%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
 9  g1         13.11   94.9%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m

The interesting runs are 8 and 9, with Graal. Both collectors are slower overall there, but the gap between G1 and Parallel shrinks to about 5%. So potentially there are C2 optimizations that only kick in with Parallel GC's (small) barriers.
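For reference, the "% of baseline" figures can be recomputed from the scores as score / baseline * 100, where the baseline is the Parallel run in the same group. The following is a minimal sketch; the helper name pct_of_baseline is made up for illustration and is not part of any tool used above.

```shell
#!/bin/sh
# Hypothetical helper: percent of baseline, rounded to one decimal place.
# $1 = score of the run, $2 = score of the Parallel baseline in the same group.
pct_of_baseline() {
  LC_ALL=C awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", s / b * 100 }'
}

pct_of_baseline 13.64 17.26   # run 2 vs. run 1 (C2, out of the box)  -> 79.0
pct_of_baseline 13.11 13.81   # run 9 vs. run 8 (Graal)               -> 94.9
```

The gap to the baseline is 100 minus that value, i.e. about 21% for the out-of-the-box C2 runs and about 5% for the Graal runs.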

Some initial playing with -XX:MaxInlineSize and -XX:FreqInlineSize did not yield interesting results.

Reproduction: 
* Download JRuby from https://www.jruby.org/download
* Clone https://github.com/PragTob/rubykon
* Run jruby -Xcompile.invokedynamic=true -J-Xmx1500m benchmark/mcts_avg.rb

JRuby will pick up the VM pointed to by JAVA_HOME; you can check which with "jruby -v".
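To compare collectors directly, the run above can be repeated with the collector selected explicitly. This is only a sketch under some assumptions: it is run from the rubykon checkout, jruby is on the PATH, and the heap flags are copied from runs 3-7 above. The helper name bench_cmd and the RUN switch are made up for illustration. By default the script only prints the command lines; set RUN=1 to execute them.

```shell
#!/bin/sh
# Heap settings taken from runs 3-7 in the results table.
HEAP="-J-Xmx1500m -J-Xms1500m -J-Xmn1000m"

# Hypothetical helper: print the jruby command line for one set of
# JVM flags (each flag already carries JRuby's -J prefix).
bench_cmd() {
  echo "jruby -Xcompile.invokedynamic=true $HEAP $1 benchmark/mcts_avg.rb"
}

for flags in \
    "-J-XX:+UseParallelGC" \
    "-J-XX:+UseG1GC" \
    "-J-XX:+UseG1GC -J-XX:G1HeapRegionSize=32m"
do
  cmd=$(bench_cmd "$flags")
  echo "$cmd"
  # Only execute when explicitly requested, so a dry run is the default.
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  fi
done
```

Adding -J-Xlog:gc to the flags (JDK 9 and later) should print the per-collection pause lines needed to check how much time is spent in GC.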

Comments:
2021-05-31: Moving to 18. Most likely to some extent caused by barriers, and there won't be changes here in 17.
2021-11-23: Moving to 19. Same reasoning as for the move to 18 above.
2022-04-18: Moving to 20. Same reasoning as above.
2025-05-20: To be re-evaluated with JDK-8340827, which slipped to 26, and most likely closed. Moving to 26.

Relates :	JDK-8133055 - Investigate G1 performance on SPL4
Relates :	JDK-8226197 - Reduce G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement
Relates :	JDK-8132937 - G1 compares badly to Parallel GC on throughput on javac benchmark
Relates :	JDK-8340827 - G1: Improve Application Throughput with a More Efficient Write-Barrier
Relates :	JDK-8226731 - Remove StoreLoad in G1 post barrier