JDK-8253230 : G1 20% slower than Parallel in JRuby rubykon benchmark
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 16
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2020-09-16
  • Updated: 2023-12-12
JDK 23: Unresolved
Description
On the JRuby bug tracker there is a bug report about later JDKs (e.g. JDK 14) being about 20% slower than older ones (https://github.com/jruby/jruby/issues/5789 via https://twitter.com/headius/status/1297992914832769024).

The main reason is the change of the default GC in JDK 9 (from Parallel to G1); however, the difference is abnormally high, hence this report. The typical observed difference for known outliers is around 10%.

After some tuning, i.e. setting -Xms == -Xmx and using 32M regions, the gap can be reduced to ~13-15%.

One suspect is the barriers, as reported by [~shade] (in that bug report):

"Tested with recent JDK 13 EA and multiple collectors. Judging from GC logs, it is heavily-allocating, but fairly young-gc workload. Both Parallel and G1 run very short Young GCs during the run, taking about 1% of total time, which means allocation pressure itself is not the issue here."

Local results:
 #  gc        score   % of baseline  options
 1  parallel  17,26   100,0%         -Xmx1500m (oob)
 2  g1        13,64    79,0%         -Xmx1500m (oob)

 3  parallel  17,16   100,0%         -Xmx1500m -Xms1500m -Xmn1000m
 4  g1        13,99    81,5%         -Xmx1500m -Xms1500m -Xmn1000m
 5  g1        14,36    83,7%         -Xmx1500m -Xms1500m -Xmn1000m (rerun)
 6  g1        15,13    88,2%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
 7  g1        14,90    86,8%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m (rerun)

 8  parallel  13,81   100,0%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
 9  g1        13,11    94,9%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m

The interesting runs are 8 and 9, with graal. Graal is slower overall, but the gap between Parallel and G1 shrinks to about 5%. So potentially the issue is with C2 optimizations that only kick in with Parallel GC's (small) barriers.
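
For reference, the "% of baseline" column is the G1 score divided by the Parallel score from the same configuration. A minimal shell sketch of that arithmetic follows; the function name pct_of_baseline is hypothetical, and the decimal commas from the table are written as decimal points:

```shell
#!/bin/sh
# pct_of_baseline SCORE BASELINE  (hypothetical helper, not part of any tool)
# Prints SCORE as a percentage of BASELINE with one decimal place.
# LC_ALL=C keeps awk on decimal points regardless of the user's locale.
pct_of_baseline() {
  LC_ALL=C awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * s / b }'
}

pct_of_baseline 13.64 17.26   # run 2 vs. run 1 (out-of-the-box settings)
pct_of_baseline 15.13 17.16   # run 6 vs. run 3 (fixed heap, 32m regions)
pct_of_baseline 13.11 13.81   # run 9 vs. run 8 (graal)
```

This prints 79.0, 88.2 and 94.9, matching the table.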

Some initial playing with -XX:MaxInlineSize and -XX:FreqInlineSize did not yield interesting results.

Reproduction: 
* Download JRuby from https://www.jruby.org/download
* Clone https://github.com/PragTob/rubykon
* Run jruby -Xcompile.invokedynamic=true -J-Xmx1500m benchmark/mcts_avg.rb

JRuby will pick up the VM pointed to by JAVA_HOME; you can check which with "jruby -v".
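
To compare collectors, the same command can be run once per GC. The report does not list the exact command lines used for the table, so the sketch below is an assumption: it selects the collector with the standard HotSpot flags -XX:+UseParallelGC and -XX:+UseG1GC, forwards every JVM option through JRuby's -J prefix, and only prints the commands rather than running them. The helper name bench_cmd is hypothetical.

```shell
#!/bin/sh
# bench_cmd GC [JVM_OPTION...]  (hypothetical helper)
# Prints the jruby command line for one collector; each JVM option is
# forwarded with JRuby's -J prefix, as in the reproduction step above.
bench_cmd() {
  gc=$1; shift
  case "$gc" in
    parallel) gc_flag="-XX:+UseParallelGC" ;;
    g1)       gc_flag="-XX:+UseG1GC" ;;
    *) echo "unknown collector: $gc" >&2; return 1 ;;
  esac
  cmd="jruby -Xcompile.invokedynamic=true -J$gc_flag"
  for opt in "$@"; do
    cmd="$cmd -J$opt"
  done
  echo "$cmd benchmark/mcts_avg.rb"
}

# Out-of-the-box heap settings (runs 1 and 2 in the table).
bench_cmd parallel -Xmx1500m
bench_cmd g1 -Xmx1500m
# Fixed heap and young generation size, 32m regions (run 6).
bench_cmd g1 -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
```

Each printed line can be run from the rubykon checkout; the benchmark score is then compared by hand.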




Comments
Moving to 20. Same reasoning as above.
18-04-2022

Moving to 19. Same reasoning as for the move to 18 above.
23-11-2021

Moving to 18. Most likely to some extent caused by barriers, and there won't be changes here in 17.
31-05-2021