JDK-8133055 : Investigate G1 performance on SPL4
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2015-08-05
  • Updated: 2024-09-25
  • Resolved: 2024-09-25
Description
The SPL4 benchmark scores considerably lower with G1 than with CMS and Parallel.
Comments
As mentioned in an earlier comment, G1 has already surpassed the remaining competing collector, Parallel GC, on this benchmark.
25-09-2024

Improvements to G1 over the last few years have allowed G1 to surpass Parallel on this benchmark. CMS has since been removed, so we can close this.
03-10-2023

Recent investigation showed that the issue is now TLAB/region allocation, which is slower in G1 than in the other collectors.
13-03-2023
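
The comment does not include the workload, but as a rough illustration of what a TLAB-bound allocation path looks like, here is a minimal sketch; it is not the SPL4 workload, and the class name, object size and iteration count are arbitrary assumptions:

// AllocStress.java - illustrative sketch only, not the SPL4 workload.
// Most of these small, short-lived allocations are served by bumping a pointer in the
// current thread's TLAB; the slow path that hands out a fresh TLAB (which in G1 goes
// through region allocation) is the code path the comment above refers to.
public class AllocStress {
    // Retain a small fraction of objects so the collectors have some live data to manage.
    private static final Object[] keep = new Object[1024];

    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 200_000_000L; i++) {
            byte[] b = new byte[64];                        // small object: TLAB fast-path allocation
            if ((i & 0xFFFF) == 0) {
                keep[(int) ((i >> 16) % keep.length)] = b;  // occasionally keep one alive
            }
            sum += b.length;
        }
        System.out.println(sum);                            // consume the result
    }
}

Comparing such a loop under -XX:+UseG1GC and -XX:+UseParallelGC (optionally with -Xlog:gc+tlab=trace to see TLAB refill activity) is one way an allocation-path difference could be observed; the exact methodology behind the comment above is not recorded here.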

Moving this to 10 and unassigning myself since the throughput remembered sets will most likely not be worked on in the 9 timeframe.
01-03-2016

Running with the throughput remembered set prototype that Thomas made for 8u20 gives the following results:
G1: 603502712.102
Parallel: 606237320.038667
CMS: 635201052.252
G1 is on par with Parallel, but CMS is still a bit ahead.
31-08-2015

Turning concurrent refinement off helps G1, but Thomas noticed that even when concurrent refinement is turned off there are bugs that make G1 still do refinement work. Running with a build from Thomas that actually allows refinement to be turned off gives this result:
G1: 647248382.50375
Parallel: 587957245.01125
CMS: 642418886.4
G1 actually performs better than Parallel and CMS. I think this is partly because G1 grows the young gen larger. Specifying a young gen size of 152m for all of the GCs gives this result:
G1: 615904186.9325
Parallel: 662323578.24625
CMS: 648659934.9175
It seems like refinement is the problem for G1.
07-08-2015
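
The exact flags used to fix the young generation at 152m are not quoted above; a sketch of how this is typically done with standard HotSpot options, where spl4.jar stands in for the (unspecified) benchmark launcher:

java -XX:+UseG1GC             -Xmn152m -jar spl4.jar
java -XX:+UseParallelGC       -Xmn152m -jar spl4.jar
java -XX:+UseConcMarkSweepGC  -Xmn152m -jar spl4.jar

-Xmn152m pins both the initial and maximum young generation size, which removes the young-gen sizing difference mentioned in the comment above.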

Running the SPL4 benchmark on sthdev05.se.oracle.com gives results with high variance, but it is still pretty clear that G1 is behind CMS and Parallel. Taking the average of 16 runs I get these results:
G1: 557663953
Parallel: 613034513
CMS: 648719404
Running with G1TraceConcRefinement shows that a lot of manipulation of the refinement threads is going on. Turning refinement off (by setting -XX:G1ConcRefinementYellowZone=9999999 -XX:G1ConcRefinementRedZone=9999999 -XX:G1ConcRefinementGreenZone=9999999) improves the G1 score a bit; the average goes up to 571037904. There are only a couple of concurrent cycles going on in each run, so I doubt that this activity causes the regression. Here are the detailed numbers for each run. As can be seen, the variance is large.
G1           | Parallel     | CMS          | G1 - no refine
458941100.99 | 656563849.04 | 669466549.98 | 568121853.81
529315288.44 | 662377219.57 | 688389922.71 | 537069076.64
571167891.13 | 495654805.03 | 683299791.90 | 625728478.07
578508662.45 | 663409555.97 | 647778649.44 | 583916993.11
576662092.83 | 625535757.92 | 665535738.28 | 581140201.54
592241957.51 | 626229289.62 | 613532992.25 | 564643466.93
567735564.95 | 603358788.34 | 650633024.74 | 560238622.32
587603942.57 | 639823519.33 | 672878880.43 | 612792542.9
482146815.55 | 659454431.98 | 514385022.44 | 556288191.87
539395604.53 | 678828157.13 | 678345726.55 | 537741530.06
546801534.02 | 505577537.25 | 671309817.01 | 563659969.4
586763338.50 | 633590053.80 | 600852919.80 | 575473828.78
579647087.27 | 624538751.18 | 642070113.32 | 549426820.74
552464379.52 | 536768878.75 | 664352402.82 | 578151711.91
596702339.78 | 601694720.06 | 670768920.69 | 566597677.3
576525648.64 | 595146907.07 | 645909998.18 | 575615500.15
05-08-2015
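
For reference, the "G1 - no refine" configuration above corresponds to an invocation along these lines; the three -XX zone flags are the ones quoted in the comment, while spl4.jar is again just a placeholder for the benchmark launcher:

java -XX:+UseG1GC \
     -XX:G1ConcRefinementGreenZone=9999999 \
     -XX:G1ConcRefinementYellowZone=9999999 \
     -XX:G1ConcRefinementRedZone=9999999 \
     -jar spl4.jar

Setting all three zones to a very large value is meant to prevent the concurrent refinement threads from activating, leaving dirty cards to be processed during the GC pause instead (although, as the 31-08-2015 comment notes, this did not fully disable refinement work at the time).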