The SPL4 benchmark scores significantly lower with G1 than with CMS and Parallel.
Comments
As mentioned in an earlier comment, G1 has already surpassed the remaining competing collector, Parallel GC, on this benchmark.
25-09-2024
Improvements to G1 over the last few years have allowed it to surpass Parallel on this benchmark. CMS has since been removed, so we can close this.
03-10-2023
Recent investigation showed that the remaining issue is TLAB/region allocation, which is slower than in the other collectors.
13-03-2023
Moving this to 10 and unassigning myself since the throughput remembered sets will most likely not be worked on in the 9 timeframe.
01-03-2016
Running with the throughput remembered set prototype that Thomas made for 8u20 gives the following results:
G1: 603502712.102
Parallel: 606237320.038667
CMS: 635201052.252
G1 is on par with Parallel but CMS is still a bit ahead.
31-08-2015
Turning concurrent refinement off helps G1, but Thomas noticed that even if concurrent refinement is turned off, there are bugs that cause G1 to still do refinement work.
Running with a build from Thomas that actually allows you to turn refinement off gives these results:
G1: 647248382.50375
Parallel: 587957245.01125
CMS: 642418886.4
G1 actually performs better than Parallel and CMS. I think this is partly due to G1 growing the young gen larger. Specifying a young gen size of 152m for all of the GCs gives these results:
G1: 615904186.9325
Parallel: 662323578.24625
CMS: 648659934.9175
It seems like refinement is the problem for G1.
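For reference, the fixed young-gen runs above could be set up along these lines. This is a sketch, not the exact command lines used: the benchmark jar name is a placeholder, and I'm assuming -Xmn was used to pin the young generation at 152m.

```shell
# Fix the young generation at 152m for each collector.
# spl4-benchmark.jar is a placeholder name, not the actual benchmark artifact.
java -XX:+UseG1GC            -Xmn152m -jar spl4-benchmark.jar
java -XX:+UseParallelGC      -Xmn152m -jar spl4-benchmark.jar
java -XX:+UseConcMarkSweepGC -Xmn152m -jar spl4-benchmark.jar
```

Note that fixing the young generation size this way disables G1's adaptive young-gen sizing, which is the point of the comparison above.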
07-08-2015
Running the SPL4 benchmark on sthdev05.se.oracle.com gives results with high variance, but it is still pretty clear that G1 is behind CMS and Parallel. Taking the average of 16 runs I get these results:
G1: 557663953
Parallel: 613034513
CMS: 648719404
Running with G1TraceConcRefinement shows that a lot of manipulation of the refinement threads is going on. Turning refinement off (by setting -XX:G1ConcRefinementYellowZone=9999999 -XX:G1ConcRefinementRedZone=9999999 -XX:G1ConcRefinementGreenZone=9999999) improves the G1 score a bit; the average goes up to 571037904.
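For reference, a full command line effectively disabling concurrent refinement via the zone flags might look like the following. This is a sketch: the benchmark jar name is a placeholder, and depending on the JDK build, G1TraceConcRefinement may need diagnostic options unlocked.

```shell
# Push all refinement zones out of reach so refinement work is deferred to GC pauses.
# spl4-benchmark.jar is a placeholder name, not the actual benchmark artifact.
java -XX:+UseG1GC \
     -XX:G1ConcRefinementGreenZone=9999999 \
     -XX:G1ConcRefinementYellowZone=9999999 \
     -XX:G1ConcRefinementRedZone=9999999 \
     -jar spl4-benchmark.jar
```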
There are only a couple of concurrent cycles going on each run, so I doubt that this activity causes the regression.
Here are the detailed numbers for each run. As can be seen, the variance is large.
G1 | Parallel | CMS | G1 – no refine
458941100.99 | 656563849.04 | 669466549.98 | 568121853.81
529315288.44 | 662377219.57 | 688389922.71 | 537069076.64
571167891.13 | 495654805.03 | 683299791.90 | 625728478.07
578508662.45 | 663409555.97 | 647778649.44 | 583916993.11
576662092.83 | 625535757.92 | 665535738.28 | 581140201.54
592241957.51 | 626229289.62 | 613532992.25 | 564643466.93
567735564.95 | 603358788.34 | 650633024.74 | 560238622.32
587603942.57 | 639823519.33 | 672878880.43 | 612792542.9
482146815.55 | 659454431.98 | 514385022.44 | 556288191.87
539395604.53 | 678828157.13 | 678345726.55 | 537741530.06
546801534.02 | 505577537.25 | 671309817.01 | 563659969.4
586763338.50 | 633590053.80 | 600852919.80 | 575473828.78
579647087.27 | 624538751.18 | 642070113.32 | 549426820.74
552464379.52 | 536768878.75 | 664352402.82 | 578151711.91
596702339.78 | 601694720.06 | 670768920.69 | 566597677.3
576525648.64 | 595146907.07 | 645909998.18 | 575615500.15
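The averages quoted above can be reproduced from the table; a minimal sketch for the G1 column:

```python
# Average the 16 per-run G1 scores from the table above.
g1_runs = [
    458941100.99, 529315288.44, 571167891.13, 578508662.45,
    576662092.83, 592241957.51, 567735564.95, 587603942.57,
    482146815.55, 539395604.53, 546801534.02, 586763338.50,
    579647087.27, 552464379.52, 596702339.78, 576525648.64,
]
g1_avg = sum(g1_runs) / len(g1_runs)
print(round(g1_avg))  # 557663953, the reported G1 average
```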