JDK-8133055 : Investigate G1 performance on SPL4
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2015-08-05
  • Updated: 2024-09-25
  • Resolved: 2024-09-25
Description
The SPL4 benchmark scores considerably lower with G1 than with CMS and Parallel.
Comments
As mentioned in an earlier comment, G1 has already surpassed the remaining competing collector, Parallel GC, on this benchmark.
25-09-2024

Improvements to G1 over the last few years have allowed G1 to surpass Parallel on this benchmark. CMS has since been removed, so we can close this.
03-10-2023

Recent investigation showed that the issue is now TLAB/region allocation, which is slower in G1 than in the other collectors.
13-03-2023
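
The comment does not include the workload, but as a rough illustration of what a TLAB-bound allocation path looks like, here is a minimal sketch; it is not the SPL4 workload, and the class name, object size and iteration count are arbitrary assumptions:

// AllocStress.java - illustrative sketch only, not the SPL4 workload.
// Most of these small, short-lived allocations are served by bumping a pointer in the
// current thread's TLAB; the slow path that hands out a fresh TLAB (which in G1 goes
// through region allocation) is the code path the comment above refers to.
public class AllocStress {
    // Retain a small fraction of objects so the collectors have some live data to manage.
    private static final Object[] keep = new Object[1024];

    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 200_000_000L; i++) {
            byte[] b = new byte[64];                        // small object: TLAB fast-path allocation
            if ((i & 0xFFFF) == 0) {
                keep[(int) ((i >> 16) % keep.length)] = b;  // occasionally keep one alive
            }
            sum += b.length;
        }
        System.out.println(sum);                            // consume the result
    }
}

Comparing such a loop under -XX:+UseG1GC and -XX:+UseParallelGC (optionally with -Xlog:gc+tlab=trace to see TLAB refill activity) is one way an allocation-path difference could be observed; the exact methodology behind the comment above is not recorded here.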

Moving this to 10 and unassigning myself since the throughput remembered sets will most likely not be worked on in the 9 timeframe.
01-03-2016

Running with the throughput remembered set prototype that Thomas made for 8u20 gives the following results:
G1: 603502712.102
Parallel: 606237320.038667
CMS: 635201052.252
G1 is on par with Parallel, but CMS is still a bit ahead.
31-08-2015

Turning concurrent refinement off helps G1, but Thomas noticed that even when concurrent refinement is turned off there are bugs that make G1 still do refinement work. Running with a build from Thomas that actually allows refinement to be turned off gives this result:
G1: 647248382.50375
Parallel: 587957245.01125
CMS: 642418886.4
G1 actually performs better than Parallel and CMS. I think this is partly because G1 grows the young gen larger. Specifying a young gen size of 152m for all of the GCs gives this result:
G1: 615904186.9325
Parallel: 662323578.24625
CMS: 648659934.9175
It seems like refinement is the problem for G1.
07-08-2015
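
The exact flags used to fix the young generation at 152m are not quoted above; a sketch of how this is typically done with standard HotSpot options, where spl4.jar stands in for the (unspecified) benchmark launcher:

java -XX:+UseG1GC             -Xmn152m -jar spl4.jar
java -XX:+UseParallelGC       -Xmn152m -jar spl4.jar
java -XX:+UseConcMarkSweepGC  -Xmn152m -jar spl4.jar

-Xmn152m pins both the initial and maximum young generation size, which removes the young-gen sizing difference mentioned in the comment above.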

Running the SPL4 benchmark on sthdev05.se.oracle.com gives results with high variance, but it is still pretty clear that G1 is behind CMS and Parallel. Taking the average of 16 runs I get these results:
G1: 557663953
Parallel: 613034513
CMS: 648719404
Running with G1TraceConcRefinement shows that a lot of manipulation of the refinement threads is going on. Turning refinement off (by setting -XX:G1ConcRefinementYellowZone=9999999 -XX:G1ConcRefinementRedZone=9999999 -XX:G1ConcRefinementGreenZone=9999999) improves the G1 score a bit; the average goes up to 571037904. There are only a couple of concurrent cycles going on in each run, so I doubt that this activity causes the regression. Here are the detailed numbers for each run. As can be seen, the variance is large.
G1           | Parallel     | CMS          | G1 - no refine
458941100.99 | 656563849.04 | 669466549.98 | 568121853.81
529315288.44 | 662377219.57 | 688389922.71 | 537069076.64
571167891.13 | 495654805.03 | 683299791.90 | 625728478.07
578508662.45 | 663409555.97 | 647778649.44 | 583916993.11
576662092.83 | 625535757.92 | 665535738.28 | 581140201.54
592241957.51 | 626229289.62 | 613532992.25 | 564643466.93
567735564.95 | 603358788.34 | 650633024.74 | 560238622.32
587603942.57 | 639823519.33 | 672878880.43 | 612792542.9
482146815.55 | 659454431.98 | 514385022.44 | 556288191.87
539395604.53 | 678828157.13 | 678345726.55 | 537741530.06
546801534.02 | 505577537.25 | 671309817.01 | 563659969.4
586763338.50 | 633590053.80 | 600852919.80 | 575473828.78
579647087.27 | 624538751.18 | 642070113.32 | 549426820.74
552464379.52 | 536768878.75 | 664352402.82 | 578151711.91
596702339.78 | 601694720.06 | 670768920.69 | 566597677.3
576525648.64 | 595146907.07 | 645909998.18 | 575615500.15
05-08-2015
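
For reference, the "G1 - no refine" configuration above corresponds to an invocation along these lines; the three -XX zone flags are the ones quoted in the comment, while spl4.jar is again just a placeholder for the benchmark launcher:

java -XX:+UseG1GC \
     -XX:G1ConcRefinementGreenZone=9999999 \
     -XX:G1ConcRefinementYellowZone=9999999 \
     -XX:G1ConcRefinementRedZone=9999999 \
     -jar spl4.jar

Setting all three zones to a very large value is meant to prevent the concurrent refinement threads from activating, leaving dirty cards to be processed during the GC pause instead (although, as the 31-08-2015 comment notes, this did not fully disable refinement work at the time).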