Improve G1 worst-case latencies by making the full GC parallel.
It is not a goal to match the performance of the parallel collector's full GC for all use cases.
The G1 garbage collector was made the default in JDK 9. The previous default, the parallel collector, has a parallel full GC. To minimize the impact for users experiencing full GCs, the G1 full GC should be made parallel as well.
The G1 garbage collector is designed to avoid full collections, but when the concurrent collections can't reclaim memory fast enough, a fallback full GC will occur. The current implementation of the full GC for G1 uses a single-threaded mark-sweep-compact algorithm. We intend to parallelize the mark-sweep-compact algorithm and use the same number of threads as the Young and Mixed collections do. The number of threads can be controlled by the `-XX:ParallelGCThreads` option, but this will also affect the number of threads used for Young and Mixed collections.
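To illustrate the thread-count setting, the following is a minimal sketch that prints the effective value of `ParallelGCThreads` from inside a running JVM. The class name `GcThreadsProbe` and the helper `flagValue` are hypothetical names chosen for this example; the sketch assumes a HotSpot-based JVM, where `com.sun.management.HotSpotDiagnosticMXBean` is available, and falls back to `"unknown"` otherwise.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class GcThreadsProbe {
    /**
     * Returns the current value of a HotSpot flag as a string,
     * or "unknown" if the flag or the diagnostic bean is unavailable.
     * (Hypothetical helper, for illustration only.)
     */
    public static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag name, or not running on a HotSpot-based JVM.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // The same flag governs Young, Mixed, and (with this change) full GC threads.
        System.out.println("UseG1GC           = " + flagValue("UseG1GC"));
        System.out.println("ParallelGCThreads = " + flagValue("ParallelGCThreads"));
    }
}
```

Running it as, for example, `java -XX:+UseG1GC -XX:ParallelGCThreads=4 GcThreadsProbe.java` (single-file launch, JDK 11 or later) should report the value passed on the command line; without the flag, the JVM picks a default based on the number of available processors.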
- Full GC time analysis to ensure that the full GC times have improved. Looking at benchmark scores will probably not be good enough since G1 is designed to avoid full GCs.
- Runtime analysis using VTune or Solaris Studio Performance Analyzer to find unnecessary bottlenecks.
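As a rough illustration of the full GC time analysis mentioned above, the sketch below measures how much accumulated GC time an explicit collection request adds, using the standard `GarbageCollectorMXBean` interface. The class `FullGcTimer` and its methods are hypothetical names for this example. It assumes that `System.gc()` triggers a full GC, which is G1's default behavior unless options such as `-XX:+ExplicitGCInvokesConcurrent` or `-XX:+DisableExplicitGC` are set; the reported times are in milliseconds and are only as precise as the JVM's own accounting.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class FullGcTimer {
    /** Sums the accumulated collection time (ms) over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Requests a collection and returns the increase in accumulated GC time (ms). */
    public static long timeExplicitGc() {
        long before = totalGcMillis();
        System.gc(); // a request only; assumed here to run a full GC
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        // Allocate some data so the collection has work to do:
        // half of the arrays stay live, half become garbage.
        byte[][] live = new byte[32][];
        for (int i = 0; i < 64; i++) {
            byte[] chunk = new byte[1 << 20]; // 1 MiB
            if (i % 2 == 0) {
                live[i / 2] = chunk;
            }
        }
        System.out.println("Explicit GC added ~" + timeExplicitGc() + " ms of GC time");
        System.out.println("Live arrays retained: " + live.length);
    }
}
```

Comparing the output across runs with different `-XX:ParallelGCThreads` values gives a first impression of how full GC time scales with the number of threads; GC logging (for example `-Xlog:gc` on JDK 9 and later) provides more detailed per-pause figures.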
Risks and Assumptions
- The work is based on the assumption that nothing in the fundamental design of G1 prevents a parallel full GC.
- The fact that G1 uses regions will most likely lead to more wasted space after a parallel full GC than after a single-threaded one.