Summary
-------

Improve the automatic dynamic thread sizing enabled by `-XX:+UseDynamicNumberOfGCThreads` in G1 to reduce thread resource usage, particularly in situations where using all available threads is wasteful.

Goals
-----

Improve the default thread sizing policies of G1 in the following areas:

  * sizing the number of threads used during garbage collection pauses, refinement, and marking;

  * for reference processing in particular, automatically determining whether parallelism should be enabled, and determining an optimal number of threads to use.

A user should not need to do much more than set the maximum heap size and the pause time goal in order to get better thread resource usage than before.

Non-Goals
---------

While these new default policies should significantly improve the ease of use of the G1 collector, the resulting policies may still not be ideal for all situations. The changes are limited to how resources are used by existing algorithms; they avoid changes to the actual garbage collection algorithms. We specifically target thread usage in situations where G1 currently uses too many threads. Overall performance should not regress, but at the same time improvements are incidental, not required.

Success Metrics
---------------

A user of G1 should automatically benefit from improved heuristics in the areas covered by this JEP. Depending on the applicability of the change, we expect that G1 will exhibit better throughput, lower resource usage, better start-up behavior, or any combination of these. In cases where the VM already utilizes all available resources, there should be no long-term difference.

Motivation
----------

We intend to address two of the most typical tuning-related issues users have with G1 by automatically applying the necessary tuning for the number of threads used:

  * Worker threads for GC operations are always allocated up front, decreasing start-up performance and increasing memory usage, particularly for small, short-running batch applications.

  * Current heuristics for thread sizing can only be specified by the user at start-up (e.g., enabling parallelism during reference processing), or are determined once at start-up based on the environment or other user-supplied parameter values. None of them are based on metrics gathered from the running application (e.g., live set size or amount of survivors); they are global, static decisions made at start-up. These decisions are therefore often sub-optimal for some or even all phases of the application.

Improving self-tuning of these resources in these situations should increase the out-of-the-box usability of G1.

Description
-----------

The main change in the area of thread management is that the current thread count options (`-XX:ParallelGCThreads`, `-XX:ConcGCThreads`) will change their semantics slightly: instead of specifying the exact number of threads to be used during GC, marking, and refinement, they will always (where they do not already) denote the maximum number of threads G1 is allowed to use. Also, threads will by default no longer be pre-allocated in full at start-up, but created lazily as they are required.

The number of threads will be determined by the number of work items for a particular phase of work. The exact definition of a work item depends on the phase, and the resulting number of threads is determined by heuristics (e.g., for the evacuation phase of a GC, the number of threads to use for that particular phase is determined by the expected amount of live data to be evacuated during that phase).
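As an illustration only, the following is a minimal sketch of what such a per-phase heuristic could look like; the function name, parameters, and the per-thread work target are hypothetical and not part of the proposed implementation:

```c++
#include <algorithm>
#include <cstddef>

// Hypothetical sketch, not actual HotSpot code: pick the number of worker
// threads for a GC phase from the amount of work in that phase, capped at
// the configured maximum (e.g., the value of ParallelGCThreads).
static unsigned calc_phase_workers(std::size_t work_items,       // e.g., expected bytes of live data to evacuate
                                   std::size_t items_per_thread, // target amount of work per thread (> 0)
                                   unsigned max_workers) {       // configured maximum number of threads
  if (work_items == 0) {
    return 1;  // always keep at least one worker
  }
  // Round up so that a small amount of work is not spread across many threads.
  std::size_t wanted = (work_items + items_per_thread - 1) / items_per_thread;
  return static_cast<unsigned>(std::min<std::size_t>(wanted, max_workers));
}
```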
These decisions need to be communicated via appropriate channels (e.g., log messages).

For phases such as reference processing, where at the moment the user needs to enable parallelism manually (via `-XX:+ParallelRefProcEnabled`), G1 will automatically decide whether to enable parallelism and, if so, determine the number of threads to use for each phase given the number of work items to process.

No decision has been made on whether to de-allocate thread resources after long-term non-use.

There will be a way to revert to the old behavior, i.e., a static distribution of resources, via the existing option `-XX:-UseDynamicNumberOfGCThreads`.

Alternatives
------------

Provide better documentation for users to help them better tune their applications for their environment. This has the disadvantage that users still need to perform this kind of tuning for every application and every deployment.

There are no alternatives for some of the suggested enhancements, such as thread tuning dependent on the current amount of work, since this level of control is not available at the moment.

Testing
-------

Default performance measurements for any collector should not regress in general. There are no particular platform requirements.

Risks and Assumptions
---------------------

The heuristics we intend to implement originate from discussions with many users and from tuning efforts. These reports may have come from a non-representative subset of use cases, or from use cases that are no longer representative. This may ultimately make this work unnecessary.

Some of the changes may cause unintended performance regressions due to the changed heuristics.

Some of this work requires experimenting with heuristics that may not be successful in the end. We do have plausible ideas for all of these, but unforeseen interactions within the garbage collector, and between the garbage collector and applications, might make them perform very badly. In these cases we intend to simplify the heuristics until, eventually, the user will need to provide more detail.