Enhance ZGC to return unused heap memory to the operating system.
ZGC does not currently uncommit and return memory to the operating system, even when that memory has been unused for a long time. This behavior is not optimal for all types of applications and environments, especially those where memory footprint is a concern. For example:
- Container environments where resources are paid by use.
- Environments where an application might be idle for long periods of time and is sharing or competing for resources with many other applications.
- Applications whose heap space requirements vary significantly during execution. For example, the heap needed during startup might be larger than what is needed later during steady-state execution.
Other garbage collectors in HotSpot, such as G1 and Shenandoah, provide this capability today, which some categories of users have found very useful. Adding this capability to ZGC would be welcomed by the same set of users.
The ZGC heap consists of a set of heap regions called _ZPages_. Each ZPage is associated with a variable amount of committed heap memory. When ZGC compacts the heap, ZPages are freed up and inserted into a page cache, the _ZPageCache_. ZPages in the page cache are ready to be reused to satisfy new heap allocations, in which case they are removed from the cache. The page cache is critical for performance, as committing and uncommitting memory are expensive operations.
The set of ZPages in the page cache represents the unused parts of the heap that *could* be uncommitted and returned to the operating system. Uncommitting memory can therefore be done by simply evicting a well-chosen set of ZPages from the page cache and uncommitting the memory associated with these pages. The page cache already keeps ZPages in least-recently-used (LRU) order and segregated by size (small, medium, and large), so the mechanics of evicting ZPages and uncommitting memory are relatively straightforward. The challenge lies in designing the policy that decides when it's time to evict a ZPage from the cache.
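The page-cache mechanics described above can be illustrated with a minimal Java sketch. It is not ZGC's actual `ZPageCache` (which is written in C++); the `SizeClass` enum, the `Page` record, and all method names are hypothetical. Each size class keeps its own deque in LRU order: freed pages enter at the head, allocations reuse the most recently cached page from the head, and eviction for uncommit takes the least recently used page from the tail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Map;

class PageCacheSketch {
    // Hypothetical size classes mirroring the small/medium/large segregation.
    enum SizeClass { SMALL, MEDIUM, LARGE }

    // Hypothetical stand-in for a ZPage: a size class and its committed bytes.
    record Page(SizeClass sizeClass, long committedBytes) {}

    // One deque per size class: head = most recently cached, tail = least recently cached.
    private final Map<SizeClass, Deque<Page>> lists = new EnumMap<>(SizeClass.class);

    PageCacheSketch() {
        for (SizeClass c : SizeClass.values()) {
            lists.put(c, new ArrayDeque<>());
        }
    }

    // Called when compaction frees a page: it becomes the most recently used entry.
    void insert(Page page) {
        lists.get(page.sizeClass()).addFirst(page);
    }

    // Allocation path: reuse the most recently cached page of this class.
    // Returns null when the cache cannot satisfy the request.
    Page allocate(SizeClass c) {
        return lists.get(c).pollFirst();
    }

    // Uncommit path: remove the least recently used page of this class.
    // The caller would then uncommit the page's memory.
    Page evictLeastRecentlyUsed(SizeClass c) {
        return lists.get(c).pollLast();
    }

    int size(SizeClass c) {
        return lists.get(c).size();
    }
}
```

Reusing pages from the head keeps recently used pages in circulation, while pages that are never reallocated drift toward the tail, which is the set an uncommit policy would naturally consider first.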
A simple policy would be to have a timeout or delay value that specifies how long a ZPage can sit in the page cache before it's evicted. This timeout would have some reasonable default value, with a command line option to override it. The Shenandoah GC uses a policy like this, with a default value of 5 minutes and the command line option `-XX:ShenandoahUncommitDelay=<milliseconds>` to override the default.
A policy like the one above might work reasonably well. However, one could also envision more sophisticated policies that don't require new command-line options, for example, heuristics that derive a suitable timeout value from GC frequency or other data. We will initially deliver a simple timeout policy, with a `-XX:ZUncommitDelay=<seconds>` option, and let a more sophisticated policy (if one is found) come later.
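A timeout policy of this kind can be sketched as follows, assuming each cached page records when it entered the cache. `TimeoutUncommitSketch`, `CachedPage`, and `evictExpired` are hypothetical names, and the delay is taken in seconds only to mirror the unit of `-XX:ZUncommitDelay=<seconds>`. Because pages are kept in LRU order, expired pages are grouped at the tail, so the scan can stop at the first page that has not yet expired.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TimeoutUncommitSketch {
    // Hypothetical cached page: its committed size and when it entered the cache.
    record CachedPage(long committedBytes, long cachedAtMillis) {}

    // Head = most recently cached, tail = least recently cached. Timestamps increase
    // from tail to head as long as callers pass a non-decreasing clock value.
    private final Deque<CachedPage> lru = new ArrayDeque<>();
    private final long delayMillis;

    // delaySeconds mirrors the unit of -XX:ZUncommitDelay=<seconds>.
    TimeoutUncommitSketch(long delaySeconds) {
        this.delayMillis = delaySeconds * 1000;
    }

    void insert(long committedBytes, long nowMillis) {
        lru.addFirst(new CachedPage(committedBytes, nowMillis));
    }

    // Evicts every page that has sat in the cache for at least the delay.
    // The caller would then uncommit the memory of the returned pages.
    List<CachedPage> evictExpired(long nowMillis) {
        List<CachedPage> evicted = new ArrayList<>();
        while (!lru.isEmpty() && nowMillis - lru.peekLast().cachedAtMillis() >= delayMillis) {
            evicted.add(lru.pollLast());
        }
        return evicted;
    }

    int cachedPages() {
        return lru.size();
    }
}
```

With a delay of 300 seconds, a page cached at time 0 becomes eligible at 300,000 ms, while a page cached at 200,000 ms stays in the cache until 500,000 ms.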
The uncommit capability will be enabled by default. However, whatever the policy decides, ZGC should never uncommit memory in a way that shrinks the heap below its minimum size (`-Xms`). This means the uncommit capability is effectively disabled if the JVM is started with a minimum heap size (`-Xms`) equal to the maximum heap size (`-Xmx`). The option `-XX:-ZUncommit` will also be provided to explicitly disable this feature.
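The minimum-heap rule can be expressed as a small planning step that runs after the policy has picked candidate pages. This is a hypothetical sketch: `planUncommit` and its parameters are illustrative, the `uncommitEnabled` flag stands in for `-XX:-ZUncommit`, and pages are considered whole, in LRU order, and skipped if evicting them would shrink the committed heap below `-Xms`.

```java
class UncommitFloorSketch {
    // Returns the number of bytes to uncommit. Candidate pages are considered in the
    // given (LRU) order; a page is taken only if the committed heap stays at or above
    // the minimum heap size (-Xms) after it is uncommitted.
    static long planUncommit(boolean uncommitEnabled, long committedBytes,
                             long minHeapBytes, long[] candidatePageBytes) {
        if (!uncommitEnabled) {
            return 0; // stands in for -XX:-ZUncommit
        }
        long remaining = committedBytes;
        long toUncommit = 0;
        for (long pageBytes : candidatePageBytes) {
            if (remaining - pageBytes >= minHeapBytes) {
                remaining -= pageBytes;
                toUncommit += pageBytes;
            }
        }
        return toUncommit;
    }
}
```

When `-Xms` equals `-Xmx`, the committed heap can never exceed the minimum, so no candidate page ever fits and nothing is uncommitted, matching the behavior described above.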
Finally, ZGC on Linux/x64 uses a tmpfs or hugetlbfs file to back the heap. Uncommitting memory used by these files requires `fallocate(2)` with `FALLOC_FL_PUNCH_HOLE` support, which first appeared in Linux 3.5 (tmpfs) and 4.3 (hugetlbfs). ZGC should continue to work as before when running on older Linux kernels, with the exception that the uncommit capability is disabled.
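The kernel requirement can be illustrated with a simple version check. This is only a hypothetical sketch: `PunchHoleSupportSketch` and its methods are illustrative names, and because distributions may backport kernel features, a real implementation would more robustly probe by attempting `fallocate(2)` with `FALLOC_FL_PUNCH_HOLE` and disabling uncommit if the call fails. The minimum versions encoded here are the ones stated above: 3.5 for tmpfs and 4.3 for hugetlbfs.

```java
class PunchHoleSupportSketch {
    enum Backing { TMPFS, HUGETLBFS }

    // Parses "major.minor" from a kernel release string such as "4.19.0-25-amd64".
    static int[] parseMajorMinor(String release) {
        String[] parts = release.split("[.-]");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // True if the kernel is at least the version that added FALLOC_FL_PUNCH_HOLE
    // support for the given backing filesystem.
    static boolean uncommitSupported(Backing backing, String kernelRelease) {
        int[] v = parseMajorMinor(kernelRelease);
        int[] required = backing == Backing.TMPFS ? new int[] {3, 5} : new int[] {4, 3};
        return v[0] > required[0] || (v[0] == required[0] && v[1] >= required[1]);
    }
}
```

On Linux, `System.getProperty("os.version")` returns the kernel release string, so it could be passed as `kernelRelease`.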
- One or more jtreg tests that verify the uncommit capability will be developed.
- Existing benchmarks, such as SPECjbb and SPECjvm, will be used to verify that we don't see measurable latency or throughput regressions when using the default policy.