JDK-8290736 : Improve scalability of Merge Heap Roots/Merge Log Buffers
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 20
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2022-07-20
  • Updated: 2024-10-11
Other
tbd: Unresolved
Description
Recent scalability testing with G1 showed that some phases do not scale well with the number of threads.

Testing with Bigramtester@20gb showed that on a large machine, with ~30 threads, the Merge Heap Roots/Merge Log Buffers phase takes about 1% of GC pause time (~3ms) on average; with >100 threads it already takes around 13% (~14ms). Note that this is the same application with roughly the same number of cards generated.

This seems to be related to dequeuing buffers from the DCQS (dirty card queue set). Some testing showed that quadrupling the buffer size reduces this to ~6% of GC pause time.
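
To illustrate why buffer size could matter here, the following is a minimal model and not HotSpot code. It assumes that every log buffer costs one dequeue from a queue shared by all GC worker threads, so a fixed number of card entries needs a quarter of the shared dequeues when the buffer capacity is quadrupled. The class name `BufferDequeueModel`, the entry count, and the capacities 256 and 1024 are hypothetical, chosen only for illustration; the model does not measure contention or pause time.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: not the G1 implementation.
public class BufferDequeueModel {

    // Buffers needed to hold `entries` card entries at `capacity` entries each (ceiling division).
    public static long buffersNeeded(long entries, int capacity) {
        return (entries + capacity - 1) / capacity;
    }

    // Fills a shared queue with buffers, then drains it with `threads` workers.
    // Returns {number of successful dequeues, number of entries processed}.
    public static long[] drain(long entries, int capacity, int threads) {
        ConcurrentLinkedQueue<int[]> shared = new ConcurrentLinkedQueue<>();
        for (long left = entries; left > 0; left -= capacity) {
            shared.add(new int[(int) Math.min(capacity, left)]);
        }

        AtomicLong dequeues = new AtomicLong();
        AtomicLong processed = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                int[] buf;
                // Every poll() touches the shared queue head; this is the contended step.
                while ((buf = shared.poll()) != null) {
                    dequeues.incrementAndGet();
                    processed.addAndGet(buf.length);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return new long[] { dequeues.get(), processed.get() };
    }

    public static void main(String[] args) {
        long entries = 1_000_000; // assumed number of logged cards
        for (int capacity : new int[] { 256, 1024 }) {
            long[] result = drain(entries, capacity, 8);
            System.out.println("capacity=" + capacity
                    + " dequeues=" + result[0]
                    + " entries=" + result[1]);
        }
    }
}
```

In this model the amount of card work is identical in both runs; only the number of operations on the shared queue changes, which matches the observation that the cost grows with thread count rather than with the number of cards.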
Comments
Some options for improvements:
* drain log buffers, emptying them just before GC (for Merge Log Buffers; may not be feasible if there is a huge number of these)
* dynamically size the log buffers (there are millions of log entries queued with Bigramtester); at that scale it does not really seem to matter if there are fewer buffers
* use a single remembered set for the young gen (plus the minimum number of old gen regions to take), managed directly on the card table, so that there are fewer buffers to merge (less contention)
* use a better data structure that is more amenable to processing

These options are not mutually exclusive, of course.
20-07-2022
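
The dynamic log buffer sizing option above could be sketched as follows. This is one possible policy under stated assumptions, not an existing G1 heuristic: it picks the smallest capacity, doubling from a minimum, that keeps the expected number of buffers at or below a target, and caps it at a maximum. The class `LogBufferSizer`, the method `chooseCapacity`, and all parameter values are hypothetical.

```java
// Hypothetical sizing policy: not part of HotSpot.
public class LogBufferSizer {

    // Returns a buffer capacity (in entries) between minCapacity and maxCapacity.
    // The capacity is doubled until the expected entries fit into at most
    // targetBuffers buffers, or until maxCapacity is reached.
    public static int chooseCapacity(long expectedEntries, long targetBuffers,
                                     int minCapacity, int maxCapacity) {
        if (minCapacity <= 0 || maxCapacity < minCapacity || targetBuffers <= 0) {
            throw new IllegalArgumentException("invalid sizing bounds");
        }
        long capacity = minCapacity;
        while (capacity < maxCapacity
                && (expectedEntries + capacity - 1) / capacity > targetBuffers) {
            capacity = Math.min(capacity * 2, (long) maxCapacity);
        }
        return (int) capacity;
    }

    public static void main(String[] args) {
        // Assumed workload: a few million queued log entries, as in the report.
        int capacity = chooseCapacity(4_000_000L, 4096, 256, 65_536);
        System.out.println("chosen capacity: " + capacity);
    }
}
```

A real implementation would also need a source for `expectedEntries`, for example the entry count seen in the previous collection; that input is outside this sketch.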