JDK-8226913 : Scale cards per chunk used during heap root scanning with region size
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-06-27
  • Updated: 2019-08-06
  • Resolved: 2019-07-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 14
14 b07Fixed
Related Reports
Blocks :  
Blocks :  
Relates :  
Description
Running G1 with JDK-8213108 on huge heaps shows unusually long unaccounted Merge Heap Roots times.

E.g.

[3322.457s][info ][gc,phases] GC(73)   Merge Heap Roots: 15.4ms
[3322.457s][debug][gc,phases] GC(73)     Remembered Sets (ms):     Min:  0.1, Avg:  0.3, Max:  0.4, Diff:  0.3, Sum:  8.5, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Merged Sparse:            Min: 0, Avg:  0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Merged Fine:              Min: 0, Avg:  0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Merged Coarse:            Min: 0, Avg:  0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32
[3322.457s][debug][gc,phases] GC(73)     Hot Card Cache (ms):      Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 32
[3322.457s][debug][gc,phases] GC(73)     Log Buffers (ms):         Min:  2.2, Avg:  2.3, Max:  2.4, Diff:  0.2, Sum: 72.9, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Processed Buffers:        Min: 18, Avg: 55.1, Max: 75, Diff: 57, Sum: 1762, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Dirty Cards:              Min: 4608, Avg: 12754.0, Max: 16943, Diff: 12335, Sum: 408127, Workers: 32
[3322.457s][debug][gc,phases] GC(73)       Skipped Cards:            Min: 0, Avg:  0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32

The separate phases account for only a small part of total time taken. Investigate where the time is spent.

This is on a heap with 150GB @ 32M with 4800 regions.

Most likely the setup of some internal data structures (there is some 2.5MB of data to clear; probably it would be good to increase the G1RemSetScanState::CardsPerChunk constant for large heaps/region sizes; it would be doable because scanning indicates ~17k chunks total scanned) but also starting up/shutting down threads in the parallel part may be an issue (potentially too many threads started).
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/36e4e50b4255 User: tschatzl Date: 2019-07-24 09:49:55 +0000
24-07-2019

The gc+remset=debug output seems mostly responsible for the long merge heap root times (if enabled).
01-07-2019

Potentially caused by some logging added in these runs, as in other runs on the same machine without some particular logging it does not show up; i.e. the time for starting/stopping threads and clearing the merge heap roots data structures is small (still noticeable, see below) but not as extreme: [3062.907s][error][gc,start ] GC(66) Setup Merge Heap Roots [3062.908s][error][gc ] GC(66) Setup Merge Heap Roots 0.458ms [3062.908s][error][gc,start ] GC(66) Merge Heap Roots [3062.910s][error][gc ] GC(66) Merge Heap Roots 2.368ms [3062.910s][error][gc,start ] GC(66) Merge Heap Roots Print [3062.910s][error][gc ] GC(66) Merge Heap Roots Print 0.013ms [...] [3063.023s][info ][gc,phases] GC(66) Merge Heap Roots: 2.9ms [3063.023s][debug][gc,phases] GC(66) Remembered Sets (ms): Min: 0.1, Avg: 0.3, Max: 0.4, Diff: 0.3, Sum: 9.0, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Merged Sparse: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Merged Fine: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Merged Coarse: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Hot Card Cache (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Log Buffers (ms): Min: 1.9, Avg: 2.0, Max: 2.1, Diff: 0.2, Sum: 63.3, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Processed Buffers: Min: 19, Avg: 55.2, Max: 71, Diff: 52, Sum: 1767, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Dirty Cards: Min: 4864, Avg: 12805.1, Max: 16885, Diff: 12021, Sum: 409764, Workers: 32 [3063.023s][debug][gc,phases] GC(66) Skipped Cards: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0, Workers: 32 I.e. setup ("Setup Merge Heap Roots" - clearing of internal data structures) takes ~20% of merge heap roots time which is still bad (but compared to ~125ms total pause), but not catastrophic
28-06-2019

With e.g. 2k cards/chunk (corresponding to 1M heap, 32 chunks/region) the setup time gets below 0.1ms for such a heap with apparently no perf degradation (needs verification). Suggested fix: scale G1RemSetScanState::CardsPerRegion constant with region/heap size. Also add log output for that part of the Merge Heap Roots phase. The default of 128 cards/chunk (corresponding to 64k heap) is definitely to small for 32M regions, probably even for 1M regions.
28-06-2019

Some experiments with variable size of chunks showed that this "Setup Merge Heap Roots" cost is obviously dependent on the chunk size, and by increasing it we can decrease this time at apparently no other impact at least as long the number of dirty chunks >> number of threads.
28-06-2019