JDK-8213108 : Improve work distribution during remembered set scan
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 12,13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-10-29
  • Updated: 2024-03-19
  • Resolved: 2019-06-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 14
14 b04Fixed
Related Reports
Blocks :  
Blocks :  
Blocks :  
Blocks :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Currently, during remembered set scan every thread may iterate over all cards in the remembered set of a given region; distribution across threads occurs by threads claiming cards N to N+x, and only trying to iterate over these cards.

To avoid scanning the same card multiple times, G1 marks the already scanned cards in card table.

One alternative is to for all regions collect the PRTs, and every thread claim part of that region's entire range of cards, building a small local bitmap of cards to scan using the PRTs for that region.

This ensures that every card is only scanned once during Scan RS, avoids the need for all threads to iterate over all cards of a given region at the cost of small pre-initialization costs.
Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/3e31a8beaae4 User: tschatzl Date: 2019-06-27 09:48:51 +0000
27-06-2019

There is a new log message available with gc+remset=debug, looking as follows: [5.440s][debug][gc,remset ] GC(1) Visited cards 18304 Total dirty 393216 (4.65%) Total old 6356992 (0.29%) This gives a measure of the effectiveness of the optimizations to avoid scanning areas not containing dirty cards. Visited cards: number of cards visited searching for dirty cards Total dirty: total cards of regions that contained a dirty card Total old: total cards in old regions that without the optimization would need to be scanned without the optimization (i.e. what other collectors would potentially scan) The number in parenthesis contains the percentage relative to the visited cards. The number of actually scanned cards (walked object-by-object) can be obtained by the "Scanned Cards" entry in the "Scan Heap Roots" phase. G1 prints this information for every evacuation increment.
24-05-2019

This change significantly modifies the log output for garbage collections: [info ][gc,phases ] GC(0) Pre Evacuate Collection Set: 0.0ms [debug][gc,phases ] GC(0) Prepare TLABs: 0.0ms [debug][gc,phases ] GC(0) Choose Collection Set: 0.0ms [debug][gc,phases ] GC(0) Region Register: 0.0ms [debug][gc,phases ] GC(0) Prepare Heap Roots: 0.0ms (1) [info ][gc,phases ] GC(0) Merge Heap Roots: 0.2ms (2) [debug][gc,phases ] GC(0) Remembered Sets (ms): [...] (3) [debug][gc,phases ] GC(0) Merged Sparse: [...] (4) [debug][gc,phases ] GC(0) Merged Fine: [...] (5) [debug][gc,phases ] GC(0) Merged Coarse: [...] (6) [debug][gc,phases ] GC(0) Hot Card Cache (ms): [...] (7) [debug][gc,phases ] GC(0) Log Buffers (ms): [...] (8) [debug][gc,phases ] GC(0) Processed Buffers: [...] (9) [debug][gc,phases ] GC(0) Dirty Cards: [...] (10) [debug][gc,phases ] GC(0) Skipped Cards: [...] (11) [info ][gc,phases ] GC(0) Evacuate Collection Set: 104.2ms [debug][gc,phases ] GC(0) Ext Root Scanning (ms): [...] [debug][gc,phases ] GC(0) Scan Heap Roots (ms): [...] (12) [debug][gc,phases ] GC(0) Scanned Cards: [...] (13) [debug][gc,phases ] GC(0) Scanned Blocks: [...] (14) [debug][gc,phases ] GC(0) Claimed Chunks: [...] (15) The changed or added phases were marked with a number in parenthesis. Here's a short explanation for these: (1) Prepare Heap Roots (new): Prepare some internal data structures for heap root scanning (2) Merge Heap Roots (new): This new, separate parallel phase merges the roots from the heap (remembered sets, log buffers, HCC) for this garbage collection. (3) Merge Remembered Sets (new): subphase of (2), merging remembered sets (4,5,6) Merge Sparse/Fine/Coarse (new): the amount of sparse/fine/coarse PRTs merged (7) Hot Card Cache (new): subphase of (2), merging of HCC entries (8) Log Buffers (new): subphase of (3), merging log buffer entries from the DCQS (9,10,11) Amount of log buffers, log buffer entries that need to be scanned and ones to be skipped respectively (12) Scan heap roots collected earlier (13) Number of cards scanned (14) Number of blocks (contiguous ranges of cards) scanned (15) Number of areas distributed/scanned An Update RS phase does not exist any more - it has been redistributed into (8) and (12).
24-05-2019

The suggested final implementation is to use the card table as a place to merge the remembered set. I.e. we add a phase per evacuation increment that merges the cards to be scanned from all sources (remembered sets, log buffers, HCC) into the card table by dirtying not previously scanned cards there. In the actual evacuation phase, scan this combined set of dirty cards, splitting the work across threads in chunks (large contiguous areas of the card table). As an additional optimization, during the merge phase remember which regions and which chunks actually contain cards to be scanned, to quickly skip over them during scan. Threads then iterate across these dirty regions, and compete for chunks containing some dirty cards only. Further, try to find contiguous ranges of dirty cards ("blocks") to scan them in one scan process like other throughput collectors do. After all evacuations, clean and redirty the card table as before.
22-05-2019