JDK-8202845 : Refactor reference processing for improved parallelism
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 11
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-05-09
  • Updated: 2018-11-20
  • Resolved: 2018-06-18
JDK 11
11 b19: Fixed
Related Reports
Blocks :  
Relates :  
Relates :  
Description
From [~kbarrett]:
"The existing 3-phase processing is nice and simple and general, but it is really not helpful when one starts dealing with parallelism, scaling issues, and task startup/termination costs.  Various changes in reference processing since that 3-phase approach was instituted allow or make more beneficial various changes.

A better set of phases would be something like

(1) Current phase1 for SoftReferences (maybe; JDK-8051680).

(2) Walk discovered Soft/Weak/FinalReferences (together) and, if NULL or live referent then drop (applying keep_alive to the referent, but complete_gc is not needed), else if not FinalReference then clear and enqueue.

(3) Walk discovered FinalReferences residue from (2), applying keep_alive and complete_gc, and enqueue.

(4) Walk discovered PhantomReferences and (a) drop those with NULL or live referents (applying keep_alive to the referent, but complete_gc is not needed), and (b) clear and enqueue those with dead referents."

This reduces the number of work gang uses from 9 to 4 and allows better usage of parallelism due to larger per-phase workloads.
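The proposed phases (2) to (4) can be modelled in a few lines. This is a simplified, single-threaded Python sketch and not HotSpot code: `Ref`, `process_discovered`, and the `live` set are hypothetical names, a set lookup stands in for the is_alive closure, adding to the set stands in for keep_alive, and phase (1) (the SoftReference policy) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Ref:
    kind: str                 # "soft", "weak", "final" or "phantom"
    referent: Optional[str]   # object name, or None if already cleared


def process_discovered(discovered: List[Ref],
                       live: Set[str]) -> Tuple[List[Ref], List[Ref]]:
    """Model phases 2-4; returns (enqueued, dropped) and updates `live`."""
    enqueued: List[Ref] = []
    dropped: List[Ref] = []
    final_residue: List[Ref] = []

    # Phase 2: Soft/Weak/FinalReferences are walked together.
    for ref in discovered:
        if ref.kind not in ("soft", "weak", "final"):
            continue
        if ref.referent is None or ref.referent in live:
            dropped.append(ref)           # NULL or live referent: drop
        elif ref.kind == "final":
            final_residue.append(ref)     # left for phase 3
        else:
            ref.referent = None           # clear
            enqueued.append(ref)

    # Phase 3: FinalReference residue; the referent is kept alive, not cleared.
    for ref in final_residue:
        live.add(ref.referent)            # stand-in for keep_alive + complete_gc
        enqueued.append(ref)

    # Phase 4: PhantomReferences.
    for ref in discovered:
        if ref.kind != "phantom":
            continue
        if ref.referent is None or ref.referent in live:
            dropped.append(ref)
        else:
            ref.referent = None
            enqueued.append(ref)

    return enqueued, dropped
```

In this model a PhantomReference whose referent was kept alive by a FinalReference in phase 3 is dropped in phase 4, because the referent counts as live by then. The sketch only marks the referent itself; a real complete_gc would also process everything reachable from it.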
Comments
One side effect of this change is a change in the log output. Since we no longer have separate timings for every phase of every Reference type, the idea was to turn the logging inside-out. Instead of this structure (at trace level):

    Reference Processing: 0.0ms
      SoftReference: 0.5ms
        Balance queues: 0.0ms
        Phase1: 0.2ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Phase2: 0.1ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Phase3: 0.2ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Discovered: 0
        Cleared: 0
      WeakReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Phase3: 0.2ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Discovered: 40
        Cleared: 28
      FinalReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Phase3: 0.2ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Discovered: 0
        Cleared: 0
      PhantomReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Phase3: 0.2ms
          Process lists (ms) Min: 0.0, Avg: ... Workers: 28
        Discovered: 6
        Cleared: 5

it will look like this:

    Reference Processing: 0.1ms
      Phase1: 0.0ms
        Balance queues: 0.0ms
        SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
      Phase2: 0.0ms
        Balance queues: 0.0ms
        SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
        WeakRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
        FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
      Phase3: 0.0ms
        Balance queues: 0.0ms
        FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
      Phase4: 0.0ms
        Balance queues: 0.0ms
        PhantomRef (ms): Min: 0.0, Avg: ... Workers: 3
          0.0 0.0 0.0 - ... - - - - - - (T)
      SoftReference:
        Discovered: 0
        Cleared: 0
      WeakReference:
        Discovered: 30
        Cleared: 30
      FinalReference:
        Discovered: 0
        Cleared: 0
      PhantomReference:
        Discovered: 5
        Cleared: 5

I.e. instead of showing the phases with their timings for every reference type, the four phases are now shown, each with the timings of the reference types processed within that phase. Also (that has been a bug), the lines with a "(T)" are added at trace level.

At debug level the timing output has been updated to be similar to other gc+phases output, i.e. it is now:

    Reference Processing: 0.2ms
      Phase1: 0.0ms
        Balance queues: 0.0ms
        SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
      Phase2: 0.1ms
        Balance queues: 0.0ms
        SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
        WeakRef (ms): Min: 0.0, Avg: ... Workers: 3
        FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
      Phase3: 0.0ms
        Balance queues: 0.0ms
        FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
      Phase4: 0.0ms
        Balance queues: 0.0ms
        PhantomRef (ms): Min: 0.0, Avg: ... Workers: 3
      SoftReference:
        Discovered: 0
        Cleared: 0
      WeakReference:
        Discovered: 38
        Cleared: 30
      FinalReference:
        Discovered: 0
        Cleared: 0
      PhantomReference:
        Discovered: 7
        Cleared: 6

instead of the old output as follows:

    Reference Processing: 0.0ms
      SoftReference: 0.5ms
        Balance queues: 0.0ms
        Phase1: 0.2ms
        Phase2: 0.1ms
        Phase3: 0.2ms
        Discovered: 0
        Cleared: 0
      WeakReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
        Phase3: 0.2ms
        Discovered: 0
        Cleared: 0
      FinalReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
        Phase3: 0.2ms
        Discovered: 0
        Cleared: 0
      PhantomReference: 0.3ms
        Balance queues: 0.0ms
        Phase2: 0.1ms
        Phase3: 0.2ms
        Discovered: 5
        Cleared: 5

which did not show any per-thread information.

In case of using serial ref processing, this is the proposed output:

    Reference Processing: 0.1ms
      Phase1: 0.0ms
        SoftRef: 0.0ms
      Phase2: 0.0ms
        SoftRef: 0.0ms
        WeakRef: 0.0ms
        FinalRef: 0.0ms
      Phase3: 0.0ms
        FinalRef: 0.0ms
      Phase4: 0.0ms
        PhantomRef: 0.0ms
      SoftReference:
        Discovered: 0
        Cleared: 0
      WeakReference:
        Discovered: 30
        Cleared: 30
      FinalReference:
        Discovered: 0
        Cleared: 0
      PhantomReference:
        Discovered: 5
        Cleared: 5

Note that in the previous version, the output of parallel/debug was the same as serial/debug. This was another bug.

In all cases, Discovered/Cleared counts per reference type are now shown afterwards instead of within each reference type's section. We may improve and probably streamline the output of reference counts, but that is a different issue.
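Because the counts now sit in their own trailing section, they are easy to extract from the new output. The following Python sketch is only an illustration: `reference_counts` is a hypothetical helper, and it assumes undecorated log text like the output quoted in this comment, where each type name is followed directly by its Discovered and Cleared values.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "WeakReference: Discovered: 30 Cleared: 30", with any
# whitespace (including newlines) between the three fields.
_COUNTS = re.compile(
    r"(Soft|Weak|Final|Phantom)Reference:\s+"
    r"Discovered:\s+(\d+)\s+Cleared:\s+(\d+)"
)


def reference_counts(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Map reference type to (discovered, cleared) for the new layout."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in _COUNTS.finditer(log_text)
    }
```

The old layout does not match this pattern, since there a timing such as "0.5ms" follows the type name and the counts come after the phase lines.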
18-06-2018

It turns out that complete_gc is needed by all phases. Quoting from the review thread: "In these phases, for some collections the keep_alive is just an expensive nop, finding that the referent is already marked live (which we knew because the is_alive closure already told us that). For others, it needs to forward the referent field to the already copied referent; no additional work (e.g. scanning the referent) is needed. The only current keep_alive closure that creates any work for the complete_gc closure in these cases is G1CopyingKeepAliveClosure. The old process_phase2 assumed the complete_gc closure would never have any work to do, so didn't bother calling it." That assumption by process_phase2 was incorrect because of G1CopyingKeepAliveClosure. However, the consensus from the review was that the assumption by process_phase2 was a bad idea, and there were better ways to avoid unnecessary or expensive work there, such as JDK-8204947.
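The interaction between the two closures can be illustrated with a toy model. This Python sketch is an assumption-laden illustration, not the HotSpot closures: `QueueingKeepAlive` and `process_live_referents` are hypothetical names, and the queue simply stands for "work that keep_alive leaves behind for complete_gc", which per the review quote is what G1CopyingKeepAliveClosure can do.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class QueueingKeepAlive:
    """Toy keep_alive that defers work to a queue drained by complete_gc."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.done: List[str] = []

    def keep_alive(self, referent: str) -> None:
        self.pending.append(referent)      # work is created, not finished

    def complete_gc(self) -> None:
        while self.pending:                # drain everything keep_alive queued
            self.done.append(self.pending.popleft())


def process_live_referents(referents: Iterable[str],
                           keep_alive: Callable[[str], None],
                           complete_gc: Callable[[], None]) -> None:
    for referent in referents:
        keep_alive(referent)
    complete_gc()                          # the step the old process_phase2 skipped


closures = QueueingKeepAlive()
process_live_referents(["x", "y"], closures.keep_alive, closures.complete_gc)

skipped = QueueingKeepAlive()
process_live_referents(["x", "y"], skipped.keep_alive, lambda: None)
```

After the first call the queue is empty; after the second, where complete_gc is replaced by a no-op, both items are still pending. A keep_alive that never queues anything would make the complete_gc call cheap rather than incorrect, which is why calling it in every phase is the safe default.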
15-06-2018

JDK-8203028 blocks JDK-8202845 because the claim in the latter that complete_gc isn't needed isn't true until the simplification around pp2_work_concurrent_discovery is done.
13-05-2018