JDK-8043575 : Dynamically parallelize reference processing work
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u20,9
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-05-20
  • Updated: 2023-08-21
  • Resolved: 2018-06-18

JDK 11
11 b19 (Fixed)
Description
The reference processing phases can take a significant amount of time, depending on the number of soft references and the amount of retained objects.

To decrease the duration of these phases it is possible to use the ParallelRefProcEnabled switch; however, that currently needs to be done manually.

The goal of this CR is to dynamically turn on parallel reference processing for the different GC phases.

One way to do this is to estimate the reference processing work as a fraction of the total GC pause time and, if that fraction crosses a threshold, enable parallel reference processing automatically.
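The threshold idea above could look roughly like the following sketch. This is illustrative only, not HotSpot code: the RefProcHeuristic name, the predicted-time inputs, and the 10% threshold are all assumptions.

```cpp
#include <cassert>

// Hypothetical sketch of the proposed heuristic: enable parallel
// reference processing when the predicted reference-processing time
// exceeds a fixed fraction of the predicted total pause time.
// Names and the threshold value are illustrative assumptions.
struct RefProcHeuristic {
  double pause_threshold_fraction;  // e.g. 0.10 == 10% of the pause

  bool should_parallelize(double predicted_ref_proc_ms,
                          double predicted_pause_ms) const {
    if (predicted_pause_ms <= 0.0) {
      return false;  // no usable prediction yet; stay single-threaded
    }
    return (predicted_ref_proc_ms / predicted_pause_ms)
           > pause_threshold_fraction;
  }
};
```

In a real collector the predicted times would come from the pause-time prediction machinery; here they are simply passed in.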
Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/8f1d5d706bdd User: tschatzl Date: 2018-06-18 10:12:25 +0000
18-06-2018

JDK-7068229, which was closed as a duplicate of this issue, contains a few more thoughts.
13-06-2018

Excellent! We have worked around the zero-task case in Shenandoah: http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-June/002611.html. But a more comprehensive fix for small numbers of references, like Jon suggested above, would benefit this even more. Our benchmarks do care about pause times on this scale, so I would happily test any suggested patch.
06-06-2017

Any progress on this? We have already hit it, but the other way around: with parallel ref processing turned on by default, we sometimes lose valuable pause time when there is no work (JDK-8181214), which seems to be a special case of what is proposed here.
06-06-2017

I have a prototype but I want to refine it. And yes, if there are not enough references to process (not only the zero case), a single thread shows better performance.
06-06-2017

I meant that I prefer not to address Kim's concern in this CR, but I didn't say how to follow up on it either. I filed JDK-8173211 for that.
23-01-2017

Considering that ParallelRefProcEnabled also parallelizes reference enqueuing, despite only having "RefProc" in its name, and given arguments like reference enqueuing being an integral part of reference processing, I would consider automatically enabling parallel enqueuing part of the more general "automatically enable parallel reference processing" work. Improving the parallelization of reference enqueuing may be best done in a different CR though, as there are several options for it: instead of straightforwardly improving the existing enqueuing parallelization code (e.g. removing the DCQS enqueuing bottleneck, which may already yield most of the benefit), it may be useful to move that work into the reference processing phases. Btw, there is a similar issue in the evacuation code already, see JDK-8162929.
23-01-2017

My understanding is that this CR is about parallelizing the work for discovered references only (ReferenceProcessor::process_discovered_references). BTW, good point about enqueue_discovered_references.
20-01-2017

For G1 there is a problem with parallelizing enqueue_discovered_references. That enqueue applies the barrier set's write_ref_field to the discovered links being added to the pending list. If the reference is young, this isn't a problem, because the write barrier doesn't need to do anything in that case. But if the reference is old, it's going to record a dirty card, and that card will go into the shared global DirtyCardQueue, since the call is made from a non-Java thread. So there is a locking DCQ.enqueue per reference, and severe lock contention when parallelized. One possibility might be a special-purpose variant of write_ref_field for use here, with most collectors just forwarding to their normal write_ref_field implementation, but G1 using per-worker DCQs.
19-01-2017
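The per-worker-queue idea in the comment above can be illustrated with a simplified sketch. All types and names here are hypothetical stand-ins, not HotSpot's DirtyCardQueue code: each worker appends to a private buffer, and the buffers are merged under the lock once per phase instead of taking the lock once per reference.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Shared global queue: one lock acquisition per enqueued card.
// This models the contention problem described above.
struct SharedCardQueue {
  std::mutex lock;
  std::vector<const void*> cards;

  void enqueue(const void* card) {
    std::lock_guard<std::mutex> g(lock);
    cards.push_back(card);
  }
};

// Per-worker buffers: lock-free appends during the phase,
// a single locked merge at the end.
struct PerWorkerCardQueues {
  std::vector<std::vector<const void*>> local;  // one buffer per worker

  explicit PerWorkerCardQueues(size_t workers) : local(workers) {}

  void enqueue(size_t worker, const void* card) {
    local[worker].push_back(card);  // no lock on the hot path
  }

  void flush_to(SharedCardQueue& shared) {
    std::lock_guard<std::mutex> g(shared.lock);  // one lock per phase
    for (auto& buf : local) {
      shared.cards.insert(shared.cards.end(), buf.begin(), buf.end());
      buf.clear();
    }
  }
};
```

The design trade-off is the usual one: per-worker buffering costs memory proportional to the worker count but removes the per-item lock from the hot path.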

The lengths of the discovered lists are available and could be used to estimate the amount of work. The divisor of 1000 was picked out of the air.

diff --git a/src/share/vm/gc/shared/referenceProcessor.cpp b/src/share/vm/gc/shared/referenceProcessor.cpp
--- a/src/share/vm/gc/shared/referenceProcessor.cpp
+++ b/src/share/vm/gc/shared/referenceProcessor.cpp
@@ -845,13 +845,16 @@
   // of the test.
   bool must_balance = _discovery_is_mt;

+  size_t total_list_count = total_count(refs_lists);
+
+  uint number_of_workers = num_q();
+  set_active_mt_degree(total_list_count / 1000 + 1);
+
   if ((mt_processing && ParallelRefProcBalancingEnabled) || must_balance) {
     balance_queues(refs_lists);
   }

-  size_t total_list_count = total_count(refs_lists);
-
   if (PrintReferenceGC && PrintGCDetails) {
     gclog_or_tty->print(", " SIZE_FORMAT " refs", total_list_count);
   }
@@ -899,6 +902,7 @@
     }
   }

+  set_active_mt_degree(number_of_workers);
   return total_list_count;
 }
11-06-2015
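The sizing rule in the patch above (one extra worker per 1000 discovered references) can be sketched in isolation as follows. The clamp to a maximum worker count is an assumption added for illustration; in the patch itself, set_active_mt_degree() is responsible for bounding the value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch of the sizing rule from the patch above: one worker per
// 1000 discovered references, plus one, clamped to the configured
// maximum. The clamp parameter is an illustrative assumption.
static unsigned active_degree(size_t total_refs, unsigned max_workers) {
  unsigned wanted = static_cast<unsigned>(total_refs / 1000 + 1);
  return std::min(wanted, max_workers);
}
```

With this rule, short discovered lists stay single-threaded (avoiding the parallel-setup overhead noted in the later comments), while long lists scale up to the full worker count.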