Use the same workgang for enqueue_discovered_references() and redirty_logged_cards() to share the parallel overhead for those two tasks.
Closing for now, as the experiments show no benefit.
Comparison of benchmark score and GC pause time.
specjbb2005: 10 iterations
specjvm2008: 5 iterations.
enqueue_discovered_references() runs in parallel only when ParallelRefProcEnabled is set, while redirty_logged_cards() always runs in parallel.
I made a prototype and did some experiments.
The prototype is simple: extend the first task (the enqueue_discovered_references() one) and add a WorkGangBarrierSync as a member.
Between the two tasks, each worker calls WorkGangBarrierSync.enter() to wait until the first task's work is done.
This was expected to reduce the startup/termination synchronization of the gang threads, but the results were not good.
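For readers unfamiliar with the idea, the structure of the prototype can be sketched roughly as follows. This is not the HotSpot code: it uses standard C++ threads instead of a WorkGang, SimpleBarrier is a hypothetical stand-in for WorkGangBarrierSync, and run_combined_task() and its counters are illustrative names that only mimic the two phases.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, single-use stand-in for WorkGangBarrierSync: enter() blocks
// until n_workers threads have called it.
class SimpleBarrier {
 public:
  explicit SimpleBarrier(unsigned n_workers)
      : _n_workers(n_workers), _n_entered(0) {}

  void enter() {
    std::unique_lock<std::mutex> lock(_mutex);
    assert(_n_entered < _n_workers && "barrier is single-use");
    if (++_n_entered == _n_workers) {
      _cv.notify_all();  // last worker releases everyone
    } else {
      _cv.wait(lock, [this] { return _n_entered == _n_workers; });
    }
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  const unsigned _n_workers;
  unsigned _n_entered;
};

struct CombinedTaskResult {
  unsigned enqueued;     // units of phase-1 work done
  unsigned redirtied;    // units of phase-2 work done
  bool phase_order_ok;   // true if no worker began phase 2 early
};

// Runs both phases in ONE thread start/join cycle, separated by a barrier,
// instead of starting and stopping the workers once per phase.
CombinedTaskResult run_combined_task(unsigned n_workers,
                                     unsigned items_per_worker) {
  SimpleBarrier barrier(n_workers);
  std::atomic<unsigned> enqueued(0);
  std::atomic<unsigned> redirtied(0);
  std::atomic<bool> order_ok(true);
  const unsigned expected = n_workers * items_per_worker;

  auto work = [&]() {
    // Phase 1: placeholder for the enqueue_discovered_references() work.
    for (unsigned i = 0; i < items_per_worker; i++) {
      enqueued.fetch_add(1);
    }
    // Wait until every worker has finished phase 1.
    barrier.enter();
    if (enqueued.load() != expected) {
      order_ok.store(false);
    }
    // Phase 2: placeholder for the redirty_logged_cards() work.
    for (unsigned i = 0; i < items_per_worker; i++) {
      redirtied.fetch_add(1);
    }
  };

  std::vector<std::thread> gang;
  for (unsigned i = 0; i < n_workers; i++) {
    gang.emplace_back(work);
  }
  for (auto& t : gang) {
    t.join();
  }
  return {enqueued.load(), redirtied.load(), order_ok.load()};
}
```

The trade-off in this shape is that one start/join cycle is saved, but every worker now blocks at the barrier until the slowest worker finishes the first phase.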
I compared benchmark score and GC pause time for these tests.
Specjvm2008: 5 iterations
Specjbb2005: 10 iterations
The attached image shows the comparison of benchmark score and GC pause time.
This prototype made GC pause time worse.