JDK-8214458 : Port ShenandoahTaskTerminator to mainline and make it default
  • Type: CSR
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 12
  • Submitted: 2018-11-29
  • Updated: 2020-02-28
  • Resolved: 2018-12-07
Description
Summary
-------

Add a diagnostic VM flag, "UseOWSTTaskTerminator", that selects the task termination protocol to use.

Problem
-------

The current task termination protocol, ParallelTaskTerminator (PTT here), has two shortcomings. First, it is inefficient in how many waiting threads it wakes up for a given observed amount of remaining work. Second, in some situations it takes a very long time to detect that all threads have finished their work, which delays termination of the parallel phase.

The Shenandoah project has implemented an improved termination protocol, Optimized Work Stealing Threads (OWST), that addresses these shortcomings of PTT. It is based on earlier work conducted at Google.

We believe the OWST implementation is stable: it has shown no issues in 2+ years of use with the Shenandoah collector. However, adopting it for all HotSpot collectors means significantly wider use, which may expose unknown issues. We therefore want to give users a way to fall back to the PTT protocol if problems arise.

Solution
--------

Introduce a new diagnostic VM flag called "UseOWSTTaskTerminator" that selects the termination protocol to use. The default is true, meaning that all GCs use the OWST protocol by default. In case of issues, a user can switch back to the PTT protocol by setting this flag to false. Because the flag is diagnostic, changing it also requires -XX:+UnlockDiagnosticVMOptions on the command line.
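As a rough illustration of what "selects the termination protocol" could look like, the sketch below picks one of two terminator implementations from a boolean flag. It is not taken from the HotSpot sources: the `Terminator` interface, the `PttTerminator` and `OwstTerminator` classes, and `make_terminator()` are hypothetical names invented for this example, and the flag is modeled as a plain global rather than a real VM flag.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for the VM flag; in HotSpot the value comes from the command
// line, here it is an ordinary global so the sketch is self-contained.
static bool UseOWSTTaskTerminator = true;

// Hypothetical common interface (an assumption, not HotSpot's class layout).
struct Terminator {
  virtual ~Terminator() = default;
  virtual std::string protocol_name() const = 0;
};

// Hypothetical placeholder for the existing PTT protocol.
struct PttTerminator : Terminator {
  std::string protocol_name() const override { return "PTT"; }
};

// Hypothetical placeholder for the OWST protocol.
struct OwstTerminator : Terminator {
  std::string protocol_name() const override { return "OWST"; }
};

// Chooses the protocol once, when the terminator is created, based on the flag.
std::unique_ptr<Terminator> make_terminator() {
  if (UseOWSTTaskTerminator) {
    return std::make_unique<OwstTerminator>();
  }
  return std::make_unique<PttTerminator>();
}
```

Choosing the implementation at a single creation point means the code that uses a terminator does not need to know which protocol is active, so flipping the flag is enough to fall back.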

Specification
-------------
From gc_globals.hpp:

      diagnostic(bool, UseOWSTTaskTerminator, true,                         \
          "Use Optimized Work Stealing Threads task termination "           \
          "protocol")                                                       \

Comments
Thanks for the additional information; moving to Approved.
07-12-2018

Initial work and testing actually started when the repo for JDK 12 opened, but was abandoned due to time constraints until Zhengyu picked it up again. There has been significant internal testing of the code during the current review cycle, in addition to Red Hat using it for 2+ years, so I think it is okay to use it. We can always switch the default value if further testing shows issues.
06-12-2018

I agree with Thomas. The risk should be manageable. We've used this code for a long time now with no issues. Should further testing before JDK 12 GA turn up problems, we can easily switch back using the proposed switch.
06-12-2018

Reference to the original work: Wessam Hassanein. 2016. Understanding and improving JVM GC work stealing at the data center scale. SIGPLAN Not. 51, 11 (June 2016), 46-54. DOI: https://doi.org/10.1145/3241624.2926706
06-12-2018

Based on suggestions from Kim and Roman, I changed the flag from "Experimental" to "Diagnostic".
30-11-2018

At this stage of JDK 12, it would seem more prudent, at least to me, to have the current mechanism as the default and the Shenandoah one as an opt-in alternative. I'll move this CSR to Provisional, but I expect to see multiple reviewers before the request is finalized if the Shenandoah mechanism is kept as the default.
29-11-2018