JDK-8151670 : Unexpected concurrent refinement deactivation and reactivation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2016-03-11
  • Updated: 2016-04-21
  • Resolved: 2016-03-25
Fixed in: JDK 9 b115
Description
In the attached log (refinement.tar.bz2) we see a number of places where there is a surprising sequence of deactivations of concurrent refinement threads even though there are many buffers available (far more than the reported deactivation threshold), followed quickly by reactivations of those same threads.

Unnecessarily cycling between the active and inactive states wastes CPU time.

More importantly, the unusual deactivations seem to be related to requests to pause concurrent refinement in order to initiate a collection pause, and this state cycling may be delaying that desired GC pause.

The attached script (find_unusual_deactivation_activation_seq.py) can be applied to a log made with -Xlog:gc*,gc+refine=debug to print the first of each problematic deactivation sequence.

Comments
On the suspend side there shouldn't be much difference between using STS::yield or deactivating as the response to an STS synchronize request. It shouldn't make much difference for safepoint latency (subject to JDK-8152196). The impact of changing the refinement threads to use STS::yield is on the STS::desynchronize side.

For a non-GC safepoint, the threads will simply resume where they left off, with no need to go through the stepped activation protocol again to ramp up to a similar number of active refinement threads. [Of course, this only matters when there were some refinement threads active, and matters more when there were many.]

Some GC safepoints will drain the dirty card queue set, leaving any suspended refinement threads with nothing to do when they are resumed, so they end up deactivating anyway. If there happen to be many refinement threads running at that point, there will be some wasted effort. However, we should be attempting to schedule the refinement threads to minimize the number active and to ensure we don't have too many buffers in the set when a GC occurs, which will mitigate the yield-before / deactivate-after scenario. If this situation arises often, then either we don't have enough refinement threads to keep up with the mutator, or we're scheduling the refinement threads poorly.

For other GC safepoints, new dirty card buffers are added to the dirty card queue set, so if there are suspended refinement threads, they may well find they have plenty of work to do when resumed. But as discussed above, we don't usually expect many (or any) refinement threads to be active going into (and so coming out of) a GC. So unless I'm missing something, the main benefit of changing the refinement threads to use yield would seem to be reducing the work involved in resuming refinement after a non-GC safepoint.
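A minimal sketch of the two responses, loosely modeled on the JDK 9 ConcurrentG1RefineThread loop. SuspendibleThreadSet::should_yield, SuspendibleThreadSet::yield, and SuspendibleThreadSetJoiner are real HotSpot names; the other helpers (should_terminate, wait_for_activation, process_one_buffer) are hypothetical stand-ins:

  // Current behavior: respond to a yield request by deactivating.  After
  // the safepoint the thread must wait to be re-activated through the
  // stepped activation protocol, even if plenty of buffers remain.
  void refine_loop_deactivate() {
    while (!should_terminate()) {                    // hypothetical
      wait_for_activation();                         // hypothetical: park until notified
      SuspendibleThreadSetJoiner sts_join;
      while (process_one_buffer()) {                 // hypothetical: refine one buffer
        if (SuspendibleThreadSet::should_yield()) {
          break;                                     // bail out -> deactivate
        }
      }
    }
  }

  // Alternative (what the concurrent mark threads do): yield in place.
  // For a non-GC safepoint the thread simply resumes where it left off.
  void refine_loop_yield() {
    while (!should_terminate()) {
      wait_for_activation();
      SuspendibleThreadSetJoiner sts_join;
      while (process_one_buffer()) {
        if (SuspendibleThreadSet::should_yield()) {
          SuspendibleThreadSet::yield();             // block for the safepoint, then continue
        }
      }
      // deactivates only when there is genuinely no work left
    }
  }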
18-03-2016

I was wrong when I said "(2) the closure returned false because of a pending yield request" doesn't apply here. Yield requests are generated for all safepoints, not just for GCs. But the refinement threads respond to yield requests by deactivating rather than by merely yielding. (Note that the concurrent marking threads differ by actually calling SuspendibleThreadSet::yield.) That means all of the threads go back to waiting for activation requests. To resume processing, a buffer must be enqueued, which notifies the primary thread, which starts the chain of activations. It would be better for the threads to remain active but actually yield, rather than deactivating.
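A sketch of the activation chain just described, with all names hypothetical: an enqueued buffer notifies only the primary thread, and each active thread then activates its successor once the completed-buffer count exceeds the successor's threshold. Deactivating on every yield request means this whole chain must be re-run after each safepoint before refinement is back up to strength:

  // Hypothetical names throughout; this only illustrates the chain shape.
  void on_buffer_enqueued(size_t completed_buffers) {
    if (completed_buffers > activation_threshold(0)) {
      activate_worker(0);                            // wake the primary thread
    }
  }

  void maybe_activate_next(unsigned worker_id, size_t completed_buffers) {
    unsigned next = worker_id + 1;
    if (next < num_refinement_threads() &&
        completed_buffers > activation_threshold(next)) {
      activate_worker(next);                         // extend the chain by one thread
    }
  }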
12-03-2016

An interesting place in the log is the sequence of activations and deactivations between GC(11) and GC(12). Starting immediately after GC(11) (16.928s) we see a lot of thrashing back and forth by the refinement threads, with thresholds in the 60-160 range and current buffer counts in the 350-450 range. Then at 16.944s there is a nice clean sequence of activating all the refinement threads. And at 16.955s they all start deactivating as the number of buffers falls below their thresholds. From there on until GC(12) at 17.942s we see nice little clusters of activation to process some incoming buffers and then deactivation as the number of buffers drops.

The thrashing from 16.928s to 16.944s is unexpected. The only reasons an activated refinement thread should deactivate are (1) the number of unprocessed buffers is below the thread's threshold (which clearly isn't the case here), or (2) the closure returned false because of a pending yield request, which shouldn't apply here either.
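To make condition (1) concrete, here is a self-contained sketch of stepped per-thread thresholds. The green-zone and step constants are illustrative, chosen only to echo the 60-160 range above, not the real tuning values:

  #include <algorithm>
  #include <cstddef>
  #include <cstdio>

  // Illustrative stand-ins for the real green-zone and per-thread step.
  static const size_t kGreenZone = 60;
  static const size_t kStep      = 20;

  // Worker i activates when the completed-buffer count rises above this.
  static size_t activation_threshold(unsigned worker_id) {
    return kGreenZone + kStep * (worker_id + 1);
  }

  // ...and deactivates when the count falls below this (one step lower,
  // so a thread doesn't flap on a count hovering at its activation point).
  static size_t deactivation_threshold(unsigned worker_id) {
    return std::max(activation_threshold(worker_id) - kStep, kGreenZone);
  }

  int main() {
    // With ~400 completed buffers, as in the log between GC(11) and
    // GC(12), every worker shown stays active by condition (1), so the
    // observed deactivations must come from condition (2).
    const size_t completed_buffers = 400;
    for (unsigned i = 0; i < 5; i++) {
      bool stays = completed_buffers >= deactivation_threshold(i);
      std::printf("worker %u: activates above %zu, deactivates below %zu -> %s\n",
                  i, activation_threshold(i), deactivation_threshold(i),
                  stays ? "stays active" : "deactivates");
    }
    return 0;
  }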
11-03-2016