JDK-8035557 : Pause time predictions off for mixed collections due to wrong rs_length estimation
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u20,9
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2014-02-21
  • Updated: 2019-07-25
Other: tbd (Unresolved)
Description
In many cases the pause time predictions for mixed collections are off by a lot, causing missed pause time goals.

Further, young gen size for the next pause is very small for a few collections.

This can be seen e.g. in specjbb2013, but also in loads like CRM Fuse.

The problem seems to be wrong prediction (and slow adaptation to changed conditions).

A particularly problematic point is the prediction of RS length, which has a big impact on pause time.

The attached predication.png figure shows the problem. At first, during young-only gcs, the actual RS length and the prediction match well. At t_1, however, mixed gcs cause a large change in the actual RS length (depending on the application, it is a real step function as in the figure). The prediction follows only very slowly, so the eden size is not adjusted as it should be.

During mixed gcs, the RS length prediction slowly approaches the actual value, but often never reaches it before the mixed gcs are done (e.g. at t_2), so the pause time goal is never met.

After the mixed gcs, the actual RS length goes down again, but the prediction takes some time to adapt to the change. So the young gen size is kept small for some time after the mixed gcs too (losing more throughput).

Since rs_length prediction seems to be the main problem in these cases, try adding separate predictions for young-only and mixed gcs.
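
To illustrate the proposal, here is a minimal sketch in Java (not HotSpot code) that keeps one RS length predictor per gc kind, so that the step change at the start of the mixed phase does not pollute the young-only prediction. The class name RsLengthPredictors, the GcKind enum, the exponentially decaying average, and the weight 0.3 are all assumptions made for illustration; the predictor G1 actually uses may differ.

```java
// Illustrative sketch only: all names here are hypothetical, not HotSpot identifiers.
final class RsLengthPredictors {
    enum GcKind { YOUNG_ONLY, MIXED }

    // Simple exponentially decaying average; alpha is an assumed weight for new samples.
    static final class DecayingAverage {
        private final double alpha;
        private double value;
        private boolean seeded;

        DecayingAverage(double alpha) { this.alpha = alpha; }

        void add(double sample) {
            // The first sample seeds the average; later samples are blended in.
            value = seeded ? alpha * sample + (1.0 - alpha) * value : sample;
            seeded = true;
        }

        double get() { return value; }
    }

    private final DecayingAverage young = new DecayingAverage(0.3);
    private final DecayingAverage mixed = new DecayingAverage(0.3);

    // Record the RS length observed at the end of a gc of the given kind.
    void record(GcKind kind, double rsLength) {
        (kind == GcKind.MIXED ? mixed : young).add(rsLength);
    }

    // Predict the RS length for the next gc of the given kind.
    double predict(GcKind kind) {
        return (kind == GcKind.MIXED ? mixed : young).get();
    }
}
```

With this split, samples recorded during mixed gcs only move the mixed prediction, so the young-only prediction used for sizing the young gen after the mixed phase starts from its pre-mixed value instead of having to decay back down.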
Comments
At least one other metric needs to be separated: the _pending_cards at gc start, together with the cost metric for pending cards. The time prediction for update rs is completely off (by a factor of about 10^3) because a typically high number of pending cards spills over into the first young gen after the mixed gc phase. Potentially the cost per card is also larger during that time.
14-12-2017

Attached is a hack-fix that so far seems to work fine: it simply suppresses adding rs length samples during mixed gc. This works because we do not use the rs length prediction for determining e.g. young gen size during mixed gc anyway.
13-12-2017
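
The idea of the hack-fix can be sketched as follows. This is not the attached patch, only a minimal Java illustration under assumptions: the class YoungOnlyRsLengthSampler, its method names, the decaying average, and the weight 0.3 are hypothetical.

```java
// Illustrative sketch only: not the attached patch; all names are hypothetical.
final class YoungOnlyRsLengthSampler {
    private static final double ALPHA = 0.3; // assumed weight for new samples

    private double prediction;
    private boolean seeded;

    // Called at the end of every gc with the RS length that was observed.
    void onGcEnd(boolean mixedGc, double rsLength) {
        if (mixedGc) {
            return; // suppress samples taken during mixed gcs
        }
        prediction = seeded ? ALPHA * rsLength + (1.0 - ALPHA) * prediction : rsLength;
        seeded = true;
    }

    double predictedRsLength() { return prediction; }
}
```

Because mixed gc samples are dropped, the prediction available to the first young-only gc after the mixed phase is still the one learned before the phase started.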

Changed this to a bug because, in conjunction with adaptive IHOP, this causes additional performance regressions due to many unnecessary marking rounds: the small young gen after the mixed phase causes higher promotion than usual, which decreases the IHOP threshold, which in turn prematurely starts marking and another mixed gc cycle.
11-12-2017