JDK-8191093 : Improve behavior when safepoint begin times out
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 10
  • Priority: P4
  • Status: Resolved
  • Resolution: Duplicate
  • Submitted: 2017-11-10
  • Updated: 2019-08-15
  • Resolved: 2019-03-28
  • Resolved in release: JDK 13
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
We recently had an instance of the PageArmed == 0 guarantee failing (see JDK-8155700 and JDK-8038480 for other cases of this). In our case we are pretty sure that the underlying cause was a flaky host that caused one or more threads to get stuck and never acknowledge the safepoint. However, it would have been nice if the JVM had handled the situation a bit better.

What I'd like to improve is:

First, eliminate the unintended second attempt, inside the loop, to arm the polling page when the {{iterations}} counter overflows and wraps around.
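
To illustrate the overflow problem, here is a toy model rather than HotSpot code. kArmAtIteration is a hypothetical stand-in for the loop-count threshold at which the polling page gets armed, and the counter type is a template parameter so that an 8-bit counter can show the wrap-around after 256 spins instead of about 4 billion. When arming is keyed only on the counter's value, every wrap produces another arm attempt; remembering that the page is already armed removes the repeat.

```cpp
#include <cstdint>

// Hypothetical threshold; a stand-in for the loop count at which the
// polling page is armed. Not the real HotSpot constant or value.
constexpr unsigned kArmAtIteration = 150;

// Buggy pattern: arming depends only on the counter value. Each time the
// counter wraps around, it reaches kArmAtIteration again and another arm
// attempt is made (the point where a PageArmed == 0 check would fail).
template <typename Counter>
int arm_attempts_keyed_on_counter(std::uint64_t spins) {
  Counter iterations = 0;
  int attempts = 0;
  for (std::uint64_t i = 0; i < spins; ++i) {
    if (iterations == kArmAtIteration) {
      ++attempts;
    }
    ++iterations;  // unsigned counters wrap modulo 2^N (well defined)
  }
  return attempts;
}

// Fixed pattern: record that the page is armed and never try again,
// no matter what the counter does afterwards.
template <typename Counter>
int arm_attempts_with_flag(std::uint64_t spins) {
  Counter iterations = 0;
  bool page_armed = false;
  int attempts = 0;
  for (std::uint64_t i = 0; i < spins; ++i) {
    if (!page_armed && iterations == kArmAtIteration) {
      page_armed = true;
      ++attempts;
    }
    ++iterations;
  }
  return attempts;
}
```

With an 8-bit counter and 1000 spins, the counter-keyed version arms four times while the flag-based version arms once. Widening the counter to 64 bits would also make a wrap practically unreachable, but the explicit flag states the intent directly.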

Second, introduce a time-based heuristic that forces the JVM to abort when it has been stuck in the loop for far too long. I think the best way is probably to introduce a new command-line flag that specifies how long to wait before aborting, with a conservative default such as 30 minutes or an hour. We could reuse the SafepointTimeout / DieOnSafepointTimeout flags for this, but they share the single SafepointTimeoutDelay threshold. I think it's nicer to get a warning early (the current 10-second default for SafepointTimeoutDelay is reasonable IMHO) and to abort much later.
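
The two-threshold idea can be sketched as a small policy object checked on each spin of the wait loop. This is a hedged illustration, not a proposed patch: SafepointTimeoutPolicy, SafepointWaitAction and the abort_after parameter are hypothetical names, with warn_after playing the role of SafepointTimeoutDelay and abort_after standing in for the proposed new flag.

```cpp
#include <chrono>

using std::chrono::milliseconds;
using std::chrono::minutes;
using std::chrono::seconds;

enum class SafepointWaitAction { kKeepWaiting, kWarn, kAbort };

// Hypothetical policy: warn once after warn_after, abort after abort_after.
class SafepointTimeoutPolicy {
 public:
  SafepointTimeoutPolicy(milliseconds warn_after, milliseconds abort_after)
      : warn_after_(warn_after), abort_after_(abort_after) {}

  // Called on every spin of the wait loop with the time waited so far.
  SafepointWaitAction check(milliseconds elapsed) {
    if (elapsed >= abort_after_) {
      return SafepointWaitAction::kAbort;  // stuck far too long: give up
    }
    if (!warned_ && elapsed >= warn_after_) {
      warned_ = true;  // report early, but only once, then keep waiting
      return SafepointWaitAction::kWarn;
    }
    return SafepointWaitAction::kKeepWaiting;
  }

 private:
  milliseconds warn_after_;
  milliseconds abort_after_;
  bool warned_ = false;
};
```

In a real loop, elapsed would be measured from a monotonic clock such as std::chrono::steady_clock rather than the wall clock, so that system time adjustments cannot trigger or delay the abort. For example, a policy built with seconds(10) and minutes(30) warns once at 10 seconds and aborts at 30 minutes.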

Thoughts?
Comments
Global polling is going away in JDK 14. Local polling will be looked into in JDK-8198730.
28-03-2019