JDK-8198730 : Improve detection of failure to reach a safepoint
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2018-02-27
  • Updated: 2024-01-10
  • Resolved: 2024-01-10
Sub Tasks
JDK-8242071 :  
Description
In doing the cleanup associated with JDK-8194085 we left in place a check that detects failure to reach a safepoint.

Old code:

if (SafepointMechanism::uses_global_page_poll() && int(iterations) == DeferPollingPageLoopCount) {
  guarantee (PageArmed == 0, "invariant") ;
  PageArmed = 1 ;
  os::make_polling_page_unreadable();
}

New code:

if (int(iterations) == -1) { // overflow - something is wrong.
  // We can only overflow here when we are using global
  // polling pages. We keep this guarantee in its original
  // form so that searches of the bug database for this
  // failure mode find the right bugs.
  guarantee (PageArmed == 0, "invariant");
}

To maintain the existing behaviour the above should have kept the full condition:

if (SafepointMechanism::uses_global_page_poll() && int(iterations) == -1)

But Dan then raised the question as to what happens when we are not using the global polling page:

1) is it possible to overflow iterations when not using global polling pages?
2) should we fail a guarantee if that happens? 

We need to check how handshake-based safepoint polling responds in the cases where we would fail the guarantee, and whether it can also get stuck, in which case we would need to detect the problem as we do for the global polling page.
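
The difference between the condition in the new code and the full condition can be illustrated with a small standalone sketch. This is not HotSpot code: it assumes that iterations is an unsigned 32-bit counter (the report does not state its type), and the global_poll parameter is a hypothetical stand-in for the result of SafepointMechanism::uses_global_page_poll().

```cpp
#include <cassert>
#include <cstdint>

// Condition as written in the "New code" excerpt: the polling mode is not
// checked. The cast mirrors int(iterations) and assumes a 32-bit
// two's-complement int, so the counter value 0xFFFFFFFF converts to -1.
bool new_check_fires(std::uint32_t iterations) {
  return static_cast<std::int32_t>(iterations) == -1;
}

// Condition with the polling-mode test restored. global_poll is a
// hypothetical stand-in for SafepointMechanism::uses_global_page_poll().
bool full_check_fires(bool global_poll, std::uint32_t iterations) {
  return global_poll && static_cast<std::int32_t>(iterations) == -1;
}
```

Under these assumptions, a counter that reaches 0xFFFFFFFF while global polling is off makes new_check_fires return true and full_check_fires return false. That is the case the two questions above are about: whether it can occur at all, and whether the guarantee should be evaluated when it does.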

Comments
Runtime Triage: This is not on our current list of priorities. We will consider this feature if we receive additional customer requirements.
10-01-2024

We should look into these questions: 1) Is it possible to overflow iterations when not using global polling pages? 2) Should we fail a guarantee if that happens? IMHO not for 13. Global polling is obsoleted in 14, so this can be better looked into when we remove the code here.
28-03-2019

Trying to find a reliable test case that can be used to trigger the crash so we can test handshake versus global behaviour.
27-02-2018