JDK-8217744 : [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-01-24
  • Updated: 2020-04-27
  • Resolved: 2019-01-29
Fix Versions
  • JDK 13: 13 b06 (Fixed)
  • Other: openjdk8u262 (Fixed)
Related Reports
Relates :  
Sub Tasks
JDK-8217748 :  
Description
This issue was reported by Goetz; it is specific to some SAP systems. So far I have been unable to reproduce it on Oracle test systems.

Test case (sub-test) that fails: TestSig
    Log states: "Running subtest 7 (jdk.jfr.event.runtime.TestShutdownEvent$TestSig)"

Error message:
=========================
Exception in thread "main" java.lang.RuntimeException: Process survived the SIGINT signal!
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at jdk.jfr.event.runtime.TestShutdownEvent$TestSig.runTest(TestShutdownEvent.java:230)
	at jdk.jfr.event.runtime.TestShutdownEvent$TestMain.main(TestShutdownEvent.java:111)

Exit code: 1
Error: Value not equal to Shutdown requested from Java, field='reason', value='No remaining non-daemon Java threads': expected No remaining non-daemon Java threads to equal Shutdown requested from Java
=========================

Proposed solution:
   Instead of failing an assertion when the process survives SIGINT (or another signal), the test should be more robust: skip the verification for that sub-test and continue with the rest of the test.
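
   The proposed behaviour could look roughly like the sketch below. This is an illustration only, not the code from the actual fix: the class SignalOutcome, the enum Action and the method decide are hypothetical names. Treating exit code 0 as "the child survived the signal and exited normally" follows the description given in the comments of this report; 130 is used only as an example of a non-zero exit code.

```java
public class SignalOutcome {
    // Hypothetical result type: what the test should do after the signal was sent.
    enum Action { VERIFY_EVENT, SKIP_VERIFICATION }

    // Assumption: exit code 0 means the child ignored the signal and finished
    // normally, so there is no signal-triggered shutdown event to verify.
    static Action decide(int exitCode) {
        if (exitCode == 0) {
            // Report the situation instead of failing the whole test.
            System.out.printf("Process survived the signal (exit code %d); skipping verification%n",
                    exitCode);
            return Action.SKIP_VERIFICATION;
        }
        return Action.VERIFY_EVENT;
    }

    public static void main(String[] args) {
        System.out.println(decide(0));   // survived: verification is skipped
        System.out.println(decide(130)); // terminated: shutdown event is verified
    }
}
```

   The point of the design is that a surviving process is logged and tolerated, while a process that did terminate is still checked as before.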

Comments
Replacing jdk8u-fix-request with link to JDK-8239140
17-02-2020

RFC: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-January/011063.html
30-01-2020

"survive SIGINT" is the wrong way to look at this as I tried to explain above. If the process is surviving SIGINT then it means it was already shutting down when the SIGINT was delivered. In that case there would seem to be an error elsewhere in the test and that error should be exposed and dealt with. If the target system is failing to deliver SIGINT then I don't think that is something the test should be trying to account for.
30-01-2019

The goal of this test is not to test the system and its signals, but to test that JFR records the Shutdown Event correctly. If the underlying system is configured in such a way that the process(es) survive SIGINT, the test has no control over it. However, with this change, the test can now handle such situations. It may be a workaround, but I believe it is applicable in this case.
30-01-2019

The fix seems like a workaround to me. The underlying problem doesn't seem to have been determined.
30-01-2019

David, thank you for this info. The test starts a child process, then starts another process "KILL -<SIGTYPE> <PID>", then waits/sleeps for a while (up to 10 sec) before checking the results.
25-01-2019
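
A minimal sketch of the flow described in the comment above (start a child, deliver a signal from a second process, wait up to 10 seconds, then inspect the result). It is not the test's actual code: the class SignalChildSketch and its methods are hypothetical, "sleep 30" stands in for the JFR-enabled child JVM, and wrapping kill in "sh -c" is an assumption made so that the shell builtin is used. It assumes a Unix-like system and JDK 9 or later.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SignalChildSketch {
    // Builds the command that delivers the signal to the given PID.
    // Using "sh -c" is an assumption of this sketch, not taken from the test.
    static List<String> killCommand(String signal, long pid) {
        return List.of("sh", "-c", "kill -" + signal + " " + pid);
    }

    // Sends the signal to the child, then waits up to timeoutSeconds for it to exit.
    // Returns true if the child exited in time, false if it survived the signal.
    static boolean signalAndWait(Process child, String signal, long timeoutSeconds)
            throws Exception {
        Process kill = new ProcessBuilder(killCommand(signal, child.pid()))
                .inheritIO()
                .start();
        kill.waitFor();
        return child.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in child process; the real test launches a JVM with JFR enabled.
        Process child = new ProcessBuilder("sleep", "30").start();
        try {
            if (signalAndWait(child, "INT", 10)) {
                System.out.println("Child exited with code " + child.exitValue());
            } else {
                System.out.println("Child survived SIGINT");
            }
        } finally {
            // Clean up in case the child is still running.
            child.destroyForcibly();
        }
    }
}
```

Whether the child actually terminates depends on the environment, for example on the signal disposition it inherits from its parent, which is why the sketch reports both outcomes rather than asserting one.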

SIGINT will trigger an orderly shutdown of the VM. That is handled by the signal processing thread in a normal execution context. If there is already a shutdown in progress then the new shutdown request will not proceed. Exactly what will happen depends on exactly where we have gotten to in the shutdown protocol.
25-01-2019

Here is the initial webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/ I have changed the assert to a printf and added a check for the exit code. If the child process survives the signal, it will exit normally (exit code 0); in that case the test will skip the verification.
25-01-2019

The solution may take some time. I will create a proposed fix and ask Goetz to verify it on the system where the failure can be reproduced. Hence, for the time being, I will create a trivial sub-task excluding just the failing sub-test case.
24-01-2019