JDK-8253602 : SA core file tests still(2) time out on OSX with "java.io.IOException: App waiting timeout"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 16
  • Priority: P4
  • Status: Closed
  • Resolution: External
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2020-09-24
  • Updated: 2020-10-14
  • Resolved: 2020-10-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdResolved
Related Reports
Relates :  
Relates :  
Relates :  
Description
The following test timed out in the JDK16 CI:

serviceability/sa/ClhsdbPmap.java#id1

Here's a snippet from the log file:

#section:main
----------messages:(6/225)----------
command: main ClhsdbPmap true
reason: User specified action: run main/othervm/timeout=480 ClhsdbPmap true 
Mode: othervm [/othervm specified]
Timeout information:
--- Timeout information end.
elapsed time (seconds): 7091.584
----------configuration:(0/0)----------
----------System.out:(10/1923)----------
Starting ClhsdbPmap test: withCore==true
Command line: ['/mesos/work_dir/jib-master/install/jdk-16+17-803/macosx-x64-debug.jdk/jdk-16/fastdebug/bin/java' '-XX:MaxRAMPercentage=12' '-Djava.io.tmpdir=/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S106665/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/7d436cda-b0f6-4c72-a44b-5ae78d818285/runs/92698b8b-a68e-42f9-ac26-67b087aa6f83/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/tmp' '-XX:+CreateCoredumpOnCrash' '-XX:+UseZGC' '-XX:+CreateCoredumpOnCrash' '-Djava.library.path=/mesos/work_dir/jib-master/install/jdk-16+17-803/macosx-x64-debug.test/hotspot/jtreg/native' '-cp' '/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S106665/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/7d436cda-b0f6-4c72-a44b-5ae78d818285/runs/92698b8b-a68e-42f9-ac26-67b087aa6f83/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/serviceability/sa/ClhsdbPmap_id1.d:/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S106665/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/7d436cda-b0f6-4c72-a44b-5ae78d818285/runs/92698b8b-a68e-42f9-ac26-67b087aa6f83/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/test/lib' 'jdk.test.lib.apps.LingeredApp' '2caa4ce4-a63a-4f17-b0bc-8d59868343c5.lck' 'forceCrash']
[2020-09-24T07:46:40.119522Z] Gathering output for process 50602
[2020-09-24T07:46:40.192439Z] Waiting for completion for process 50602
[2020-09-24T07:46:40.192615Z] Waiting for completion finished for process 50602
Output and diagnostic info for process 50602 was saved into 'pid-50602-output.log'
[2020-09-24T07:46:40.201269Z] Waiting for completion for process 50602
[2020-09-24T07:46:40.201412Z] Waiting for completion finished for process 50602
Run test with ulimit -c: unlimited
Timeout refired 4800 times
----------System.err:(20/1273)----------
java.lang.RuntimeException: Test ERROR java.io.IOException: App waiting timeout
	at ClhsdbPmap.main(ClhsdbPmap.java:88)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.io.IOException: App waiting timeout
	at jdk.test.lib.apps.LingeredApp.waitAppReady(LingeredApp.java:265)
	at jdk.test.lib.apps.LingeredApp.waitAppReady(LingeredApp.java:295)
	at jdk.test.lib.apps.LingeredApp.startAppExactJvmOpts(LingeredApp.java:417)
	at jdk.test.lib.apps.LingeredApp.startApp(LingeredApp.java:432)
	at ClhsdbPmap.main(ClhsdbPmap.java:63)
	... 6 more

JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.io.IOException: App waiting timeout
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.io.IOException: App waiting timeout
----------rerun:(37/6805)*----------
Comments
A couple more oddities I just notice. There is no mention of the forced crash and core file in the log file. This might the result of the test timing out waiting for the core file, so by the time it is produced, the test is beyond the point were it looks for the core file and prints info about it and the crash into the jtreg log. The other oddity is that mach5 says there is a core file, but when you try to download it you are taken to a URL that says "File not found".
24-09-2020

ZGC core files can be absolutely huge so you may not want to run this test (and others) with ZGC. Please check with the ZGC folks.
24-09-2020

JDK-8253028 and JDK-8249150 are previous attempts at addressing this issue with longer timeouts. They helped, but since there seems to be no limit as to how long the core dump can take, they haven't completely solved the problem.
24-09-2020

This is a ZGC test run, which is curious because most SA tests are disabled with ZGC because it produces on error on startup. See JDK-8220624. However, Pmap doesn't rely on any GC support, so it is allowed to run. The logfile says: Error Program '.../bin/java' timed out (timeout set to 4800000ms, elapsed time including timeout handling was 7091579ms). Note both the test timeout and the timeout used in waitAppReady() are 4800000ms (48m). It looks like we hit the waitAppReady() timeout first, but even after it failed the test didn't timeout until 7091579ms. I think this is because even after failing with the exception in waitAppReady(), the test can't exit until the subprocess has finished with the core dump, which appears to be taking upwards of 70 minutes. However, overall the test run took 1h58m, which I don't have an explanation for, except 48m + 70m is 1h58m, but these two timeouts should not be additive. Perhaps I misunderstanding the message, and 7091579ms is how much time it took after the 4800000ms timeout, which means the amount of time it spent trying to shutdown the process that is blocked on the core dump. In any case, this is again an issue with core dumps on OSX sometimes taking a very long time. We got rid of almost all these types of timeouts by allowing 32 minutes for the core dump. It's hard to say how far we should take this to make sure they never appear. I'm inclined to just ignore for now and see how problematic this issue remains. I don't want to give all these core dump tests 2hr+ timeouts.
24-09-2020