JDK-8265037 : serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output."
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 11,17,18,19,20
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: os_x_10.15,windows
  • CPU: x86_64
  • Submitted: 2021-04-11
  • Updated: 2024-02-02
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8294993 :  
Description
The following test failed in the JDK17 CI:

serviceability/sa/ClhsdbPmap.java#id1

Here's a snippet from the log file:

#section:main
----------messages:(6/225)----------
command: main ClhsdbPmap true
reason: User specified action: run main/othervm/timeout=480 ClhsdbPmap true 
Mode: othervm [/othervm specified]
Timeout information:
--- Timeout information end.
elapsed time (seconds): 4843.346
----------configuration:(0/0)----------
----------System.out:(11/1967)----------
Starting ClhsdbPmap test: withCore==true
Command line: ['/opt/mesos/jib-master/install/jdk-17+18-1443/macosx-x64-debug.jdk/jdk-17/fastdebug/bin/java' '-XX:MaxRAMPercentage=12' '-Djava.io.tmpdir=/opt/mesos/slaves/3c846bae-ce30-4a97-93ee-9fef4497ccb6-S42436/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/07129351-9741-40d6-824a-09d9a09939b1/runs/182e553f-e6fc-4f2a-a6ea-d59b4cf1828e/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/tmp' '-XX:+CreateCoredumpOnCrash' '-XX:+UseZGC' '-XX:+CreateCoredumpOnCrash' '-Djava.library.path=/opt/mesos/jib-master/install/jdk-17+18-1443/macosx-x64-debug.test/hotspot/jtreg/native' '-cp' '/opt/mesos/slaves/3c846bae-ce30-4a97-93ee-9fef4497ccb6-S42436/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/07129351-9741-40d6-824a-09d9a09939b1/runs/182e553f-e6fc-4f2a-a6ea-d59b4cf1828e/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/serviceability/sa/ClhsdbPmap_id1.d:/opt/mesos/slaves/3c846bae-ce30-4a97-93ee-9fef4497ccb6-S42436/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/07129351-9741-40d6-824a-09d9a09939b1/runs/182e553f-e6fc-4f2a-a6ea-d59b4cf1828e/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/test/lib' 'jdk.test.lib.apps.LingeredApp' '91687f3a-32a0-4e78-9845-dfedec8f136e.lck' 'forceCrash']
[2021-04-11T00:48:38.281805Z] Gathering output for process 56976
[2021-04-11T00:48:38.433289Z] Waiting for completion for process 56976
[2021-04-11T00:48:38.433551Z] Waiting for completion finished for process 56976
Output and diagnostic info for process 56976 was saved into 'pid-56976-output.log'
[2021-04-11T00:48:38.454613Z] Waiting for completion for process 56976
[2021-04-11T00:48:38.454802Z] Waiting for completion finished for process 56976
Run test with ulimit -c: unlimited
Timeout refired 4800 times
LingeredApp failed to start: java.io.IOException: App waiting timeout
----------System.err:(20/1399)----------
java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output.
	at ClhsdbPmap.main(ClhsdbPmap.java:88)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.lang.RuntimeException: Process is still alive. Can't get its output.
	at jdk.test.lib.apps.LingeredApp.getOutput(LingeredApp.java:164)
	at jdk.test.lib.apps.LingeredApp.finishApp(LingeredApp.java:376)
	at jdk.test.lib.apps.LingeredApp.startAppExactJvmOpts(LingeredApp.java:426)
	at jdk.test.lib.apps.LingeredApp.startApp(LingeredApp.java:440)
	at ClhsdbPmap.main(ClhsdbPmap.java:63)
	... 6 more

JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output.
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output.
----------rerun:(37/6648)*----------

This is an odd failure. I did find the following old bug that also
mentioned "Process is still alive. Can't get its output.", but that
bug also has sightings of other failure modes so I didn't want
to reopen it:

    JDK-8197591 Tests failing with App waiting timeout
Comments
The CR was originally just for macosx-x64 and had a large number of entries. We then problem listed all core file testing on macosx-x64 so we stopped seeing the issue. But then it started popping on on Windows, so now there are a large number of Windows entries, but these failures are unrelated to the macosx-x64 failures. I think any future Windows failures should be put in a separate CR and we keep this one just for macox-x64.
17-08-2023

Was about to note that this looks like the intended crash has happened (an hs_err file is produced) but no minidump is yet produced. Then I note that has been suggested above already. The app still appears to be alive, which causes an exception which probably makes it more confusing than it need be. I'd like to make a few changes in LingeredApp to help look at this kind of problem.
07-10-2022

Here's a log file snippet from he jdk-20+4-131-tier7 sighting: serviceability/sa/ClhsdbPmap.java#core section:main ----------messages:(6/231)---------- command: main ClhsdbPmap true reason: User specified action: run main/othervm/timeout=480 ClhsdbPmap true Mode: othervm [/othervm specified] Timeout information: --- Timeout information end. elapsed time (seconds): 1966.297 ----------configuration:(0/0)---------- ----------System.out:(11/1647)*---------- Starting ClhsdbPmap test: withCore==true Command line: ['c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-20+4-131\\windows-x64-debug.jdk\\jdk-20\\fastdebug\\bin\\java.exe' '-XX:MaxRAMPercentage=4.16667' '-Djava.io.tmpdir=c:\\sb\\prod\\1656182988\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\tmp' '-Dcom.oracle.usagetracker.config.file=c:\\sb\\prod\\1656182988\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\usage-tracker.cfg' '-XX:+CreateCoredumpOnCrash' '-Djava.library.path=c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-20+4-131\\windows-x64-debug.test\\hotspot\\jtreg\\native' '-cp' 'C:\\sb\\prod\\1656182988\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\1\\serviceability\\sa\\ClhsdbPmap_core.d;C:\\sb\\prod\\1656182988\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\1\\test\\lib' 'jdk.test.lib.apps.LingeredApp' '752a64da-0a74-455a-9daa-c96f56fb4a34.lck' 'forceCrash'] [2022-06-25T18:58:33.159542900Z] Gathering output for process 80408 [2022-06-25T18:58:33.300203700Z] Waiting for completion for process 80408 [2022-06-25T18:58:33.300203700Z] Waiting for completion finished for process 80408 Output and diagnostic info for process 80408 was saved into 'pid-80408-output.log' [2022-06-25T18:58:33.315833800Z] Waiting for completion for process 80408 [2022-06-25T18:58:33.331441700Z] Waiting for completion finished for process 80408 Run test with ulimit -c: unlimited Timeout refired 1920 times LingeredApp failed to start: java.io.IOException: App waiting timeout ----------System.err:(18/1231)---------- java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. at ClhsdbPmap.main(ClhsdbPmap.java:88) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:578) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:1589) Caused by: java.lang.RuntimeException: Process is still alive. Can't get its output. at jdk.test.lib.apps.LingeredApp.getOutput(LingeredApp.java:171) at jdk.test.lib.apps.LingeredApp.finishApp(LingeredApp.java:388) at jdk.test.lib.apps.LingeredApp.startAppExactJvmOpts(LingeredApp.java:450) at jdk.test.lib.apps.LingeredApp.startApp(LingeredApp.java:464) at ClhsdbPmap.main(ClhsdbPmap.java:63) ... 4 more JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. ----------rerun:(39/6115)*----------
26-06-2022

I think this Windows sighting is similar to macosx in that it doesn't finish producing the core file in a timely manner. I do see the expected hs_err file showing the expected crash that is suppose to produce the core file (which is consistent with the macOS failures). However I don't see the core file in the artifacts, which is something we do see on macosx (even though on macOS is probably is not a complete core file). This might just be a behavior difference between macOS and windows when producing a core file times out.
25-03-2022

I _think_ this is the first non-macosx sighting for this test failure mode... Here's a log file snippet for the jdk-19+15-952-tier7 sighting: serviceability/sa/ClhsdbFindPC.java#no-xcomp-core #section:main ----------messages:(6/247)---------- command: main ClhsdbFindPC false true reason: User specified action: run main/othervm/timeout=480 ClhsdbFindPC false true Mode: othervm [/othervm specified] Timeout information: --- Timeout information end. elapsed time (seconds): 1942.643 ----------configuration:(0/0)---------- ----------System.out:(11/1570)*---------- Starting the ClhsdbFindPC test Command line: ['c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+15-952\\windows-x64-debug.jdk\\jdk-19\\fastdebug\\bin\\java.exe' '-XX:MaxRAMPercentage=4.16667' '-Djava.io.tmpdir=t:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\tmp' '-Dcom.oracle.usagetracker.config.file=t:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\usage-tracker.cfg' '-Xint' '-XX:+CreateCoredumpOnCrash' '-Djava.library.path=c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+15-952\\windows-x64-debug.test\\hotspot\\jtreg\\native' '-cp' 'T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\3\\serviceability\\sa\\ClhsdbFindPC_no-xcomp-core.d;T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\3\\test\\lib' 'jdk.test.lib.apps.LingeredApp' 'd32de847-aebe-4c42-9666-81de5ce7db24.lck' 'forceCrash'] [2022-03-25T15:01:29.614763100Z] Gathering output for process 28400 [2022-03-25T15:01:29.746079800Z] Waiting for completion for process 28400 [2022-03-25T15:01:29.746079800Z] Waiting for completion finished for process 28400 Output and diagnostic info for process 28400 was saved into 'pid-28400-output.log' [2022-03-25T15:01:29.749763200Z] Waiting for completion for process 28400 [2022-03-25T15:01:29.749763200Z] Waiting for completion finished for process 28400 Run test with ulimit -c: unlimited Timeout refired 1920 times LingeredApp failed to start: java.io.IOException: App waiting timeout ----------System.err:(19/1297)---------- java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. at ClhsdbFindPC.testFindPC(ClhsdbFindPC.java:306) at ClhsdbFindPC.main(ClhsdbFindPC.java:328) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:828) Caused by: java.lang.RuntimeException: Process is still alive. Can't get its output. at jdk.test.lib.apps.LingeredApp.getOutput(LingeredApp.java:171) at jdk.test.lib.apps.LingeredApp.finishApp(LingeredApp.java:388) at jdk.test.lib.apps.LingeredApp.startAppExactJvmOpts(LingeredApp.java:450) at jdk.test.lib.apps.LingeredApp.startApp(LingeredApp.java:464) at ClhsdbFindPC.testFindPC(ClhsdbFindPC.java:98) ... 5 more JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Process is still alive. Can't get its output. ----------rerun:(40/6040)*----------
25-03-2022

We are seeing a group of this failure in every tier3 run AFAICS lately.
08-06-2021

Hard to tell that this is really a different issue to all the other macos issue we are having. A lot of links to this in the CI now.
06-06-2021

[~dholmes] > So isn't this just the general timeout problem as reported by JDK-8265350? Yes. Core dumps are taking too long.
12-05-2021

So isn't this just the general timeout problem as reported by JDK-8265350?
12-05-2021

"Process is still alive. Can't get its output." is not the reason the test is failing. It has already failed and produced this message while trying to shutdown after the failure: public static void startAppExactJvmOpts(LingeredApp theApp, String... jvmOpts) throws IOException { theApp.createLock(); try { theApp.runAppExactJvmOpts(jvmOpts); theApp.waitAppReady(); } catch (Exception ex) { System.out.println("LingeredApp failed to start: " + ex); theApp.finishApp(); <-- this line is producing the "Process is still alive. Can't get its output. " message theApp.deleteLock(); throw ex; } } In the output you first see: LingeredApp failed to start: java.io.IOException: App waiting timeout This is happening during the theApp.waitAppReady() call. Since this is a test the wants a core file generated, waitAppReady() is waiting for the process to no longer be alive (due to the forced crash): // Make sure process didn't already exit if (!appProcess.isAlive()) { if (forceCrash) { return; // This is expected. Just return. } else { throw new IOException("App exited unexpectedly with " + appProcess.exitValue()); } } So this not happening, and the test times out after 1h20m. However, looking at the artifacts, there is an hs_err and core file for the crash as expected. So my guess is this is yet again the OSX issue with core file dumps taking way too long. When this happens, the test is not allowed to timeout until after the core file is produced, so it waits indefinitely for this to happen, and then times out rather than complete the test. We've had issues like this before. See JDK-8214797, JDK-8258512, and JDK-8253028. The last one is the most pertinent, and was fixed by giving the tests a longer timeout. However, we have to have some limit on how high we are willing to go with the timeouts. I'm not so sure we should be increasing it anymore than we already have, and instead should just tolerate these occasional timeouts.
13-04-2021