JDK-8224252 : [TESTBUG] hotspot/test/serviceability/sa/sadebugd/SADebugDTest.java is timing out again after fix for JDK-8163805
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-05-20
  • Updated: 2020-07-02
  • Resolved: 2019-05-25
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13
11.0.5Fixed 13 b23Fixed
Related Reports
Relates :  
Relates :  
Description
JDK-8163805 was suppose to fix a timeout issue with this test and also removed it off the problemlist. However, it still times out on windows-x64 about 1/3 of the time. It passes every time on all the other platforms. When it passes is just takes a few seconds. The timeout takes a little over 23 minutes. The only useful output in the log is:

----------System.out:(3/210)*----------
Started LingeredApp with pid 35076
[debugd]:c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-13-1074\\windows-x64-debug.jdk\\jdk-13\\fastdebug\\bin\\jhsdb.exe debugd --pid 35076
Timeout refired 1200 times
----------System.err:(2/567)*----------
Command line: ['c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-13-1074\\windows-x64-debug.jdk\\jdk-13\\fastdebug/bin/java' '-XX:MaxRAMPercentage=8' '-cp' 'T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\2\\serviceability\\sa\\sadebugd\\SADebugDTest.d;T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_hotspot_serviceability\\classes\\2\\test\\lib' 'jdk.test.lib.apps.LingeredApp' 'cd3fc62e-5d57-4c6f-867d-0576f5ee9bcb.lck' ]
[debugd] Attaching to process ID 35076 and starting RMI services, please wait...
Comments
I think I ran into a similar problem with OSX, although I disabled it with logic in the test rather than with @requires: if (SATestUtils.needsPrivileges()) { // This tests has issues if you try adding privileges on OSX. The debugd process cannot // be killed if you do this (because it is a root process and the test is not), so the destroy() // call fails to do anything, and then waitFor() will time out. If you try to manually kill it with // a "sudo kill" command, that seems to work, but then leaves the LingeredApp it was // attached to in a stuck state for some unknown reason, causing the stopApp() call // to timeout. For that reason we don't run this test when privileges are needed. Note // it does appear to run fine as root, so we still allow it to run on OSX when privileges // are not required. throw new SkippedException("Cannot run this test on OSX if adding privileges is required."); }
02-07-2020

Fix Request for 11u: This is a test bug for Windows. I want to backport this to 11u together with JDK-8223665 and JDK-8163805.
25-05-2019

Suggested fix: http://hg.openjdk.java.net/jdk/submit/rev/4a7488be8277 It has passed jtreg tests on submit repo (mach5-one-ysuenaga-JDK-8224252-3-20190523-0532-2647154) I believe this change would pass tests on Windows with -Xcomp.
23-05-2019

Root cause of this is `jhsdb debugd` could not be exited normally. `jhsdb debugd` (internally, DebugServer.java) would detach from debuggee at shutdown hook. So jhsdb process need to be exited normally. `jhsdb debugd` could not exit itself without external operation (e.g. CTRL+C, signals). Thus we use Process::destroy for it. In the case of Windows, Process::destroy calls TerminateProcess() Windows API to terminate process. However it would terminate target process immediately and it would not give a chance to kick shutdown hook. (It is documented on Javadoc of Runtime::addShutdownHook) I tried to use `taskkill` command to kill jhsdb if the testcase runs on Windows [1] , but it failed. `taskkill` without /f (force) seems to fail if target process does not have main window. Therefore, I think this testcase (SADebugDTest) should be excluded for Windows. ProblemList.txt: ``` + serviceability/sa/sadebugd/SADebugDTest.java 8224252 windows-all ``` or add @requires to SADebugDTest.java: ``` @requires os.family != "windows" ``` [1] http://hg.openjdk.java.net/jdk/submit/rev/d80f19c8be80
23-05-2019

It's probably best to problemlist if you don't think you'll have a solution soon.
22-05-2019

I tried to fix this issue in JDK-8224252 branch on submit repo, and I could reproduce similar issue on my laptop (Win 10 Pro, jtreg on WSL). LingeredApp process seems to hang. I checked process status with Process Explorer, Its status is "Suspended". Also I could resume it on Process Explorer, and it could exit.
21-05-2019

[~cjplummer] Thank you for clarification! I guess this cause is debugd was terminated before shutdown hook for detach is registered. SADebugDTest uses "Attaching to process" for checking the result of operation. If debugd would be terminated between `agent.startServer()` and `untime.getRuntime().addShutdownHook()`, debug session might finish incompletely. http://hg.openjdk.java.net/jdk/jdk/file/566fbca8a890/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/DebugServer.java#l102 In the past, I tried to use "Debugger attached" for checking, but it was failed on Windows x64. ProcessTools::startProcess uses StreamPumper for line check. It might not to handle well CRLF. If we can change StreamPumper, test might succeed. But I'm not sure how that change affects other jtreg tests. http://hg.openjdk.java.net/jdk/submit/rev/74b8aca4123d
21-05-2019

Here's the main app's stacktrace (I think this is the only thread that matters): "MainThread" #12 prio=5 os_prio=0 cpu=765.63ms elapsed=1322.01s tid=0x0000003e17427800 nid=0x78c4 runnable [0x0000003e191ae000] java.lang.Thread.State: RUNNABLE Thread: 0x0000003e17427800 [0x78c4] State: _at_safepoint _at_poll_safepoint 0 JavaThread state: _thread_in_native at java.lang.ProcessImpl.waitForInterruptibly(java.base@13-ea/Native Method) at java.lang.ProcessImpl.waitFor(java.base@13-ea/ProcessImpl.java:489) at jdk.test.lib.apps.LingeredApp.waitAppTerminate(LingeredApp.java:237) at jdk.test.lib.apps.LingeredApp.stopApp(LingeredApp.java:382) at jdk.test.lib.apps.LingeredApp.stopApp(LingeredApp.java:445) at SADebugDTest.main(SADebugDTest.java:67) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@13-ea/Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@13-ea/NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@13-ea/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@13-ea/Method.java:567) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.lang.Thread.run(java.base@13-ea/Thread.java:830) So it looks like the test is trying to exit, and it blocked doing the LingeredApp.stopApp(). As for the spawned LingeredApp, the .jtr file I was looking at says it is pid 35076. When I look at the jstack output for it, I see: Exception in thread "main" java.lang.InternalError: Remote thread failed for unknown reason (101) at jdk.attach/sun.tools.attach.VirtualMachineImpl.enqueue(Native Method) at jdk.attach/sun.tools.attach.VirtualMachineImpl.execute(VirtualMachineImpl.java:106) at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:309) at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.remoteDataDump(HotSpotVirtualMachine.java:266) at jdk.jcmd/sun.tools.jstack.JStack.runThreadDump(JStack.java:130) at jdk.jcmd/sun.tools.jstack.JStack.main(JStack.java:109) This looked kind of familiar. Then I remembered the bug I worked on that added the error code to this message, which is JDK-8205719, and that was done to help debug what was going on with JDK-8199811. For that bug we were getting error code 100, which is ATTACH_ERROR_DISABLED, and was caused by attaching too early. For this bug we are getting 101, which is ATTACH_ERROR_RESOURCE. It looks like this must be the result of there being no Win32AttachOperations available in the queue. I'm not sure if it was just never initialized, or if the queue was drained and not replenished.
21-05-2019

My guess is -Xcomp is causing startup to take a bit longer, possibly resulting in trying to attach before it is ready. I'll see if there are thread dumps available.
21-05-2019

Should we add SADebugDTest to ProblemList-Xcomp.txt? It seems not to be initialized RMI server because other tests on SA are passed. We need thread dump when debugd is hanged for more investigation.
21-05-2019

Here are the options being used for one of the failures: -Xcomp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedOops -Xcomp seems to be the common theme with all the failures. I'm not so sure the rest matter.
20-05-2019