JDK-8285729 : serviceability/sa/ClhsdbThreadContext.java timed out
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 19,20,21,23,24
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows
  • CPU: x86_64
  • Submitted: 2022-04-27
  • Updated: 2024-09-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Relates :  
Description
The following test timed out in the JDK19 CI:

serviceability/sa/ClhsdbThreadContext.java

Here's a log file snippet:

#section:main
----------messages:(6/226)----------
command: main ClhsdbThreadContext
reason: User specified action: run main/othervm ClhsdbThreadContext 
Mode: othervm [/othervm specified]
Timeout information:
--- Timeout information end.
elapsed time (seconds): 541.679
----------configuration:(0/0)----------
----------System.out:(1127/56676)*----------
[0.048s][info][cds] trying to map c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+20-1365\\windows-x64-debug.jdk\\jdk-19\\fastdebug\\bin\\server\\classes.jsa
[0.048s][info][cds] Opened archive c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+20-1365\\windows-x64-debug.jdk\\jdk-19\\fastdebug\\bin\\server\\classes.jsa.
[0.048s][info][cds] Archive was created with UseCompressedOops = 1, UseCompressedClassPointers = 1
[0.048s][info][cds] full module graph: disabled because archive was created without full module graph

<snip>

hsdb> + quit

 LingeredApp stdout: [[0.040s][info][cds] trying to map c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+20-1365\\windows-x64-debug.jdk\\jdk-19\\fastdebug\\bin\\server\\classes.jsa
[0.040s][info][cds] Opened archive c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-19+20-1365\\windows-x64-debug.jdk\\jdk-19\\fastdebug\\bin\\server\\classes.jsa.
[0.040s][info][cds] Archive was created with UseCompressedOops = 1, UseCompressedClassPointers = 1
[0.040s][info][cds] full module graph: disabled because archive was created without full module graph
[0.040s][info][cds] Core region alignment: 65536
[0.040s][info][cds] Reserved archive_space_rs [0x0000000800000000 - 0x0000000800c30000] (12779520) bytes
[0.040s][info][cds] Reserved class_space_rs   [0x0000000801000000 - 0x0000000841000000] (1073741824) bytes
[0.040s][info][cds] Windows mmap workaround: releasing archive space.
[0.040s][info][cds] Mapped static  region #0 at base 0x0000000800000000 top 0x00000008004c0000 (ReadWrite)
[0.040s][info][cds] Mapped static  region #1 at base 0x00000008004c0000 top 0x0000000800c30000 (ReadOnly)
[0.040s][info][cds] ArchiveRelocationMode == 1: always map archive(s) at an alternative address
[0.040s][info][cds] Unmapping region #0 at base 0x0000000800000000 (ReadWrite)
[0.041s][info][cds] Unmapping region #1 at base 0x00000008004c0000 (ReadOnly)
[0.041s][info][cds] Try to map archive(s) at an alternative address
[0.041s][info][cds] Reserved archive_space_rs [0x000001ed2e400000 - 0x000001ed2f400000] (16777216) bytes
[0.041s][info][cds] Reserved class_space_rs   [0x000001ed2f400000 - 0x000001ed6f400000] (1073741824) bytes
[0.041s][info][cds] Commit static  region #0 at base 0x000001ed2e400000 top 0x000001ed2e8c0000 (ReadWrite)
[0.043s][info][cds] Mapped static  region #0 at base 0x000001ed2e400000 top 0x000001ed2e8c0000 (ReadWrite)
[0.043s][info][cds] Commit static  region #1 at base 0x000001ed2e8c0000 top 0x000001ed2f030000 (ReadOnly)
[0.048s][info][cds] Mapped static  region #1 at base 0x000001ed2e8c0000 top 0x000001ed2f030000 (ReadOnly)
[0.048s][info][cds] Mapped static  region #2 at base 0x000001ed2e170000 top 0x000001ed2e1b0000 (Bitmap)
[0.053s][info][cds] optimized module handling: enabled
[0.053s][info][cds] full module graph: disabled
[0.054s][info][cds] Unmapping region #2 at base 0x000001ed2e170000 (Bitmap)
];
 LingeredApp stderr: []
 LingeredApp exitValue = 0
Test PASSED
----------System.err:(1/16)----------
STATUS:Passed.
----------rerun:(41/5700)*----------

<snip>

result: Error. Program `c:\ade\mesos\work_dir\jib-master\install\jdk-19+20-1365\windows-x64-debug.jdk\jdk-19\fastdebug\bin\java' timed out (timeout set to 480000ms, elapsed time including timeout handling was 541679ms).


The test timed out at 480 seconds, but passed while the timeout
handler was running.
Comments
Looks like this is the same issue as JDK-8313814 "Test serviceability/sa/ClhsdbSymbol.java timed out after passing". In both cases we are getting stuck for a long period of time in a native Windows API called by WindbgDebuggerLocal.lookupByName0(). We'll keep both issues open for easier bookkeeping for the two different tests that are failing.
27-07-2024

In an earlier analysis above, I noted that the process was stuck in at WindbgDebuggerLocal.lookupByName0(), and I didn't understand how it could be stuck here when the output indicates that the test ran to completion. I now realize that this stack dump was made at the time of the timeout, but then the test was allowed to continue to run after that and completed. This seems to be the same issue as the JDK-8313814 serviceability/sa/ClhsdbSymbol.java. For some reason we get stuck in the Windows native API that is called by WindbgDebuggerLocal.lookupByName0(), but eventually we get unstuck and the test completes. However, the timeout fired while we were stuck, so jtreg failed the test run. I have no idea why we are getting stuck in the Windows native API.
07-05-2024

I don't think the LingeredApp stdout error output is triggered by the test. It does not issue any jcmds. I think it is the from the timeout handler trying to issue a jcmd. Also, this failure seems to be a little different than the original failure in that the jhsdb process that supposedly timed out is no longer present.
30-11-2022

Here's a log file snippet from the jdk-20+20-1442-tier1 sighting: serviceability/sa/ClhsdbThreadContext.java #section:main ----------messages:(8/305)---------- command: main ClhsdbThreadContext reason: User specified action: run main/othervm ClhsdbThreadContext started: Tue Oct 18 15:44:12 UTC 2022 Mode: othervm [/othervm specified] Timeout information: --- Timeout information end. finished: Tue Oct 18 15:57:25 UTC 2022 elapsed time (seconds): 793.634 ----------configuration:(0/0)---------- ----------System.out:(938/42335)*---------- Starting ClhsdbThreadContext test Command line: ['c:\\ade\\mesos\\work_dir\\jib-master\\install\\jdk-20+20-1442\\windows-x64-debug.jdk\\jdk-20\\fastdebug\\bin\\java.exe' '-XX:MaxRAMPercentage=4.16667' '-Djava.io.tmpdir=c:\\sb\\prod\\1666107461\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_serviceability\\tmp' '-cp' 'C:\\sb\\prod\\1666107461\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_serviceability\\classes\\4\\serviceability\\sa\\ClhsdbThreadContext.d;C:\\sb\\prod\\1666107461\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_serviceability\\classes\\4\\test\\lib' 'jdk.test.lib.apps.LingeredApp' '7d1cd51b-61de-4784-ae8f-b987286a373b.lck'] Started LingeredApp with pid 43772 Starting clhsdb against 43772 [2022-10-18T15:44:13.976482700Z] Gathering output for process 28836 [2022-10-18T15:44:27.979924300Z] Waiting for completion for process 28836 [2022-10-18T15:44:27.979924300Z] Waiting for completion finished for process 28836 Output: Attaching to process 43772, please wait... <snip> "SteadyStateThread" #21 prio=5 tid=0x0000027574002c20 nid=16464 waiting for monitor entry [0x000000f5504ff000] java.lang.Thread.State: BLOCKED (on object monitor) JavaThread state: _thread_blocked ss: 0x000000000000002b fs: 0x0000000000000053 gs: 0x000000000000002b es: 0x000000000000002b ds: 0x000000000000002b fsbase: null gsbase: null hsdb> + quit LingeredApp stdout: [[756.045s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-662874919 to send result of operation jcmd [756.045s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-574680134 to send result of operation jcmd [756.046s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-1856630546 to send result of operation jcmd ]; LingeredApp stderr: [] LingeredApp exitValue = 0 Test PASSED ----------System.err:(1/16)---------- STATUS:Passed. ----------rerun:(36/5435)*---------- The test PASSED while the timeout handler was running. However, there's also what appears to be error msgs right at the end: hsdb> + quit LingeredApp stdout: [[756.045s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-662874919 to send result of operation jcmd [756.045s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-574680134 to send result of operation jcmd [756.046s][error][attach] could not open (2) pipe \\\\.\\pipe\\javatool-1856630546 to send result of operation jcmd ]; LingeredApp stderr: [] LingeredApp exitValue = 0 Test PASSED ----------System.err:(1/16)----------
18-10-2022

The test runs "threadcontext -a" and "threadcontext -a -v", and then gets a <threadId> from second threadcontext comand and then runs "threadcontext -v <threadId>. In the output for the 3rd threadcontext command I see: Starting clhsdb against 33216 [2022-04-27T08:40:37.333220800Z] Gathering output for process 9380 Timeout refired 480 times [2022-04-27T08:48:58.287672200Z] Waiting for completion for process 9380 [2022-04-27T08:48:58.287672200Z] Waiting for completion finished for process 9380 So there is a timeout during it, but it eventually completes and the test "passes". However, jtreg reports it as failed because of the timeout. I'm not sure why it took so long. "threadcontext -v <threadId>" produces the same output as "threadcontext -a -v", except does so for only one thread, so it should be a lot faster (and is the first of these two commands is only taking a few seconds). However, even though the output shows that all the expected output for "threadcontext -v <threadId>" is there, it looks like the process is still running and is stuck here: "main" #1 prio=5 os_prio=0 cpu=472171.88ms elapsed=474.04s allocated=79911K defined_classes=481 tid=0x0000011ad9950f50 nid=2416 runnable [0x0000000a5f7fe000] java.lang.Thread.State: RUNNABLE Thread: 0x0000011ad9950f50 [0x970] State: _at_safepoint _at_poll_safepoint 0 JavaThread state: _thread_in_native at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.lookupByName0(jdk.hotspot.agent@19-ea/Native Method) at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.lookupByName(jdk.hotspot.agent@19-ea/WindbgDebuggerLocal.java:464) - locked <0x0000000080a30338> (a sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal) at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.lookup(jdk.hotspot.agent@19-ea/WindbgDebuggerLocal.java:242) - locked <0x0000000080a30338> (a sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal) at sun.jvm.hotspot.types.basic.BasicVtblAccess.getVtblForType(jdk.hotspot.agent@19-ea/BasicVtblAccess.java:63) at sun.jvm.hotspot.types.basic.BasicTypeDataBase.vtblForType(jdk.hotspot.agent@19-ea/BasicTypeDataBase.java:164) at sun.jvm.hotspot.types.basic.BasicTypeDataBase.addressTypeIsEqualToType(jdk.hotspot.agent@19-ea/BasicTypeDataBase.java:180) at sun.jvm.hotspot.types.basic.BasicTypeDataBase.guessTypeForAddress(jdk.hotspot.agent@19-ea/BasicTypeDataBase.java:273) at sun.jvm.hotspot.utilities.PointerFinder.find(jdk.hotspot.agent@19-ea/PointerFinder.java:54) at sun.jvm.hotspot.runtime.JavaThread.printThreadContextOn(jdk.hotspot.agent@19-ea/JavaThread.java:493) at sun.jvm.hotspot.CommandProcessor$46.doit(jdk.hotspot.agent@19-ea/CommandProcessor.java:1699) at sun.jvm.hotspot.CommandProcessor.executeCommand(jdk.hotspot.agent@19-ea/CommandProcessor.java:2212) at sun.jvm.hotspot.CommandProcessor.executeCommand(jdk.hotspot.agent@19-ea/CommandProcessor.java:2182) at sun.jvm.hotspot.CommandProcessor.run(jdk.hotspot.agent@19-ea/CommandProcessor.java:2053) at sun.jvm.hotspot.CLHSDB.run(jdk.hotspot.agent@19-ea/CLHSDB.java:112) at sun.jvm.hotspot.CLHSDB.main(jdk.hotspot.agent@19-ea/CLHSDB.java:44) at sun.jvm.hotspot.SALauncher.runCLHSDB(jdk.hotspot.agent@19-ea/SALauncher.java:281) at sun.jvm.hotspot.SALauncher$$Lambda$1/0x0000011a81001800.accept(jdk.hotspot.agent@19-ea/Unknown Source) at sun.jvm.hotspot.SALauncher.main(jdk.hotspot.agent@19-ea/SALauncher.java:500)' All printThreadContextOn() does it iterate over all the registers, and for each it prints the register name and value. If the value is not null then it also calls PointerFinder(find) on it to get more info to print out. However, the output shows it's already printed the following: fsbase: null gsbase: null These are the expected last two registers to be printed, and since gsbase is null, it can't be calling PointerFinder.find() on it. So it's not making sense that this process is stuck here.
27-04-2022