JDK-8239062 : jhsdb jstack does not work with debugd server on OSX
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 15
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2020-02-14
  • Updated: 2021-07-12
Other: tbd (Unresolved)
Sub Tasks
JDK-8239379 :  
Description
The easiest way to reproduce this is with serviceability/sa/sadebugd/DebugdConnectTest.java, which times out. However, the log shows that the real issue is the following exception, which occurs repeatedly:

java.lang.ClassCastException: class sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient cannot be cast to class sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal (sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient and sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal are in module jdk.hotspot.agent of loader 'app')
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.bsd_amd64.BsdAMD64JavaThreadPDAccess.getThreadProxy(BsdAMD64JavaThreadPDAccess.java:135)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.bsd_amd64.BsdAMD64JavaThreadPDAccess.getCurrentFrameGuess(BsdAMD64JavaThreadPDAccess.java:96)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:265)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:227)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:290)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:406)

The ClassCastException is the real issue and causes jstack to fail. However, it's not clear why this results in a test timeout. The test properly detects the jstack failure and throws an exception. A finally block then kills the debugd process by calling debugd.detach(). Since debugd was attached to the LingeredApp, this should allow the LingeredApp to continue running. Next the test calls LingeredApp.stopApp(), and this is where stack traces show the test is stuck when the timeout happens. If debugd had not exited, it would make sense that LingeredApp could not exit (it can't run until debugd has detached from it), but that is not the case. I tried simulating all of this from the command line: running a simple app, attaching to it with debugd, and running jstack through debugd to reproduce the same exception as above. After that, I exited debugd, and the app was responsive again.

You can also easily reproduce this failure from the command line. Start by running a simple java app. I used "jhsdb clhsdb" since it is convenient:

 ./jhsdb clhsdb

From another shell run debugd, and attach to the clhsdb process (use jps to get the pid):

./jhsdb debugd --serverid foo --pid 53938

And from a 3rd shell run jstack, connecting to the debugd server:

$ ./jhsdb jstack --connect foo@localhost
Attaching to remote server foo@localhost, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 15-internal+0-2020-01-23-2238326.cplummer...
Deadlock Detection:

No deadlocks found.

"main" #1 prio=5 tid=0x00007fede6008000 nid=0x1f03 runnable [0x0000700004200000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
Error occurred during stack walking:
java.lang.ClassCastException: class sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient cannot be cast to class sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal (sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient and sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal are in module jdk.hotspot.agent of loader 'app')
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.bsd_amd64.BsdAMD64JavaThreadPDAccess.getThreadProxy(BsdAMD64JavaThreadPDAccess.java:134)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.bsd_amd64.BsdAMD64JavaThreadPDAccess.getCurrentFrameGuess(BsdAMD64JavaThreadPDAccess.java:96)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:265)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:227)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:290)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:406)

...and a bunch more similar failures. Each thread will produce this backtrace.
Comments
Let's just keep this bug open in that case.
16-07-2020

reopening: serviceability/sa/sadebugd/DebugdConnectTest.java is still problem-listed due to 8239062, if the underlying product defect isn't deemed to be worth fixing, the test should be updated not to be run on macos (either by @requires or custom code in the test) and removed from the problem-list.
16-07-2020

Won't fix since this is only an issue with debugd and only on macosx (which already has other attach restrictions that limit its usefulness), and the workaround is to simply bypass debugd and run SA on the process directly.
06-03-2020

I attempted to fix this issue by implementing the single-argument BsdDebuggerLocal.getThreadForIdentifierAddress() (which is currently unimplemented) to call the two-argument version. This required copying much of the code currently in BsdAMD64JavaThreadPDAccess.getThreadProxy() in order to get the arguments. This didn't go well.

The first problem is that there is no easy access to the TypeDataBase, so I called VM.getVM().getTypeDataBase(). This failed with RuntimeException("VM.initialize() was not yet called"). Initialization is normally done in HotSpotAgent.setupVM(), but is intentionally not done on the debugd (server) side:

    if (!isServer) {
        // Do not initialize the VM on the server (unnecessary, since it's
        // instantiated on the client)
        try {
            VM.initialize(db, debugger);

Although there is probably a good reason not to initialize on the server side (I'm not sure what it is), I changed this code to initialize unconditionally. At least then I could get the TypeDataBase and make some more progress on the bug.

That led to the next issue. The following code produces a null value, which leads to an NPE:

    Address osThreadAddr = osThreadField.getValue(addr);

I'm not sure why this works fine on the client side but not on the RemoteDebuggerServer side. In any case, fixing this issue will take far more understanding of SA than I currently have. The fix for JDK-8006423 probably needs to be done further up the call chain, such as in getThreadIntegerRegisterSet(), rather than in getThreadProxy().

Changing to a P4 since this is only an issue with debugd and only on macosx. The workaround is to simply bypass debugd and run SA on the process directly.
18-02-2020
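The server-side initialization failure described in the comment above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the nested VM class, the String-valued type database, the reset() helper, and lookupTypeDataBase() are assumptions made for illustration and are not the real SA classes or signatures. It only shows why a guard like the quoted `if (!isServer)` leaves VM.getVM() unusable on the debugd side.

```java
class VmInitSketch {
    // Hypothetical stand-in for sun.jvm.hotspot.runtime.VM; not the real class.
    static class VM {
        private static VM instance;
        private final String typeDataBase;

        private VM(String typeDataBase) {
            this.typeDataBase = typeDataBase;
        }

        static void initialize(String typeDataBase) {
            instance = new VM(typeDataBase);
        }

        // Illustration-only helper so the demo starts from a clean state.
        static void reset() {
            instance = null;
        }

        static VM getVM() {
            if (instance == null) {
                throw new RuntimeException("VM.initialize() was not yet called");
            }
            return instance;
        }

        String getTypeDataBase() {
            return typeDataBase;
        }
    }

    // Mirrors the guard quoted from HotSpotAgent.setupVM(): the server skips initialization.
    static void setupVM(boolean isServer, String typeDataBase) {
        if (!isServer) {
            VM.initialize(typeDataBase);
        }
    }

    // Returns the type database, or the failure message if the VM was never initialized.
    static String lookupTypeDataBase() {
        try {
            return VM.getVM().getTypeDataBase();
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        VM.reset();
        setupVM(true, "db");                       // debugd (server) side
        System.out.println(lookupTypeDataBase());  // the lookup fails here
        setupVM(false, "db");                      // client side
        System.out.println(lookupTypeDataBase());  // the lookup succeeds here
    }
}
```

In this model, any server-side code that needs the type database has to either initialize the VM itself or receive the data some other way, which matches the obstacle reported in the comment.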

My conclusion here is that JDK-8006423 was not properly implemented for remote debugging (using the SA debugd server). This appears to be the only test that does that. My last idea above is kind of on the right track. If we are dealing with a RemoteDebuggerClient instead of a BsdDebuggerLocal, then just make the normal getThreadForIdentifierAddress() call from BsdAMD64JavaThreadPDAccess.getThreadProxy(). That gets past the ClassCastException, although I'm not sure whether it exposes you to the NPE that JDK-8006423 fixed. In any case, you quickly run into another problem:

java.lang.RuntimeException: unimplemented
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.getThreadForIdentifierAddress(BsdDebuggerLocal.java:423)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.remote.RemoteDebuggerServer.getThreadProxy(RemoteDebuggerServer.java:167)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.remote.RemoteDebuggerServer.getThreadIntegerRegisterSet(RemoteDebuggerServer.java:155)

RemoteDebuggerServer.getThreadIntegerRegisterSet() is an RMI call. The other side of the RMI call looks like this (note that RemoteDebuggerServer in the above stack trace is the sadebugd side, directly accessing the JVM, and RemoteDebuggerClient below is the side running jstack, which does its work through the remote sadebugd):

	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient.getThreadIntegerRegisterSet(RemoteDebuggerClient.java:120)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.remote.amd64.RemoteAMD64Thread.getContext(RemoteAMD64Thread.java:43)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.bsd_amd64.BsdAMD64JavaThreadPDAccess.getCurrentFrameGuess(BsdAMD64JavaThreadPDAccess.java:97)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:265)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:227)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)

The "unimplemented" exception is part of the fix for JDK-8006423. It intentionally made the single-argument BsdDebuggerLocal.getThreadForIdentifierAddress() unimplemented because you must call the two-argument version to avoid the NPE fixed by JDK-8006423. BsdAMD64JavaThreadPDAccess.getThreadProxy() was fixed to do this, but RemoteDebuggerServer.getThreadProxy() was not. And since this fix is specific to Bsd, it really can't be put in RemoteDebuggerServer.getThreadProxy().
14-02-2020
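The two-step failure in the comment above (the instanceof guard gets past the cast, then the server hits "unimplemented") can be sketched as follows. This is a simplified model under stated assumptions, not the SA implementation: the Debugger, LocalBsdDebugger, Server, and Client types are hypothetical stand-ins, thread proxies and register sets are plain strings, and the RMI hop is replaced by a direct method call.

```java
class RemotePathSketch {
    // Stand-in for the common interface: only the one-argument lookup is declared.
    interface Debugger {
        String getThreadForIdentifierAddress(long threadIdAddr);
    }

    // Stand-in for BsdDebuggerLocal after JDK-8006423.
    static class LocalBsdDebugger implements Debugger {
        String getThreadForIdentifierAddress(long threadIdAddr, long uniqueThreadIdAddr) {
            return "BsdThread";
        }

        @Override
        public String getThreadForIdentifierAddress(long threadIdAddr) {
            throw new RuntimeException("unimplemented");
        }
    }

    // Stand-in for RemoteDebuggerServer: platform independent, so it only
    // knows about the one-argument lookup.
    static class Server {
        private final Debugger local;

        Server(Debugger local) {
            this.local = local;
        }

        String getThreadIntegerRegisterSet(long threadIdAddr) {
            String thread = local.getThreadForIdentifierAddress(threadIdAddr);
            return "registers of " + thread;
        }
    }

    // Stand-in for RemoteDebuggerClient; the RMI hop is modelled as a direct call.
    static class Client implements Debugger {
        private final Server server;

        Client(Server server) {
            this.server = server;
        }

        @Override
        public String getThreadForIdentifierAddress(long threadIdAddr) {
            return "RemoteThread";
        }

        String getThreadIntegerRegisterSet(long threadIdAddr) {
            return server.getThreadIntegerRegisterSet(threadIdAddr);
        }
    }

    // The guarded lookup suggested in the comment: cast only when the debugger is local.
    static String getThreadProxy(Debugger debugger, long tid, long uniqueTid) {
        if (debugger instanceof LocalBsdDebugger) {
            return ((LocalBsdDebugger) debugger).getThreadForIdentifierAddress(tid, uniqueTid);
        }
        return debugger.getThreadForIdentifierAddress(tid);
    }

    // Walks one thread: the proxy lookup succeeds, then the register fetch fails on the server.
    static String walk(Client client) {
        String proxy = getThreadProxy(client, 1L, 2L);
        try {
            return proxy + ": " + client.getThreadIntegerRegisterSet(1L);
        } catch (RuntimeException e) {
            return proxy + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Client client = new Client(new Server(new LocalBsdDebugger()));
        System.out.println(walk(client)); // prints "RemoteThread: unimplemented"
    }
}
```

The sketch makes the structural problem visible: the client-side guard is enough to avoid the cast, but the server still reaches the local debugger through the one-argument method, which the Bsd implementation deliberately rejects.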

The hierarchy of the "debugger" variable is:

    class sun.jvm.hotspot.debugger.remote.RemoteDebuggerClient
    implements sun.jvm.hotspot.debugger.JVMDebugger
    extends class sun.jvm.hotspot.debugger.DebuggerBase
    implements sun.jvm.hotspot.debugger.Debugger
    implements sun.jvm.hotspot.debugger.ThreadAccess
    extends class java.lang.Object

The hierarchy of the BsdDebuggerLocal class it is being cast to is:

    class sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal
    implements sun.jvm.hotspot.debugger.bsd.BsdDebugger
    implements sun.jvm.hotspot.debugger.JVMDebugger
    extends sun.jvm.hotspot.debugger.DebuggerBase
    implements sun.jvm.hotspot.debugger.Debugger
    implements sun.jvm.hotspot.debugger.ThreadAccess

So both of these classes extend DebuggerBase (and implement Debugger), and getThreadForIdentifierAddress(threadIdAddr) is declared in the Debugger interface via the ThreadAccess interface. However, getThreadForIdentifierAddress(threadIdAddr, uniqueThreadIdAddr) is not, and is only declared in BsdDebuggerLocal. The following comment in BsdDebuggerLocal.java is not correct:

    /** From the ThreadAccess interface via Debugger and JVMDebugger */
    public ThreadProxy getThreadForIdentifierAddress(Address threadIdAddr, Address uniqueThreadIdAddr) {
        return new BsdThread(this, threadIdAddr, uniqueThreadIdAddr);
    }

    @Override
    public ThreadProxy getThreadForIdentifierAddress(Address addr) {
        throw new RuntimeException("unimplemented");
    }

The comment belongs on the second getThreadForIdentifierAddress() API. The first one is actually a new declaration that is only present in BsdDebuggerLocal. So this explains why the cast is needed.

I'm not sure how this ever worked. Maybe VM.getVM().getDebugger() has changed, or something else has changed in the hierarchy of these classes. I'm a little unclear on the remote vs. local debugger relationship here. It seems hard to believe that the original getThreadForIdentifierAddress(threadIdAddr) that acts on a remote Debugger instance can be changed to a getThreadForIdentifierAddress(threadIdAddr, uniqueThreadIdAddr) that acts on a local one.

I'm now wondering if the VM.getVM().getDebugger() call can return a local or a remote Debugger, depending on which side is executing the code, and the fix for JDK-8006423 is only handling the local case. It might be that for the remote case we can use the old implementation, because presumably the getThreadForIdentifierAddress() request gets forwarded to the local Debugger, which does the right thing.
14-02-2020
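The hierarchy analysis in the comment above comes down to a cast between sibling types. The following minimal sketch reproduces that situation with hypothetical stand-ins: the nested types borrow the SA names for readability but are not the real classes, thread proxies are modelled as strings, and addresses as long values.

```java
class CastSketch {
    // Stand-in for ThreadAccess: declares only the one-argument lookup.
    interface ThreadAccess {
        String getThreadForIdentifierAddress(long threadIdAddr);
    }

    // Stand-in for JVMDebugger, which inherits the ThreadAccess methods.
    interface JVMDebugger extends ThreadAccess {
    }

    // Stand-in for BsdDebuggerLocal: the two-argument overload exists only here.
    static class BsdDebuggerLocal implements JVMDebugger {
        String getThreadForIdentifierAddress(long threadIdAddr, long uniqueThreadIdAddr) {
            return "BsdThread(" + threadIdAddr + ", " + uniqueThreadIdAddr + ")";
        }

        @Override
        public String getThreadForIdentifierAddress(long threadIdAddr) {
            throw new RuntimeException("unimplemented");
        }
    }

    // Stand-in for RemoteDebuggerClient: a sibling of BsdDebuggerLocal, not a subclass.
    static class RemoteDebuggerClient implements JVMDebugger {
        @Override
        public String getThreadForIdentifierAddress(long threadIdAddr) {
            return "RemoteThread(" + threadIdAddr + ")";
        }
    }

    // Mirrors the unconditional cast that reaches the two-argument overload.
    static String getThreadProxy(JVMDebugger debugger, long tid, long uniqueTid) {
        BsdDebuggerLocal local = (BsdDebuggerLocal) debugger;
        return local.getThreadForIdentifierAddress(tid, uniqueTid);
    }

    // Reports either the proxy or the kind of failure, for easy comparison.
    static String describe(JVMDebugger debugger) {
        try {
            return getThreadProxy(debugger, 16L, 32L);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new BsdDebuggerLocal()));     // prints "BsdThread(16, 32)"
        System.out.println(describe(new RemoteDebuggerClient())); // prints "ClassCastException"
    }
}
```

Because the two-argument method is not part of any shared interface, the caller can only reach it through the concrete type, so the cast is unavoidable in this design and fails whenever the debugger is the remote client.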

The BsdAMD64JavaThreadPDAccess.getThreadProxy() code in question differs from the Linux version because of the fix for JDK-8006423, which unfortunately requires a large amount of background reading to understand the cause of and fix for that bug. Those changes, however, introduced the cast that is failing here:

https://hg.openjdk.java.net/jdk/jdk/rev/860cf6c70c06

@@ -125,8 +129,9 @@
     Address osThreadAddr = osThreadField.getValue(addr);
     // Get the address of the _thread_id from the OSThread
     Address threadIdAddr = osThreadAddr.addOffsetTo(osThreadThreadIDField.getOffset());
+    Address uniqueThreadIdAddr = osThreadAddr.addOffsetTo(osThreadUniqueThreadIDField.getOffset());
-    JVMDebugger debugger = VM.getVM().getDebugger();
-    return debugger.getThreadForIdentifierAddress(threadIdAddr);
+    BsdDebuggerLocal debugger = (BsdDebuggerLocal) VM.getVM().getDebugger();
+    return debugger.getThreadForIdentifierAddress(threadIdAddr, uniqueThreadIdAddr);
14-02-2020