While working on JDK-8244383 I modified JShellHeapDumpTest to check for DebuggerException. I then started noticing the following failure on occasion:
Error: exception occurred during stack walking:
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:189)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet(LinuxDebuggerLocal.java:551)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxThread.getContext(LinuxThread.java:74)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.linux_amd64.LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(LinuxAMD64JavaThreadPDAccess.java:95)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:267)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:229)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.ThreadStackTrace.dumpStack(ThreadStackTrace.java:54)
at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.dumpStackTraces(HeapHprofBinWriter.java:725)
at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:446)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.writeHeapHprofBin(JMap.java:182)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:97)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:176)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:331)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:483)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet0(Native Method)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1GetThreadIntegerRegisterSetTask.doit(LinuxDebuggerLocal.java:545)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:164)
When I added more debugging output to ps_proc.c, I saw:
ptrace(PTRACE_GETREGS, ...) failed for lwp(11896) errno(3) "No such process"
There is a similar issue on Windows. My conclusion is that the JavaThread is likely in the process of being destroyed, and its native thread has already been freed. SA should swallow this exception so that tests don't need to deal with it. The easiest solution is to not throw an exception at all, and instead have LinuxThread.getContext() (and the equivalent calls on other platforms) handle the case where the register set cannot be read. getContext() appears to be the only user of getThreadIntegerRegisterSet(), and handling the failure there requires changes in fewer places than handling it in getThreadIntegerRegisterSet() itself. Note that the only consumer of the registers is the stack walking code, which will end up getting null for sp, fp, and pc, and will "fail" gracefully when it does.
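To illustrate the idea of stack walking code "failing" gracefully when registers are unavailable, here is a minimal sketch. It is not the SA code: RegisterReader, Context, getContext() and describeTopFrame() are hypothetical stand-ins, and the register indices are arbitrary. It only shows the shape of the approach, where a failed register read becomes a null value that the caller checks instead of an exception.

```java
public class MissingRegistersDemo {
    /** Hypothetical stand-in for the native register read; returns null when the lwp is gone. */
    interface RegisterReader {
        long[] readRegisters(int lwpId);
    }

    /** Hypothetical thread context; the register set may be absent if the thread has exited. */
    static final class Context {
        private final long[] regs; // null means "registers unavailable"

        Context(long[] regs) {
            this.regs = regs;
        }

        /** Returns the register value, or null if it could not be read. */
        Long getRegister(int index) {
            if (regs == null || index < 0 || index >= regs.length) {
                return null;
            }
            return regs[index];
        }
    }

    /** Builds a context without throwing, even if the register read failed. */
    static Context getContext(RegisterReader reader, int lwpId) {
        return new Context(reader.readRegisters(lwpId));
    }

    /** Hypothetical stack walker entry point: gives up quietly when sp or pc is missing. */
    static String describeTopFrame(Context ctx) {
        Long sp = ctx.getRegister(0); // index 0 = sp (assumption for this sketch)
        Long pc = ctx.getRegister(1); // index 1 = pc (assumption for this sketch)
        if (sp == null || pc == null) {
            return "<no frames: registers unavailable>";
        }
        return String.format("sp=0x%x pc=0x%x", sp, pc);
    }

    public static void main(String[] args) {
        RegisterReader live = lwpId -> new long[] {0x7ffc1000L, 0x401000L};
        RegisterReader exited = lwpId -> null; // simulates ptrace failing with "No such process"

        System.out.println(describeTopFrame(getContext(live, 1)));
        System.out.println(describeTopFrame(getContext(exited, 11896)));
    }
}
```

With this shape, a thread that exits during the dump yields an empty stack trace for that thread rather than aborting the whole operation.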
Also notice in the output above that "sun.jvm.hotspot.debugger.DebuggerException" is part of the exception message, so it appears twice. It turns out that if you pass only a cause to the exception constructor, the message of the new exception defaults to the class name of the cause followed by the cause's message. Since we know the cause is also a DebuggerException, we should avoid having the class name be part of the exception message. This can be done by using a constructor that explicitly specifies the exception message. The changes are in LinuxDebuggerLocalWorkerThread.execute() in LinuxDebuggerLocal.java, to make it use the right constructor for the rethrown exception, with a similar change in BsdDebuggerLocal. This code is used by getThreadIntegerRegisterSet().
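The duplicated class name follows from how java.lang.Throwable builds its default message when given only a cause. The sketch below reproduces the effect with a hypothetical DemoException standing in for DebuggerException (the class and method names here are illustrative and not part of SA), and shows how passing the message explicitly avoids it.

```java
public class CauseMessageDemo {
    /** Hypothetical stand-in for sun.jvm.hotspot.debugger.DebuggerException. */
    static class DemoException extends RuntimeException {
        DemoException(String message) {
            super(message);
        }

        DemoException(Throwable cause) {
            super(cause); // message defaults to cause.toString()
        }

        DemoException(String message, Throwable cause) {
            super(message, cause); // message is exactly what the caller passes
        }
    }

    /** Rethrow style that duplicates the class name in the message. */
    static String wrapWithCauseOnly(Throwable cause) {
        return new DemoException(cause).getMessage();
    }

    /** Rethrow style that keeps the original message unchanged. */
    static String wrapWithExplicitMessage(Throwable cause) {
        return new DemoException(cause.getMessage(), cause).getMessage();
    }

    public static void main(String[] args) {
        Throwable cause = new DemoException("get_thread_regs failed for a lwp");

        // Message is "<class name of cause>: get_thread_regs failed for a lwp"
        System.out.println(wrapWithCauseOnly(cause));

        // Message is just "get_thread_regs failed for a lwp"
        System.out.println(wrapWithExplicitMessage(cause));
    }
}
```

In both cases the cause is still attached, so the "Caused by:" section of the stack trace is unaffected. Only the message of the outer exception changes.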
This CR will also update the error message to properly reference "get_lwp_regs" instead of "get_thread_regs", and improve some of the ptrace_debug() output for this failure.