JDK-8247533 : SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 16
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-06-14
  • Updated: 2024-11-13
  • Resolved: 2020-07-02
The Version table provides details related to the release in which this issue/RFE will be addressed.

JDK 16
16 b05 Fixed
Related Reports
Blocks :  
Relates :  
Description
While working on JDK-8244383 I modified JShellHeapDumpTest to check for DebuggerException. I then started noticing the following failure on occasion:

Error: exception occurred during stack walking:
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:189)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet(LinuxDebuggerLocal.java:551)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxThread.getContext(LinuxThread.java:74)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.linux_amd64.LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(LinuxAMD64JavaThreadPDAccess.java:95)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:267)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:229)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.ThreadStackTrace.dumpStack(ThreadStackTrace.java:54)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.dumpStackTraces(HeapHprofBinWriter.java:725)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:446)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.writeHeapHprofBin(JMap.java:182)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:97)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:176)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:331)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:483)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet0(Native Method)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1GetThreadIntegerRegisterSetTask.doit(LinuxDebuggerLocal.java:545)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:164)

When I added some more debugging output to ps_proc.c, I saw:

ptrace(PTRACE_GETREGS, ...) failed for lwp(11896) errno(3) "No such process"

There is a similar issue on Windows. My conclusion is that the JavaThread is likely in the process of being destroyed and the native thread has already been freed. SA should swallow this exception so that tests don't need to deal with it. The easiest solution is to not throw an exception at all, and have LinuxThread.getContext() (and the similar calls for other platforms) deal with not being able to get the register set. getContext() appears to be the only user of getThreadIntegerRegisterSet(), and handling the failure there requires changes in fewer places than handling it in getThreadIntegerRegisterSet() itself. Note that the only user of the registers is the stack walking code, which will end up getting null for sp, fp, and pc, and will "fail" gracefully when it does.
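A minimal sketch of the proposed handling. RegisterFetcher, SketchContext and SketchThread are hypothetical stand-ins for the SA debugger, ThreadContext and LinuxThread classes. The fetch is assumed to return null on failure rather than throw. This is an illustration of the idea, not the actual patch.

```java
/**
 * Hypothetical sketch: getContext() tolerates a failed register fetch.
 * RegisterFetcher, SketchContext and SketchThread are illustrative names,
 * not the actual SA classes.
 */
public class NullRegistersSketch {

    /** Stand-in for the low-level fetch; assumed to return null (not throw) on failure. */
    interface RegisterFetcher {
        long[] getThreadIntegerRegisterSet(int lwpId);
    }

    /** Stand-in for a ThreadContext; a null entry means "register unknown". */
    static final class SketchContext {
        static final int SP = 0, FP = 1, PC = 2, NPRGREG = 3;
        private final Long[] regs = new Long[NPRGREG];
        void setRegister(int idx, long value) { regs[idx] = value; }
        Long getRegister(int idx) { return regs[idx]; }
    }

    static final class SketchThread {
        private final RegisterFetcher fetcher;
        private final int lwpId;

        SketchThread(RegisterFetcher fetcher, int lwpId) {
            this.fetcher = fetcher;
            this.lwpId = lwpId;
        }

        SketchContext getContext() {
            long[] data = fetcher.getThreadIntegerRegisterSet(lwpId);
            SketchContext context = new SketchContext();
            // null means the fetch failed: leave every register unset (null)
            // and let the stack walking code cope with the missing values
            if (data != null) {
                for (int i = 0; i < data.length && i < SketchContext.NPRGREG; i++) {
                    context.setRegister(i, data[i]);
                }
            }
            return context;
        }
    }

    public static void main(String[] args) {
        // Simulates ptrace(PTRACE_GETREGS) failing with ESRCH for the lwp
        RegisterFetcher failing = lwp -> null;
        SketchContext ctx = new SketchThread(failing, 11896).getContext();
        System.out.println("sp=" + ctx.getRegister(SketchContext.SP)
                + " fp=" + ctx.getRegister(SketchContext.FP)
                + " pc=" + ctx.getRegister(SketchContext.PC));
    }
}
```

Handling the failure in the caller keeps the low-level fetch simple. Every consumer then sees a context whose registers are simply unknown.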

Also notice in the above output that "sun.jvm.hotspot.debugger.DebuggerException" is part of the exception's message, so it appears twice. It turns out that if you pass only a cause to an exception constructor, by default the message of the new exception is the name of the cause's class followed by the cause's message. Since we know the cause is also a DebuggerException, we should avoid making the class name part of the exception message. This can be done by using a constructor that explicitly specifies the message. The changes are in LinuxDebuggerLocalWorkerThread.execute() in LinuxDebuggerLocal.java, to make it use the right constructor for the rethrown exception, and similarly in BsdDebuggerLocal. This code is used by getThreadIntegerRegisterSet().
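The duplicated class name comes from the standard Throwable(Throwable cause) constructor, which sets the detail message to cause.toString(). The sketch below contrasts that constructor with one that passes the cause's message explicitly. DebuggerException here is a hypothetical stand-in for the SA class. The real execute() rethrows an exception raised on the worker thread, but this sketch skips the thread and rethrows directly.

```java
/**
 * Simplified stand-in for sun.jvm.hotspot.debugger.DebuggerException.
 */
class DebuggerException extends RuntimeException {
    DebuggerException(String message) { super(message); }
    // Cause-only: the detail message becomes cause.toString(), i.e. "<class name>: <message>"
    DebuggerException(Throwable cause) { super(cause); }
    DebuggerException(String message, Throwable cause) { super(message, cause); }
}

public class RethrowMessageSketch {

    /** Mimics rethrowing a failure reported by the worker (no real worker thread here). */
    static void rethrow(DebuggerException fromWorker, boolean keepOriginalMessage) {
        if (keepOriginalMessage) {
            // Explicit message: the class name is not repeated in the text
            throw new DebuggerException(fromWorker.getMessage(), fromWorker);
        }
        // Cause-only constructor: the class name ends up inside the message
        throw new DebuggerException(fromWorker);
    }

    static String messageOf(boolean keepOriginalMessage) {
        try {
            rethrow(new DebuggerException("get_thread_regs failed for a lwp"), keepOriginalMessage);
            return null; // not reached: rethrow() always throws
        } catch (DebuggerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("cause-only: " + messageOf(false));
        System.out.println("explicit:   " + messageOf(true));
    }
}
```

Running main() should show the class name prefix only in the cause-only message. The explicit-message variant still keeps the original exception as the cause, so the "Caused by:" section of the stack trace is unchanged.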

This CR also updates the error message to reference "get_lwp_regs" rather than "get_thread_regs", and improves some of the ptrace_debug() output for this failure.
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/fdfcdf562f0c User: cjplummer Date: 2020-07-02 20:15:42 +0000
02-07-2020

It turns out the root cause of these failures cannot be that the OS thread was freed while the JavaThread was still on the ThreadList since the executing JavaThread is responsible for removing itself from the list. That implies that any JavaThread on the list has an underlying valid OS thread. The root cause of the ESRCH failures on Linux and (very rare) 0x80004002 "no such interface" failures on Windows is not understood, but they appear to be spurious in nature. In any case, the suggested fix is still appropriate for any failure to get the registers. The end result should be a set of null registers, and this will cause the stack walking code to revert to using "last java frame" if available, and otherwise just not produce a stack trace.
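A simplified sketch of that fallback, assuming the registers arrive as nullable values. Regs, Frame and currentFrameGuess() are hypothetical names here, and the last Java frame is modeled as a nullable sp/fp pair. The real SA frame guessing code does considerably more work than this.

```java
import java.util.Optional;

public class FrameGuessFallbackSketch {

    /** Register values from getContext(); null means the fetch failed. */
    static final class Regs {
        final Long sp, fp, pc;
        Regs(Long sp, Long fp, Long pc) { this.sp = sp; this.fp = fp; this.pc = pc; }
    }

    /** A frame to start walking from; pc may be null when only sp/fp are known. */
    static final class Frame {
        final Long sp, fp, pc;
        Frame(Long sp, Long fp, Long pc) { this.sp = sp; this.fp = fp; this.pc = pc; }
    }

    /**
     * Prefer the live registers; otherwise fall back to the thread's recorded
     * last Java frame (lastJavaSP/lastJavaFP are null if the thread has none).
     */
    static Optional<Frame> currentFrameGuess(Regs regs, Long lastJavaSP, Long lastJavaFP) {
        if (regs.sp != null && regs.fp != null && regs.pc != null) {
            return Optional.of(new Frame(regs.sp, regs.fp, regs.pc));
        }
        if (lastJavaSP != null) {
            // Registers unavailable: start from the last Java frame, pc unknown
            return Optional.of(new Frame(lastJavaSP, lastJavaFP, null));
        }
        // Nothing to start from: the caller produces no stack trace for this thread
        return Optional.empty();
    }

    public static void main(String[] args) {
        Regs failed = new Regs(null, null, null);
        System.out.println(currentFrameGuess(failed, 0x7000L, 0x7010L).isPresent()); // prints true
        System.out.println(currentFrameGuess(failed, null, null).isPresent());       // prints false
    }
}
```

Returning an empty result instead of throwing lets a heap dump skip one thread's stack trace while still writing everything else.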
19-06-2020

On Windows, the failure looks like:

sun.jvm.hotspot.debugger.DebuggerException: Windbg Error: GetThreadIdBySystemId failed! (hr: 0x80004002)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.getThreadIdFromSysId0(Native Method)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.getThreadIdFromSysId(WindbgDebuggerLocal.java:276)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.amd64.WindbgAMD64Thread.getThreadID(WindbgAMD64Thread.java:88)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.amd64.WindbgAMD64Thread.getContext(WindbgAMD64Thread.java:51)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.win32_amd64.Win32AMD64JavaThreadPDAccess.getCurrentFrameGuess(Win32AMD64JavaThreadPDAccess.java:103)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:267)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:229)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.ThreadStackTrace.dumpStack(ThreadStackTrace.java:54)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.dumpStackTraces(HeapHprofBinWriter.java:718)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:439)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.writeHeapHprofBin(JMap.java:182)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:97)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:176)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:331)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:483)
15-06-2020