Bug ID: JDK-8202884 SA: Attach/detach might fail on Linux if debugee application create/destroy threads during attaching

Type: Bug
Component: hotspot
Sub-Component: svc-agent
Affected Version: 8,9,10,11,12,13

Priority: P3
Status: Resolved
Resolution: Fixed
OS: linux

Submitted: 2018-05-10
Updated: 2019-09-04
Resolved: 2018-12-13

Versions (Unresolved/Resolved/Fixed)

JDK 11	JDK 12	JDK 13	JDK 8	Other
11.0.4-oracle (Fixed)	12 b24 (Fixed)	13 (Fixed)	8u221 (Fixed)	openjdk8u222 (Fixed)

Related Reports

Relates :	JDK-8215042 - Umbrella bug for all SA failures in tier1
Relates :	JDK-8215247 - SA: Further robustize the attach mechanism

Description

The serviceability/sa tests fail intermittently with:
sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process:
ptrace(PTRACE_ATTACH, ..) failed for 1415: No such process
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 1415: No such process
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:274)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:672)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:612)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:338)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:305)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:141)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.attachDebugger(CLHSDB.java:180)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:61)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:40)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:191)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:439)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 1415: No such process
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:265)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)
 stdout: [ Command not valid until attached to a VM
];
 stderr: [ Command not valid until attached to a VM
]
 exitValue = -1

It happens when a thread finishes after SA has read the thread structure for the process and before it tries to attach to this pthread.

Comments

Fix Request: This issue shall be backported to OpenJDK 11.0.4 because it was picked by Oracle for 11.0.4-oracle. The patch applied cleanly, apart from minor header fuzz. I've asked for a review: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2019-March/000755.html
11-03-2019
For attaching to the threads in a process, we first do a ptrace attach to the main thread. We then use the libthread_db library (or, when running within a container, iterate through the /proc/<pid>/task files) to discover the threads of the process and add them to the threads list (within SA) for this process. Once we have discovered all the threads and added them to the list, we invoke ptrace attach on each of them individually. When we deal with an application whose threads are exiting continuously, some of these threads might no longer exist by the time we try to ptrace attach to them.
The proposed fix includes the following modifications to solve this:
1. Check the state of the threads in the thread_db callback routine, and skip a thread if its state is TD_THR_UNKNOWN or TD_THR_ZOMBIE. SA does not try to ptrace attach to these threads and does not include them in the threads list.
2. While ptrace attaching to a thread, if ptrace(PTRACE_ATTACH, ...) fails with either ESRCH or EPERM, check the state of the thread by checking whether the /proc/<pid>/status file corresponding to that thread exists and, if so, reading the 'State:' line of that file. Skip attaching to this thread and delete it from the SA list of threads if the thread is dead (State: X) or is a zombie (State: Z). From the /proc man page, "Current state of the process. One of "R (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T (tracing stop)", "Z (zombie)", or "X (dead)"."
3. If waitpid() on the thread fails, again skip this thread (delete it from SA's list of threads) instead of bailing out, if the thread has exited or terminated.
To further robustize this scenario, the SA needs to:
1. Try to attach to the thread as soon as the thread gets discovered (instead of first adding it to a thread list, and later iterating through the list to invoke ptrace attach). This would reduce the window in which the thread can exit.
2. Go through multiple iterations of discovering new threads (to capture the new threads which might have been spawned between discovery and the actual attach). This technique of multiple iterations is followed in other debuggers like gdb.
These would be taken up later.
11-12-2018
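The state check described in item 2 of the comment above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual patch: the function names parse_thread_state() and thread_is_gone() are hypothetical, and the sketch assumes the caller has already read the contents of the status file into a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the SA sources): returns the state letter
 * that follows "State:" in the text of a proc status file, or '\0' if the
 * text contains no such line or the line carries no state letter. */
char parse_thread_state(const char *status_text) {
    assert(status_text != NULL);
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        /* Only match "State:" at the beginning of a line. */
        if (strncmp(line, "State:", 6) == 0) {
            const char *p = line + 6;
            while (*p == ' ' || *p == '\t') {
                p++;                      /* skip the separator whitespace */
            }
            return (*p == '\n') ? '\0' : *p;
        }
        line = strchr(line, '\n');        /* advance to the next line */
        if (line != NULL) {
            line++;
        }
    }
    return '\0';
}

/* Hypothetical helper: true for the two states the comment says to skip,
 * zombie (Z) and dead (X). */
bool thread_is_gone(char state) {
    return state == 'Z' || state == 'X';
}
```

A caller would read the status file for the thread, pass its text to parse_thread_state(), and drop the thread from its list when thread_is_gone() returns true.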
The issue is due to a potential race condition when attaching to all threads in the /proc//task/ directory. Existing threads might have died between the time the newly discovered threads are added in add_new_thread() and the time we try to ptrace attach to them.
10-12-2018
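To make the race window concrete, thread discovery from a task directory can be sketched as below. This is an assumption-laden illustration rather than SA's code: list_task_ids() is a hypothetical name, and the directory path is passed in by the caller. The ids it returns are only a snapshot; any of those threads may exit before a later ptrace attach.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not from the SA sources): stores up to max_ids
 * numeric entry names found in task_dir into ids[]. Returns the number
 * of ids stored, or -1 if the directory cannot be opened. */
int list_task_ids(const char *task_dir, pid_t *ids, int max_ids) {
    assert(task_dir != NULL && ids != NULL);
    DIR *dir = opendir(task_dir);
    if (dir == NULL) {
        return -1;
    }
    int n = 0;
    struct dirent *ent;
    while (n < max_ids && (ent = readdir(dir)) != NULL) {
        /* Skip ".", ".." and anything else that is not purely numeric. */
        if (!isdigit((unsigned char)ent->d_name[0])) {
            continue;
        }
        char *end;
        long id = strtol(ent->d_name, &end, 10);
        if (*end != '\0') {
            continue;
        }
        ids[n++] = (pid_t)id;
    }
    closedir(dir);
    return n;
}
```

Because the list is stale as soon as it is built, each id has to be treated as possibly gone when the attach is attempted.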
The gdb code mentions that "EPERM is returned if the thread's task still exists, and is marked as exited or zombie." A potential fix is to check the state of the thread if we get an EPERM while attaching to a particular thread, and not try to attach to that thread any further if it is in the process of exiting or is a zombie.
10-12-2018
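The decision suggested in the comment above could look roughly like this. It is a sketch under stated assumptions, not the committed fix: attach_result and classify_attach() are hypothetical names, the caller is assumed to supply the ptrace return value, the saved errno, and the state letter read from the thread's status file, and the case where the status file is missing is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result type for one attach attempt. */
typedef enum {
    ATTACH_OK,           /* ptrace attach succeeded */
    ATTACH_SKIP_THREAD,  /* thread is exiting or a zombie: drop it and go on */
    ATTACH_FAIL          /* genuine failure: report it to the caller */
} attach_result;

/* Hypothetical helper (not from the SA sources): classifies the outcome
 * of a ptrace(PTRACE_ATTACH, ...) call. ptrace_ret is the call's return
 * value (0 or -1), err is the errno saved right after the call, and
 * state is the thread's state letter ('Z' for zombie, 'X' for dead). */
attach_result classify_attach(long ptrace_ret, int err, char state) {
    assert(ptrace_ret == 0 || ptrace_ret == -1);
    if (ptrace_ret == 0) {
        return ATTACH_OK;
    }
    if ((err == ESRCH || err == EPERM) && (state == 'Z' || state == 'X')) {
        return ATTACH_SKIP_THREAD;
    }
    return ATTACH_FAIL;
}
```

With this split, an EPERM on a live thread (for example, a real permission problem) still surfaces as a failure, while an exiting thread is skipped quietly.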