JDK-8235211 : serviceability/attach/RemovingUnixDomainSocketTest.java fails with AttachNotSupportedException: Unable to open socket file
  • Type: Bug
  • Component: core-svc
  • Sub-Component: tools
  • Affected Version: 14,15
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-12-02
  • Updated: 2023-09-21
  • Resolved: 2020-05-13
Fixed versions
  • JDK 11: 11.0.13 (Fixed)
  • JDK 15: 15 b24 (Fixed)
  • Other: openjdk8u342 (Fixed)
Sub Tasks
JDK-8238710 :  
Description
Snippets from the log file:

#section:main
----------messages:(8/247)----------
command: main RemovingUnixDomainSocketTest
reason: User specified action: run main RemovingUnixDomainSocketTest 
Mode: agentvm
Agent id: 2
Timeout refired 480 times
Timeout information:
--- Timeout information end.
elapsed time (seconds): 845.544

<snip>

----------System.err:(16/2048)----------
Command line: ['/scratch/mesos/jib-master/install/jdk-14+26-1198/macosx-x64-debug.jdk/jdk-14/fastdebug/bin/java' '-XX:MaxRAMPercentage=12' '-cp' '/scratch/mesos/slaves/ec52e70b-7270-4911-a8ea-ec00a8b5dfb8-S31089/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/4b543924-8b3b-4481-9aa0-e804c33a33f9/runs/411cea41-268a-4d88-8fea-1805f4e2d1c4/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/serviceability/attach/RemovingUnixDomainSocketTest.d:/scratch/mesos/slaves/ec52e70b-7270-4911-a8ea-ec00a8b5dfb8-S31089/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/4b543924-8b3b-4481-9aa0-e804c33a33f9/runs/411cea41-268a-4d88-8fea-1805f4e2d1c4/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_serviceability/classes/1/test/lib' 'jdk.test.lib.apps.LingeredApp' '2c99beff-95bf-4689-bbd4-3d73df6efe58.lck' ]

com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /var/folders/q1/xcy7ggg12nsfl896_5c7wmhr0000mv/T/.java_pid43354: target process 43354 doesn't respond within 10500ms or HotSpot VM not loaded
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:99)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
	at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

 stdout: [43354:
];
 stderr: [com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /var/folders/q1/xcy7ggg12nsfl896_5c7wmhr0000mv/T/.java_pid43354: target process 43354 doesn't respond within 10500ms or HotSpot VM not loaded
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:99)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCom
result: Error. Agent error: java.lang.Exception: Agent 2 timed out with a timeout of 480 seconds; check console log for any additional details
Comments
Fix Request (8u): This backport eliminates the deadlock described in https://github.com/openjdk/jdk8u-dev/pull/32. It is a follow-up fix for JDK-8225690, which is already backported. This patch does not apply cleanly to jdk8u. We replace os::naked_yield with os::yield(). Review: https://github.com/openjdk/jdk8u-dev/pull/32
05-04-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk8u-dev/pull/32 Date: 2022-04-05 00:55:22 +0000
05-04-2022

Fix Request (11u): Follow-up fix for JDK-8225690 which is already backported. Review: https://github.com/openjdk/jdk11u-dev/pull/153 (includes JDK-8244973)
22-07-2021

URL: https://hg.openjdk.java.net/jdk/jdk/rev/44c24e779d51 User: amenkov Date: 2020-05-13 22:28:18 +0000
13-05-2020

Next chunk of debugging info: the AttachListener state is not set to AL_NOT_INITIALIZED because the attachListener thread hangs in:
  - AttachListener::dequeue()
    -> ThreadBlockInVM dtor
    -> ThreadStateTransition::trans(_thread_blocked, _thread_in_vm)
    -> ThreadStateTransition::transition(...)
    -> SafepointMechanism::block_if_requested(...)
    -> SafepointMechanism::block_if_requested_slow(...)
    -> SafepointSynchronize::block_or_handshake(thread)
    -> SafepointSynchronize::block(thread)
When the test passes, SafepointMechanism::block_if_requested() does not call block_if_requested_slow():
  void SafepointMechanism::block_if_requested(JavaThread *thread) {
    if (!local_poll_armed(thread)) {
      return;
    }
    block_if_requested_slow(thread);
  }
I.e. local_poll_armed() returns false.
06-05-2020

The check_socket_file logic is executed on the signal handler thread. The question is whether this can cause a hang in thread->check_and_wait_while_suspended (in the AttachListener thread).
30-04-2020

Tried to implement a fix by:
  - making BsdAttachListener::_listener (and LinuxAttachListener::_listener) volatile;
  - moving socket cleanup to the AttachListener thread: introduce a new AttachListenerState (AL_SHUTTING_DOWN); if the attachListener needs to be restarted, check_socket_file calls shutdown() on the socket (to terminate accept()) but doesn't close it; the socket is closed by the attachListener thread when accept() fails.
This works fine on Linux, but on BSD shutdown() returns an error for a listening socket (ENOTCONN - socket is not connected).
Further research/debugging showed: as already described, LingeredApp hangs in the check_socket_file function, waiting for the AttachListener thread to set the AL_NOT_INITIALIZED state:
  while (AttachListener::transit_state(AL_INITIALIZING, AL_NOT_INITIALIZED) != AL_NOT_INITIALIZED) {
    ::naked_yield();
  }
For successful runs the AttachListener thread has already done so. For failed runs check_socket_file reaches this code while the AttachListener thread is in the AttachListener::dequeue call. The thread exits BsdAttachListener::dequeue (with a NULL result), but is blocked at:
  // were we externally suspended while we were waiting?
  thread->check_and_wait_while_suspended();
30-04-2020

The AIX implementation behaves a bit differently: it sets a special flag (_shutdown) and does not close the listening socket, it only calls shutdown() (but this way the socket resources are never freed).
24-04-2020

Debugging showed that the failures are caused by LingeredApp hanging when jcmd tries to attach to it a 2nd time. It hangs in the check_socket_file function (attachListener_bsd.cpp, attachListener_linux.cpp), waiting for termination of the previous attach listener:
  while (AttachListener::transit_state(AL_INITIALIZING, AL_NOT_INITIALIZED) != AL_NOT_INITIALIZED) {
    ::naked_yield();
  }
The attach_listener_thread_entry function (attachListener.cpp) is supposed to set the AL_NOT_INITIALIZED state. Unfortunately I was not able to locate the exact place where attach_listener_thread_entry hangs. Most likely it is in the call of AttachListener::dequeue() (it sets the AL_NOT_INITIALIZED state if dequeue() returns NULL).
AttachListener::dequeue (attachListener_bsd.cpp, attachListener_linux.cpp) waits for a new connection via accept(listener(), ...). check_socket_file calls listener_cleanup(), which does
  ::shutdown(s, SHUT_RDWR);
  ::close(s);
so the accept call is expected to return an error.
The close() function man page contains the following note: It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects.
23-04-2020
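
For illustration, the wait loop quoted above can be modelled outside HotSpot roughly as follows. This is a minimal sketch under stated assumptions, not the HotSpot code: the enum, the global g_state and the helper wait_for_listener_exit are hypothetical stand-ins, transit_state is modelled as a compare-and-swap that returns the previously observed state, and the max_spins bound is added only so that the hang becomes an observable "false" result instead of an endless loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the listener states named in the comment.
enum AttachListenerState { AL_NOT_INITIALIZED, AL_INITIALIZING, AL_INITIALIZED };

// Hypothetical global state; in this model it starts as "listener running".
static std::atomic<AttachListenerState> g_state{AL_INITIALIZED};

// Modelled as a compare-and-swap: store new_state only if the current state
// equals cmp_state. Returns the state observed before the attempt.
AttachListenerState transit_state(AttachListenerState new_state,
                                  AttachListenerState cmp_state) {
    AttachListenerState expected = cmp_state;
    // On failure, compare_exchange_strong writes the actual value into
    // 'expected'; on success 'expected' still holds cmp_state.
    g_state.compare_exchange_strong(expected, new_state);
    return expected;
}

// Bounded variant of the wait: returns true once the previous listener has
// published AL_NOT_INITIALIZED (the state is then moved to AL_INITIALIZING),
// false if that never happens within max_spins attempts.
bool wait_for_listener_exit(int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (transit_state(AL_INITIALIZING, AL_NOT_INITIALIZED) == AL_NOT_INITIALIZED) {
            return true;
        }
        std::this_thread::yield();
    }
    return false;
}
```

In this model, if nothing ever stores AL_NOT_INITIALIZED into g_state (the analogue of the listener thread being stuck in dequeue()), wait_for_listener_exit never succeeds, which corresponds to the unbounded loop spinning forever.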

The test was introduced by JDK-8225690 (to verify the implemented AttachListener::check_socket_file logic). I don't see any significant difference between the osx and linux implementations (this bug is macosx only). I guess LingeredApp is killed by JTreg after the timeout (so we get SIGABRT). I'm going to update the test to localize the issue (as it's still unclear what causes the timeout - jcmd should exit after it cannot connect to the target VM).
16-04-2020

Also see some analysis done in this issue: 8241695: JFR TestCrossProcessStreaming.java child process exited with SIGQUIT (131). In short, the target process was killed via SIGQUIT. There was no hs_err file, but a core file was available. I have analyzed thread/stack traces from the core file, and David Holmes provided some comments. It looks like SIGQUIT is sent to the target process, but the target process does not adequately react to it (even though it looks like the signal handlers are already installed, according to David). Rather intermittent (~1/200 runs). In the case of 8241695, I implemented a work-around in the test, since the goal of the test is not to test attach (it uses attach to get jfr properties), and I lack expertise in the SVC area.
13-04-2020

Also see JDK-8240622 for some unexplained attach failures on OSX.
11-03-2020

Only about half of the failures include "LingeredApp terminated with non-zero exit code 134".
11-03-2020

I don't think core files taken at timeout time are useful. The test runs LingeredApp, attaches to it, removes the .java_pid file and then tries to attach again. As far as I understand, LingeredApp terminates early and this causes the timeout during the 2nd attach. The main question is what causes LingeredApp to terminate.
14-02-2020

The latest failure shows that LingeredApp does not produce any output. I supposed that exit code 134 means the process exited because of a SIGABRT signal (the exit status is 128 plus the signal number, and signal number 6 is SIGABRT), but I'd expect to see a crash dump in stderr.
14-02-2020
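
The "128 plus the signal number" arithmetic above can be sketched as a small helper. This is an illustrative sketch only: ExitInfo and decode_exit_code are hypothetical names, not part of jtreg or the JDK test library, and the helper assumes that any code above 128 follows the signal convention, although a process may also return such a value deliberately.

```cpp
#include <cassert>

// Hypothetical result type for the decoded exit code.
struct ExitInfo {
    bool killed_by_signal;  // true if the code follows the 128 + N convention
    int signal_number;      // N, or 0 when the code is not signal-related
};

// Splits a reported exit code into "terminated by signal N" or "plain exit".
ExitInfo decode_exit_code(int code) {
    assert(code >= 0 && code <= 255);  // exit codes are one byte wide
    if (code > 128) {
        return {true, code - 128};
    }
    return {false, 0};
}
```

With this helper, 134 decodes to signal 6 (SIGABRT), matching the reasoning in the comment, and the 131 reported in 8241695 decodes to signal 3 (SIGQUIT).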

Looks like some failures are caused by termination of LingeredApp: "LingeredApp terminated with non-zero exit code 134". Unfortunately LingeredApp output is not logged in case of a non-zero exit.
07-02-2020