JDK-8227594 : sadebugd/DebugdConnectTest.java fails due to "java.rmi.NotBoundException: SARemoteDebugger"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2019-07-11
  • Updated: 2020-06-01
  • Resolved: 2019-07-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13 JDK 14
11.0.5Fixed 13.0.2Fixed 14 b06Fixed
Related Reports
Relates :  
Description
The following test is failing in the JDK14 CI:

serviceability/sa/sadebugd/DebugdConnectTest.java

Here's a snippet from the log file:

----------System.err:(109/7475)----------
Command line: ['/scratch/opt/mach5/mesos/work_dir/jib-master/install/jdk-14+6-139/solaris-sparcv9-debug.jdk/jdk-14/fastdebug/bin/java' '-XX:MaxRAMPercentage=12' '-cp' '/scratch/opt/mach5/mesos/work_dir/9450d715-bc46-4f58-8750-2640c04740f9/testoutput/test-support/jtreg_open_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/sa/sadebugd/DebugdConnectTest.d:/scratch/opt/mach5/mesos/work_dir/9450d715-bc46-4f58-8750-2640c04740f9/testoutput/test-support/jtreg_open_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib' 'jdk.test.lib.apps.LingeredApp' 'a0e70dc8-0360-469c-8428-953e47059e2d.lck' ]







Error attaching to remote server: java.rmi.NotBoundException: SARemoteDebugger
sun.jvm.hotspot.debugger.DebuggerException: java.rmi.NotBoundException: SARemoteDebugger
	at jdk.hotspot.agent/sun.jvm.hotspot.RMIHelper.lookup(RMIHelper.java:115)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.connectRemoteDebugger(HotSpotAgent.java:519)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:375)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:305)
	at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:184)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:196)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JSnap.main(JSnap.java:87)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSNAP(SALauncher.java:450)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:548)
Caused by: java.rmi.NotBoundException: SARemoteDebugger
	at java.rmi/sun.rmi.registry.RegistryImpl.lookup(RegistryImpl.java:234)
	at java.rmi/sun.rmi.registry.RegistryImpl_Skel.dispatch(RegistryImpl_Skel.java:131)
	at java.rmi/sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:468)
	at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:298)
	at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
	at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:690)
	at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:562)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:796)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:677)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:676)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:830)
	at java.rmi/sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:283)
	at java.rmi/sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:260)
	at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:375)
	at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:119)
	at java.rmi/java.rmi.Naming.lookup(Naming.java:101)
	at jdk.hotspot.agent/sun.jvm.hotspot.RMIHelper.lookup(RMIHelper.java:113)
	... 9 more
Comments
Fix Request for 13u: This is testbug. I want to backport this for JDK-8209790 for 13u.
28-08-2019

Fix Request for 11u: This is testbug. I want to backport this for JDK-8209790.
23-07-2019

URL: https://hg.openjdk.java.net/jdk/jdk/rev/901ab17af3ce User: ysuenaga Date: 2019-07-13 11:55:50 +0000
13-07-2019

I don't think it is in general an issue with any test using RMI. The sadebugd tests triggers code in sun.jvm.hotspot.RMIHelper that binds using the default debugd serverid name "SARemoteDebugger". So if two tests try doing this at the same time, you run into issues (mostly because the first test ends up shutting down the debugd server as it exits). As Yasumasa pointed out, only the tests in the sadebugd directory use debugd, so that should be the only place where exlusiveAccess.dirs is needed. It's also only an issue because the the test uses the default serverid. It is possible to specify a serverid with the --serverid option, but the test purposely tries without. If you look at the test you see: System.out.println("debugd connection test with server id:"); runTests("test", app.getPid()); System.out.println("debugd connection test without server id:"); runTests(null, app.getPid()); The first arg is the debugd serverid. The first run with "test" I believe always passes, and the problem only happens when null is use. This triggers the use of the default debugd serverid of "SARemoteDebugger", which is also what the other test in the directory, SADebugDTest, ends up using.
12-07-2019

Ummm... exclusiveAccess.dirs only works for the directory that contains the test. What happens if there are other tests running in parallel that are using RMI?
12-07-2019

Ok. You just need "exclusiveAccess.dirs=." in TEST.properties in the serviceability/sa/sadebugd directory.
12-07-2019

I don't think to set `serverNamePrefix` is the solution. It affects URI of RMI service, do not affect TCP server. RMIHelper::rebind() calls LocateRegistry::createRegistry() if Naming::rebind() failed. createRegistry() openes server socket for RMI in its process. Thus when multiple DebugdConnectTest are called concurrently, they attempt to use the RMI registry which is created by first process. I think it is the only solution to add `exclusiveAccess.dirs=serviceability/sa/sadebugd` to TEST.properties . AFAIK the tests which use debugd are available in serviceability/sa/debugd. In particular they are DebugdConnectTest and SADebugDTest. (DebugdUtils is utility class for handling debugd) SADebugDTest tests to start debugd server. It does not test to be connected from the client. So we could not see this error until now.
12-07-2019

Yeah, I was kind of headed down that same path and found RMIHelper, but it wasn't clear to me how it all works. Maybe you can set sun.jvm.hotspot.rmi.serverNamePrefix to something unique. I think it would need to be done both when running debugd and the LingeredApp (they should be the same for the test you are running though). You can add a TEST.properties file with "exclusiveAccess.dirs=" to the directory. That will cause all tests in the directory to run one at a time. I wonder which other test it is conflicting with. It seems to not be in the sadebugd directory since I was never able to reproduce the problem by just running the two tests in that directory. So that means a test in the serviceabitly/sa directory is also vulnerable. Aren't there other debugd tests? Is there a reason they don't run into this issue?
12-07-2019

I think the cause of this problem is that debugd uses RMI. http://hg.openjdk.java.net/jdk/jdk/file/a35a210e6c0f/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/RMIHelper.java#l61 debugd attempt to bind the service to existing RMI registry. If DebugdConnectTest starts concurrently, some tests would bind to another one which creates RMI registry at first. I think DebugdConnectTest should not be run concurrently. Can we control it? (Is it the `exclusiveAccess.dirs=` ?)
11-07-2019

It doesn't fail every time when run concurrent with the other sa tests, but seems to fail most of the time. The following reproduces it most of the time: make run-test CONF=linux-x64-slowdebug TEST=open/test/hotspot/jtreg/serviceability/sa/ JTREG="JOBS=8;" And the following seems to always pass: make run-test CONF=linux-x64-slowdebug TEST=open/test/hotspot/jtreg/serviceability/sa/ JTREG="JOBS=1;" Also, adding TEST.properties with "exclusiveAccess.dirs=" seems to make it pass. So this is a potential fix, but first I'd like to know why there are concurrency issues with this test, and once understood, hopefully they can be addressed directly rather than just avoiding concurrency. Yasumasa, can you look into the concurrency problem?
11-07-2019

Fails when run in parallel with all the serviceability/sa tests. Probably needs to be run with no concurrency, although I'd like to know why. It connects using the pid of the spawned LingeredApp, so it shouldn't be an issue with connecting to the wrong LingeredApp. Maybe it's connecting with the wrong debugd.
11-07-2019

The test runs jstack, jmap, jinfo, and jsnap in that order. With the first failure it appears to have issues during jsnap, but with the second failure it didn't even get through jstack. I've run locally on my machine with no issues. Trying now with a mach5 job.
11-07-2019

Yasumasa, can you look at this?
11-07-2019

This is a new test added by JDK-8209790.
11-07-2019

A failure on Linux has slight different log output: ----------System.err:(107/7511)---------- java version "14-ea" 2020-03-17 Java(TM) SE Runtime Environment (fastdebug build 14-ea+6-137) Java HotSpot(TM) 64-Bit Server VM (fastdebug build 14-ea+6-137, mixed mode) Command line: ['/scratch/opt/mach5/mesos/work_dir/jib-master/install/jdk-14+6-137/linux-x64-debug.jdk/jdk-14/fastdebug/bin/java' '-XX:MaxRAMPercentage=6' '-cp' '/scratch/opt/mach5/mesos/work_dir/slaves/dd0bab7c-a530-4905-bb6b-daa7153eb2c1-S128/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f1639f47-4389-4c5c-88a1-4e1428edbb9d/runs/c0e93cf8-75fa-4638-8002-a34f8cd776c2/testoutput/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa/classes/1/serviceability/sa/sadebugd/DebugdConnectTest.d:/scratch/opt/mach5/mesos/work_dir/slaves/dd0bab7c-a530-4905-bb6b-daa7153eb2c1-S128/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f1639f47-4389-4c5c-88a1-4e1428edbb9d/runs/c0e93cf8-75fa-4638-8002-a34f8cd776c2/testoutput/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa/classes/1/test/lib' 'jdk.test.lib.apps.LingeredApp' '7ce0ba18-83d4-47a8-8f3b-9975a625a628.lck' ] Error attaching to remote server: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is: java.net.ConnectException: Connection refused sun.jvm.hotspot.debugger.DebuggerException: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is: java.net.ConnectException: Connection refused at jdk.hotspot.agent/sun.jvm.hotspot.RMIHelper.lookup(RMIHelper.java:115) at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.connectRemoteDebugger(HotSpotAgent.java:519) at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:375) at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:305) at jdk.hotspot.agent/sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:184) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:196) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:297) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:533) Caused by: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is: java.net.ConnectException: Connection refused at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619) at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:209) at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:196) at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338) at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112) at java.rmi/java.rmi.Naming.lookup(Naming.java:101) at jdk.hotspot.agent/sun.jvm.hotspot.RMIHelper.lookup(RMIHelper.java:113) ... 9 more Caused by: java.net.ConnectException: Connection refused at java.base/sun.nio.ch.Net.connect0(Native Method) at java.base/sun.nio.ch.Net.connect(Net.java:493) at java.base/sun.nio.ch.Net.connect(Net.java:482) at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588) at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:339) at java.base/java.net.Socket.connect(Socket.java:585) at java.base/java.net.Socket.connect(Socket.java:534) at java.base/java.net.Socket.<init>(Socket.java:457) at java.base/java.net.Socket.<init>(Socket.java:231) at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:40) at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613) ... 15 more
11-07-2019