JDK-4294599 : socket creation fails with EADDRNOTAVAIL on Solaris 7
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.3.0
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_7
  • CPU: sparc
  • Submitted: 1999-11-25
  • Updated: 2000-01-04
  • Resolved: 1999-12-22
Other
1.3.0 kestrel   Fixed
Description
daniel.daugherty@Eng 1999-11-24

During Standard Look testing for Kestrel FCS-O, the HttpEcho server
was unable to reliably create "call back" sockets to the HTTP client
tests. Here is the server log output:

java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-O)
Java HotSpot(TM) Client VM (build 1.3-O, interpreted mode)
java full version "1.3.0-O"
java.lang.InternalError: java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 19
        at HttpEcho.doClient(HttpEcho.java:320)
        at HttpEcho.run(HttpEcho.java:134)
        at java.lang.Thread.run(Thread.java:488)
java.lang.InternalError: java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 14
        at HttpEcho.doClient(HttpEcho.java:320)
        at HttpEcho.run(HttpEcho.java:134)
        at java.lang.Thread.run(Thread.java:488)
java.lang.InternalError: java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 16
        at HttpEcho.doClient(HttpEcho.java:320)
        at HttpEcho.run(HttpEcho.java:134)
        at java.lang.Thread.run(Thread.java:488)
java.lang.InternalError: java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 16
        at HttpEcho.doClient(HttpEcho.java:320)
        at HttpEcho.run(HttpEcho.java:134)
        at java.lang.Thread.run(Thread.java:488)

In the above output, errno 126 is EADDRNOTAVAIL. This bug occurred on
an Ultra 60 running Solaris 7. It was *not* reproducible on a Pentium
166 running WinNT. More analysis of this bug is planned for Kestrel
FCS-P.

daniel.daugherty@Eng 1999-11-29

I also meant to include a more detailed stacktrace from the actual
exception that was tossed:

java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-O)
Java HotSpot(TM) Client VM (build 1.3-O, interpreted mode)
java full version "1.3.0-O"
ERROR: all we wanted for Christmas was a sock...et
java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 19
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:316)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:129)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:116)
	at java.net.Socket.<init>(Socket.java:277)
	at java.net.Socket.<init>(Socket.java:104)
	at HttpEcho.doClient(HttpEcho.java:318)
	at HttpEcho.run(HttpEcho.java:134)
	at java.lang.Thread.run(Thread.java:488)
java.lang.InternalError: java.net.SocketException: errno: 126, error: Cannot assign requested address for fd: 19
	at HttpEcho.doClient(HttpEcho.java:322)
	at HttpEcho.run(HttpEcho.java:134)
	at java.lang.Thread.run(Thread.java:488)

bradford.wetmore@eng 1999-12-03

Well, some more information.  I have reproduced the problem on both a
32 bit and a 64 bit Solaris 7. 

I was not able to reproduce it on an Ultra 10, but was able to reliably
reproduce it on two Ultra 60s and a 2-way E250.  I haven't gone down
the path of checking out kernel patches.

bongos:		Ultra 60 32 bit  FAIL
javinator:	Ultra 60 64 bit  FAIL 
luster:		E250x2 64 bit    FAIL
glossy:		Ultra 10 64 bit  PASSED

In case it wasn't blindingly obvious, all clients were Solaris.  No
NT machines were harmed in the making of this exceptional condition!

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: generic
FIXED IN: kestrel
INTEGRATED IN: kestrel
VERIFIED IN: kestrel
2004-06-14

EVALUATION

bradford.wetmore@eng 1999-12-03

It was curious to me that I could only reproduce this on the higher
speed machines, so I began to suspect a race condition in the test
case. Sure enough.

Time  Client                         Server
====  ======                         ======
1     open socket
2     generate http request
3                                    receive http request
4                                    process http echo on the
                                     host/port specified in the
                                     request
5                                    open reply socket, no
                                     receiver ready
6                                    generate exception
7     finish setup of the receive
      server socket, accept()
8     times out waiting for the
      connection

By adding some time before calling the socket create, everything works
just fine.

      else if (echoAddr != null && echoPort != -1) {
          try {
    +         Thread.sleep(1000);
              echo = new Socket(echoAddr, echoPort);
          } catch (Exception e) {
              throw new RuntimeException(e.toString());
          }

I'm closing this as a bug against java/classes_net, but you may want to
reopen it as a bug against the test suite. I don't know what the
cat/subcat would be for that.

daniel.daugherty@Eng 1999-12-08

I'm having a problem verifying your theory using Solaris 2.6 and
Solaris 7. Please let me explain. I modified HttpRequestDriver.java to
delay before the accept call:

------ HttpRequestDriver.java -------
*** /tmp/dSsKgC_ Wed Dec 8 10:51:21 1999
--- HttpRequestDriver.java Wed Dec 8 10:43:17 1999
***************
*** 172,177 ****
--- 172,182 ----
  if (noEcho)
      return (true); // not expecting a client request echo
+ try {
+     Thread.sleep(5000);
+     logOutput("delay done");
+ } catch (InterruptedException e) {
+ }
  errorMesg = "Cannot open client request echo socket.";
  echoSocket = echoSS.accept();
  logDebug("Client request echo socket connected.");

With a five second delay, this should have made the bug reproducible on
the slowest of machines.

I also modified the HttpEcho.java server to print a message before and
after call back socket creation:

------- HttpEcho.java -------
*** /tmp/d00TA3R Wed Dec 8 10:53:10 1999
--- HttpEcho.java Wed Dec 8 10:46:13 1999
***************
*** 315,321 ****
--- 315,323 ----
  //
  else if (echoAddr != null && echoPort != -1) {
      try {
+         logOutput("start creating echo back socket");
          echo = new Socket(echoAddr, echoPort);
+         logOutput("done creating echo back socket");
      } catch (Exception e) {
          throw new RuntimeException(e.toString());
      }

When I run the server with the HttpVersionTest client, I see the "start
creating echo back socket" and "done creating echo back socket"
messages as soon as the client sends the HTTP request. After five
seconds, I see the "delay done" message on the client. So the server
was able to create the call back client socket before the accept()
method is called.

Why is this? Probably because the call back server socket is created in
the HttpRequestDriver constructor on behalf of the HTTP client test
program. This means that the call back server socket is available
pretty much from program start-up and definitely before any HTTP
requests are sent.

Since delaying the accept() method call does not make the problem more
reproducible, I'm beginning to doubt that we really have an application
protocol problem.

For Solaris 2.6 as server, I tried one and two client processes. For
Solaris 7 as server, I did the equivalent of a Standard Look, three
clients with two processes per client. No luck reproducing the bug.

I'm returning the bug to Brad.

bradford.wetmore@eng 1999-12-09

Since Dan is sick today, I was 1/2 tempted to give it right back so I
could go off on vacation with a clean plate! :)

Just another data point. In putting in some println around the main do
in doClient, the bug would not reproduce. It seems to slow down the
server enough such that whatever needs to happen does.

Still looking....
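
Dan's observation that the call-back connection completed before
accept() was called matches how listening sockets generally behave:
once a server socket exists, the kernel completes incoming handshakes
and queues them in the listen backlog until the application accepts
them. The following is a minimal sketch in modern Java (not the
1.3-era test code) that demonstrates this on the loopback interface.
The class and method names are hypothetical and are not part of the
HttpEcho test suite.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class BacklogDemo {
    /**
     * Hypothetical helper: returns true if a client connect() completes
     * against a listening socket before accept() has been called.
     */
    static boolean connectsBeforeAccept() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; 50 is the requested backlog.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {
            // No accept() yet: the connection is sitting in the listen backlog.
            boolean connectedEarly = client.isConnected();
            // Guard against hanging if the platform behaves unexpectedly.
            listener.setSoTimeout(5000);
            try (Socket accepted = listener.accept()) {
                return connectedEarly && accepted.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected before accept(): " + connectsBeforeAccept());
    }
}
```

This is why delaying accept() in HttpRequestDriver did not make the
failure easier to reproduce: the server socket was already listening.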

bradford.wetmore@eng 1999-12-09

Unfortunately, I have to leave before I can finish this off. I'll be
back on Monday.

I was able to duplicate the condition Dan described. Rats. We are
looking at a very tricky race condition here.

To save some time in trying to start up and test the condition, I made
various modifications to the run_http_client script.

    $java_cmd HttpClient -l 1m $comm_echo_args -o "$LOG_DIR/http_out_${test_log}_job$job_num.$$" $test_options
    $java_cmd HttpConnTokenDefTest -l 1m $test_options "$server"
    $java_cmd HttpContinueRecvTest -l 1m $test_options "$server"
    $java_cmd HttpHostHeaderTest -l 1m $test_options "$server"

In the normal test case, with all tests running, the run usually died
in HttpHostHeaderTest. So I tried to isolate and only run that test,
but then that test succeeded. So I ran each of the tests individually
and they all ran. So I began to suspect one influenced the other, and
ran HttpContinueRecvTest and HttpHostHeaderTest together. I got the
exception, but when I put in an echo before and after each test, the
exception went away.

    echo "just before"
    $java_cmd HttpContinueRecvTest -l 1m $test_options "$server"
    echo "just after"
    echo "just before"
    $java_cmd HttpHostHeaderTest -l 1m $test_options "$server"
    echo "just after"

Weird... I also ran HttpConnTokenDefTest and HttpHostHeaderTest
together, and got the exception in HttpConnTokenDefTest! I don't
remember if there were prints around it, but anyway, it didn't matter
which test case the exception came from. I saw it happen in two
different test cases.

One last thing I did was to snoop the wire, but the results were
inconclusive. I'll have to do more investigation when I get back.

bradford.wetmore@eng 1999-12-16

What a project! Whoo Hoo! It took forever, but here's the answer.

We're bumping into a limitation of TCP here. TCP is working as
designed. I have distilled the Java code down into its corresponding C
code, and the C programs are included as attachments to this bug.
Please run them at your leisure. You'll need to replace the IP address
with your test machine's, but eventually you'll get an EADDRNOTAVAIL.

The terminology could get a little confusing. Definitions first:

    server: the machine with the HttpEcho server running
    client: the machine(s) with the HttpRequestDriver client running

The server turns around and becomes a client by connecting back to the
original client. But for ease of terminology below, I continue to call
the server a server, even though it's operating in the client mode.

As Dan has coded this test, the server takes connections on a
well-known port, then turns around and opens a connection back to the
client on a specified port, and echoes some information back to the
client. At the end of the server's transmission, the server process
does a close() on the socket, which sends a TCP-FIN packet to the
client, and sends the server's TCP connection state into the FIN_WAIT_1
state. The client eventually ACKs the TCP-FIN, which puts the server
into the FIN_WAIT_2 state. The client eventually closes its port, which
sends a TCP-FIN to the server, which then moves the server into the TCP
TIME_WAIT state.

The socket connection information is kept in the TIME_WAIT state for
2*MSL (Maximum Segment Lifetime), which on Solaris is 4 minutes total
by default. (See usr/src/uts/common/inet/tcp.c:tcp_param_arr[0])

While the port combination is in the TIME_WAIT state, you cannot reuse
the 4-tuple (local address, local port, remote address, remote port).

Basically, Dan's test turned around the requests so fast that it used
up all possible socket port combinations that weren't in the TIME_WAIT
state. This is why we didn't see the problem on certain slower server
machines: there, the tests ran slowly enough that the port combos were
being recycled. We were right on the edge with the faster machines,
because simple prints inserted into the tests were causing the tests to
succeed. (My server, luster, is a 2GB memory, 2-way CPU (300 MHz).)
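
To put a rough number on "right on the edge", the port budget can be
estimated with simple arithmetic. The sketch below assumes that the
local address and the remote address and port are fixed, so only the
local port varies, and that the anonymous (ephemeral) port range is
32768 through 65535. That range is an assumption made here for
illustration and is not stated in this report; the 240-second TIME_WAIT
interval is the 4-minute default mentioned above. The class and method
names are hypothetical.

```java
final class TimeWaitBudget {
    /**
     * Upper bound on new connections per second to a single remote
     * (address, port) that can be sustained before every local port is
     * either in use or parked in TIME_WAIT.
     */
    static double maxSustainedRate(int ephemeralPorts, int timeWaitSeconds) {
        if (ephemeralPorts <= 0 || timeWaitSeconds <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return (double) ephemeralPorts / timeWaitSeconds;
    }

    public static void main(String[] args) {
        int ports = 65535 - 32768 + 1; // ASSUMED anonymous port range, 32768 ports
        int timeWaitSeconds = 4 * 60;  // 2*MSL default cited in the evaluation
        System.out.println("max connections/sec per remote endpoint: "
                + maxSustainedRate(ports, timeWaitSeconds));
    }
}
```

Under these assumptions the ceiling is about 136 connections per second
to one remote endpoint. Anything that slows the server below that rate,
such as an extra print statement, would let ports recycle in time.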

For more information, please see Chap 18, section 6 of:

    TCP/IP Illustrated, Volume 1
    W. Richard Stevens

Or read RFC 793, but I wouldn't recommend it! This book is much easier
to read!

Given this, we could recode his tests to cause the client to break the
echo socket connection instead of the server. But that's not really
what the tests are testing. It might make sense to put a delay interval
in when tests start failing.

Brad

daniel.daugherty@Eng 1999-12-22

The solution for the HttpEcho test server is to add a delay and retry
when the application is unable to create the call-back client socket.
This will allow the system to recover and the test to complete.
2004-06-11
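
The delay-and-retry approach described in the 1999-12-22 entry could
look roughly like the following. This is a minimal sketch in modern
Java, not the actual HttpEcho change, which is not included in this
report. The helper name, the attempt count, and the delay are all
hypothetical. It also retries on any IOException, where a real fix
might retry only on the EADDRNOTAVAIL case.

```java
import java.io.IOException;
import java.net.Socket;

final class EchoSocketRetry {
    /**
     * Hypothetical helper: tries to open a client socket up to maxAttempts
     * times, sleeping delayMillis after each failed attempt except the last.
     */
    static Socket connectWithRetry(String host, int port, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new Socket(host, port);
            } catch (IOException e) {
                last = e; // remember the failure in case no attempt succeeds
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give TIME_WAIT entries time to expire
                }
            }
        }
        throw last; // every attempt failed; report the most recent error
    }
}
```

A caller in the position of doClient might use something like
connectWithRetry(echoAddr, echoPort, 5, 1000) in place of the direct
new Socket(echoAddr, echoPort) call, with values tuned to the TIME_WAIT
interval of the server machine.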

PUBLIC COMMENTS

This is a bug in the test suite.
2004-06-10