JDK-4032593 : read() on a socket fails
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version:
    2.0.1_build12,2.0_beta24,1.1,1.1.4,1.2.1
  • Priority: P1
  • Status: Closed
  • Resolution: Not an Issue
  • OS:
    solaris_2.5.1,solaris_2.6,solaris_7,windows_95
  • CPU: generic,x86,sparc
  • Submitted: 1997-02-14
  • Updated: 2000-04-06
  • Resolved: 2000-04-06
Description
[chamness 1/12/97] Update.
When running multiple threads, a thread that is invoking read()
on a socket can go to sleep.  The other side of the socket 
continues to write to this sleeping thread.  The reading thread 
eventually hangs.  It can hang on a single byte read.  So while reading
an integer, it can get a corrupt value.

Setting the timeout period with Socket.setSoTimeout() will
cause the reading thread to throw an InterruptedIOException,
and get out of the hung state.  But this is not a true
timeout.
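
The timeout behavior described above can be sketched as follows. This is a minimal illustration, not the attached test case: the class and method names (ReadTimeoutDemo, readTimesOut) are made up for the example, and it assumes a loopback connection whose peer never writes. In current JDKs the exception thrown is SocketTimeoutException, a subclass of InterruptedIOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReadTimeoutDemo {
    /**
     * Hypothetical helper: opens a loopback connection whose peer never
     * writes, then reports whether read() gave up after timeoutMillis.
     */
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            // With a timeout of 0 (the default) this read would block forever.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();      // the peer is idle, so no byte ever arrives
                return false;   // only reached if data or EOF showed up
            } catch (InterruptedIOException e) {
                // The socket is still usable here; the caller may read again.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The timeout only bounds how long a single read() waits; it does not say whether the peer is slow, blocked, or gone.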

If output streams are not buffered, Windows 95 can crash.

======================================================================
Name: mc57594			Date: 02/14/97

First some background as to how we found this problem.  We are
developing a client in Java which talks to a server.  One of our
developers works from home and thus dials in over a slow line (28.8).
The java client sends a request to the server and then waits for a
reply.  The connection is over a TCP socket.  It is set up as:
	try {
	     socket = new Socket(client.ip, client.port);
	}
	catch (UnknownHostException ex) {}
	catch (IOException ex) { PtkUtil.PtkcExit(client, "Can't create socket.", 1); }
	try {
            din = new DataInputStream(socket.getInputStream());
            dout = new DataOutputStream(socket.getOutputStream());
	}
	catch (IOException ex) { PtkUtil.PtkcExit(client, "Can't create input/output stream.", 1); }
    We used to receive our replies back using code like:
	try {
	    size = _din.readShort();
	    version = _din.readInt();
	    ...
	} catch (IOException e) {
	    System.out.println("Reading header exception "+e);
	    return false;
	}
	return true;
    Normally this works just fine.  We usually test the client and
server over an Ethernet LAN and see no problems.  Today a developer
working from home tried the client/server connection over the
Internet and a 28.8K modem, and we encountered problems.  The server
is attempting to send the client a 1460-byte response, which has
to be retransmitted multiple times (as seen with a #snoop).
    Under JDK1.0.2, the client goes into the first
	size = _din.readShort();
and never returns.
    Under JDK1.1Beta3.3, the client goes into the first
	size = _din.readShort();
and eventually goes to the catch block with
	Reading header exception java.io.IOException: Resource temporarily unavailable
    Our suspicion with JDK1.0.2, given the response from JDK1.1Beta3.3, is
that the read thread goes into an infinite loop; i.e., it is catching
the exception and trying to deal with it but failing.  In either case,
there really is data "in the pipe", so readShort() should not have failed;
the socket was not closed or broken.

    I extracted the src.zip and looked in java/io/DataInputStream.java
at the readShort() function.  It looks like:
	public final short readShort() throws IOException {
	    InputStream in = this.in;
	    int ch1 = in.read();
	    int ch2 = in.read();
	    if ((ch1 | ch2) < 0)
		 throw new EOFException();
	    return (short)((ch1 << 8) + (ch2 << 0));
	}
Unfortunately under JDK1.1Beta3 we get the IOException, but we don't know
which byte (ch1 or ch2) caused the exception.  Thus we cannot easily
try again.
    Under JDK1.1Beta3.3, you have socket.getTcpNoDelay() and
socket.getSoTimeout().  They are 'false' and '0' respectively.

----------
    Instead of reading our data structure in one field at a time, I
decided to read in the entire header and then parse it apart.  Normally
I would use readFully() for this.  But we have the same problem with
readFully() as with readShort(); i.e., if we get an IOException we don't know
how much data has been read in so far.  Thus we decided to come up with
our own readFully() function:
    public static int readFully(DataInputStream in, byte [] b, int off, int len)
                throws EOFException
    {
        int count;
        int n = 0;
        while (n < len) {
            try {
                System.out.println("readFully of "+(len-n)+" bytes at offset "+(off+n));
                count = in.read(b, off + n, len - n);
                System.out.println("readFully count="+count);
            }
            catch (IOException e) {
                /* Try again */
                System.out.println("readFully: exception " + e);
                continue;
            }
            if (count <= 0)
                throw new EOFException();
            n += count;
        }
        return n;
    }
The idea here (based upon our developer's experience) is that when an
IOException occurs we just ignore it and try again.  In the one case
(slow link) we know there really is more data available.  But now
I cannot detect the difference between EOF and a bad exception;
i.e., if we enter this readFully() and actually close the socket, then
we get the same IOException:
	readFully: exception java.io.IOException: Resource temporarily unavailable
as when the bytes were really temporarily unavailable.  Now what?
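
One way to keep the three outcomes apart (end of stream, read timeout, any other I/O failure) might look like the sketch below. It is an illustration under stated assumptions, not a fix for the behavior reported here: it assumes a JDK where a timed-out read raises SocketTimeoutException and a closed stream makes read() return -1, which is not what the submitter observed on JDK1.1Beta3.3. The class name HeaderReader, the maxTimeouts parameter, and the 6-byte header (just the short size and int version fields from the earlier snippet) are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

public class HeaderReader {
    /**
     * Reads exactly len bytes into b, starting at off.
     * Retries only on read timeouts, at most maxTimeouts times (hypothetical
     * knob); end of stream becomes EOFException; any other IOException
     * propagates to the caller unchanged.
     */
    public static void readFully(InputStream in, byte[] b, int off, int len, int maxTimeouts)
            throws IOException {
        int n = 0;          // bytes read so far, preserved across retries
        int timeouts = 0;
        while (n < len) {
            int count;
            try {
                count = in.read(b, off + n, len - n);
            } catch (SocketTimeoutException e) {
                if (++timeouts > maxTimeouts) {
                    throw e;    // give up instead of spinning forever
                }
                continue;       // slow link: try again from offset n
            }
            if (count < 0) {
                throw new EOFException("stream ended after " + n + " of " + len + " bytes");
            }
            n += count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket input: size = 1460 (0x05B4), version = 1.
        byte[] wire = {0x05, (byte) 0xB4, 0, 0, 0, 1};
        byte[] header = new byte[6];
        readFully(new ByteArrayInputStream(wire), header, 0, header.length, 3);

        // Parse the complete header from memory, where reads cannot fail midway.
        DataInputStream din = new DataInputStream(new ByteArrayInputStream(header));
        short size = din.readShort();
        int version = din.readInt();
        System.out.println("size=" + size + " version=" + version);
    }
}
```

Because the byte count survives each retry, a timeout in the middle of the header does not lose or duplicate data, unlike retrying readShort() after an exception.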


In general it appears there is some edge condition where the various
blocking read*() calls are failing.  In JDK1.0.2 they loop forever
(best guess, since we never come out of the call) and in JDK1.1Beta3.3
we get a "java.io.IOException: Resource temporarily unavailable" when
we should not.  What is the fix?
======================================================================

Comments
EVALUATION

This might be a bug in the win95 socket read. We now use recv, which may help. Sent mail to the submitter to ask for clarifications and tests on new releases.
benjamin.renaud@Eng 1997-08-26

[chamness 11/11/97]
I created a code sample that performs readInt() and readShort(). (See attachment.) win_95 was installed, and a modem was used to dial into swan. Three clients were running concurrently as well as a telnet session. After 100,000 readInt() and readShort() the maximum wait was 3.03 seconds. I was unable to get the client/server application to hang. I'm requesting information on reproducing this bug from the user. This bug may be related to 4086708, which recently had a fix.

[chamness 11/13/97]
Bug has been reproduced. A hang has been observed on a windows 95 client connected by modem to a windows 95 server, as well as to a Solaris server. It has only been observed when more than one client is running. In my test, each client writes 50,000 integers to the server, then reads 50,000 integers from the server. Here is the sequence of events:

	Client 1        Client 2
	--------        --------
	writeInt()
	    |
	    |
	    V
	readInt()
	    |
	    |
	 *hang* x       writeInt()
	                    |
	                    |
	                    V
	                readInt()
	                    |
	                    |
	                    V
	                finish.

It appears that the writeInt() method hangs the readInt() method on another process. When running the client and server on separate Solaris machines (connected by Ethernet) and performing the above test, Client 1 will suspend instead of hanging. When Client 2 finishes writeInt(), Client 1 will resume readInt().

[chamness 12/31/97]
Reproduced under jdk1.1.6. Read calls on a Socket can hang when setSoTimeout() has not been used. With no buffering, client and server SHOULD be able to read and write single bytes. But Socket write()'s have no timeout. So when a client writes to the server, and the server is not ready to read, a problem occurs. Workaround code is attached.

[chamness 1/5/97]
From: /JDK1.1.5/src/win32/net/socket.c

	 * The problem is that the buffer can
	 * continue to be written to even after the thread has been suspended
	 * while in recv, and this can cause the garbage collector to fail. There
	 * is probably a more efficient way of creating tmpbuf than having to
	 * reallocate it each time this function is called.

This is exactly the problem!

What is happening is that in the middle of a read, DataInputStream is getting an exception. It is getting an exception because the connection between the modem and the server is very slow or flaky. The exception mentioned in the description, Resource Temporarily Unavailable (EAGAIN), should not, as a rule, occur in Java, since we don't have non-blocking IO. It may happen if the system is running low on memory. This needs to be reproduced in a lab, and the exception needs to be looked at.
benjamin.renaud@Eng 1998-05-05

This bug is consistently reproducible. I have demonstrated this for benjamin several times.
mark.chamness@Eng 1998-06-18

I have a hard time trying to reproduce the problem. If anyone out there has a reproducible case of the bug, please let me know.
yingxian.wang@eng 1999-07-06

mark.chamness@eng is back at Sun and I have asked him to see if he can still reproduce the problem for engineering.
sheri.good@Eng 1999-07-07

i tried reproducing it on win95 (i already failed to reproduce it on nt). i had 5 clients on 95, and one ftp session running at the same time. the server is on ultra 5. My modem speed is 56.6k. I couldn't reproduce it.
yingxian.wang@eng 1999-07-08

mark reproduced the problem for me when running both the server and two clients on the same win95 machine. i played with it for a while, and then realized that the workaround that mark provided actually has something to do with thread scheduling. if we look at the servant code (provided in the attachment field), it may not block.

to quote Concurrent Programming in Java, "threads with equal priority are not necessarily preempted in favor of each other". in other words, is it possible that the servant thread doesn't preempt for the client thread, and as a result, the client thread doesn't get a chance to run, and it appears that it's hanging, but actually, it's not? based on this suspicion, i put one line of code in the servant code; within the for loop, i added sleep(100):

	for(int i=0; i< 100000; i++) {
	    dos.writeInt(i);
	    dos.writeShort(i);
	    //dos.writeShort((short)i);
	    ===> sleep(100);
	}

indeed the hanging disappeared. i further suspected that native code would suffer from the same problem. so i wrote a server and a client in c++ on win95, and had the server do the same thing as in the java code. sure enough, i got confirmation. without the Sleep statement in the for loop, the client will appear to hang; with the statement, everything is fine. i attached the native code test in the attachments field as well for reproducing my claim.
yingxian.wang@eng 1999-07-29

Bottom line is:
- We can't reproduce the bug with the client only running on the Windows box.
- We can get the same kind of problem when running both the client and the server on the windows machine, but then it points to a limitation of the OS and bad programming practices (related to multithreading).

CTE agrees that this is not a bug: the 3 times they had to work on an escalation of this bug, it turned out to be an error in the customer's program. Therefore bug is closed as "not a bug".
11-06-2004

WORK AROUND

[chamness 12/31/97]
Problem is observed when numerous clients are connected to a server and overloading input/output. It also appears when a client is connected over a slow modem. This data flow problem can be handled.

1. Add yield() method to threads that manage each Socket connection. Without this, one of them can block the others. Should be implemented before read() and write() methods.
   From: Concurrent Programming in Java, page 24
   "... threads with equal priority are not necessarily preempted in favor of each other."
   "The Thread.yield method relinquishes control, which may enable one or more threads to be run."

2. Socket.setSoTimeout() must be set. The Socket will ONLY time out on a socket read. The socket timeout must be caught, and another InputStream.read() call can take place.
   Coding caveat: This is not the proper use of timeouts.

3. Buffering output from both client and server speeds up data transfer and helps keep windows 95 from crashing. Buffering input has been observed to cause problems when the socket read hangs. Bogus data results.

Behavior:

With clients writing to server: Clients write to buffered output streams, filling them up. Sometimes a client write takes several seconds, because the server is catching up on reading the data. The server reads from each of these streams as fast as it can.

With clients reading from server: Server cannot (always) write to multiple clients as fast as they can read data. So a Socket timeout exception will occur on the client side read(). A subsequent read() can be called. Repeated timeout exceptions have been observed, but client code eventually continues reading data.

Example Server Code:

	private void dataTransfer() throws IOException {
	    int myInt=0;

	    //BETTER BEHAVIOR WHEN TIMEOUT IS SET
	    sock.setSoTimeout(1000);

	    //FIRST READ
	    //BUFFERED INPUT SOLVES SOCKET READ HANG
	    InputStream in = sock.getInputStream();
	    BufferedInputStream bis = new BufferedInputStream(in);
	    DataInputStream dis = new DataInputStream(bis);

	    while(dis.available() == 0) {
	        try {
	            Thread.sleep(100);
	        } catch(InterruptedException ie) {}
	        System.out.println("WAITING FOR DATA");
	    }

	    for(int i=0; i< servant.kCYCLES; i++) {
	        yield();
	        try {
	            myInt = dis.readInt();
	            System.out.println(id +" READ:"+myInt);
	        } catch(InterruptedIOException iioe) {
	            //CATCH TIMEOUT
	            System.out.println("READ TIMEOUT");
	            System.out.println(iioe);
	            i--;
	        }
	    }

	    OutputStream out = sock.getOutputStream();
	    BufferedOutputStream bos = new BufferedOutputStream(out);
	    DataOutputStream dos = new DataOutputStream(bos);

	    //NOW WRITE
	    for(int i=0; i< servant.kCYCLES; i++) {
	        yield();
	        dos.writeInt(i);
	        System.out.println(id + " WROTE:"+i);
	    }
	    dos.flush();

	    dos.close();
	    dis.close();
	    sock.close();
	}
11-06-2004

PUBLIC COMMENTS

I have a hard time trying to reproduce the problem. If anyone out there has a reproducible case of the bug, please let me know.
yingxian.wang@eng 1999-07-06
06-07-1999