JDK-6285901 : (so) Data corruption with asynchronous close (Solaris/Linux)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.2
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2005-06-15
  • Updated: 2011-02-16
  • Resolved: 2006-05-30
The Version table lists the release in which this issue/RFE is addressed.

JDK 6
6 b86 Fixed
Description
FULL PRODUCT VERSION :
java version "1.4.2_08"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux aquarius 2.4.18-3custom #7 Sun Aug 10 14:53:17 EST 2003 i586 unknown

A DESCRIPTION OF THE PROBLEM :
Asynchronously closing a channel while another thread is writing to it can cause the data intended for that channel to be written to a completely unrelated socket or stream, if a concurrent thread opens a new one in the meantime.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
nice -19 java -cp . ClosingTest S T C

The use of nice makes the error appear more often. It is not strictly necessary.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The output should not include lines of this form

Oops - read a character: A

because the peer of the socket from which that character is read never writes any data to it: it merely connects, sleeps briefly (10 ms), and then closes the socket.
ACTUAL -
The output includes lines of the form

Oops - read a character: A


REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.net.*;

public class ClosingTest {
	static class SensorServer extends ThreadEx {
		public void runEx() throws Exception {
			ServerSocket server;
			server = new ServerSocket(3010);
			for(;;) {
				Socket s = null;
				try {
					s = server.accept();
					int c = s.getInputStream().read();
					if(c != -1) {
						// No data is ever written to the peer's socket!
						System.out.println("Oops - read a character: " + (char) c);
					}
				} catch (IOException ex) {
					System.out.println("Exception on sensor server socket: " + ex.getMessage());
				} finally {
					closeIt(s);
				}
			}
		}
	}
	
	static class TargetServer extends ThreadEx {
		public void runEx() throws Exception {
			int nonEmptyCount = 0;
			int acceptCount = 0;
			ServerSocket server;
			server = new ServerSocket(3020);
			for(;;) {
				Socket s = null;
				try {
					s = server.accept();
					acceptCount++;
					if(acceptCount % 10 == 0) {
						System.out.println("Accept count = " + acceptCount + ", non empty count = " + nonEmptyCount);
					}
					boolean empty = true;
					for(;;) {
						int c = s.getInputStream().read();
						if(c == -1) {
							if(!empty)
								nonEmptyCount++;
							break;
						}
						empty = false;
					}
				} catch (IOException ex) {
					System.out.println("Exception on target server socket: " + ex.getMessage());
				} finally {
					closeIt(s);
				}
			}
		}
	}
	
	static class SensorClient extends Thread {
		private static boolean wake;
		private static SensorClient theClient;
		public void run() {
			for(;;) {
				Socket s = null;
				try {
					s = new Socket();
					synchronized(this) {
						while(!wake) {
							try {
								wait();
							} catch (InterruptedException ex) { }
						}
						// Reset the flag while still holding the lock that wakeMe() uses.
						wake = false;
					}
					s.connect(new InetSocketAddress("127.0.0.1", 3010));
					try {
						Thread.sleep(10);
					} catch (InterruptedException ex) { }
				} catch (IOException ex) {
					System.out.println("Exception on sensor client " + ex.getMessage());
				} finally {
					if(s != null) {
						try {
							s.close();
						} catch(IOException ex) {}
					}
				}
			}
		}
		
		public SensorClient() {
			theClient = this;
		}
		
		public static void wakeMe() {
			synchronized(theClient) {
				wake = true;
				theClient.notify();
			}
		}
	}
	
	static class TargetClient extends Thread {
		volatile boolean ready = false;
		public void run() {
			for(;;) {
				try {
					final SocketChannel s = SocketChannel.open(new InetSocketAddress("127.0.0.1", 3020));
					s.finishConnect();
					s.socket().setSoLinger(false, 0);
					ready = false;
					Thread t = new Thread() {
						public void run() {
							ByteBuffer b = ByteBuffer.allocate(1);
							try {
								for(;;) {
									b.clear();
									b.put((byte) 'A');
									b.flip();
									s.write(b);
									ready = true;
								}
							} catch (IOException ex) {
								if(!(ex instanceof ClosedChannelException))
									System.out.println("Exception in target client child " + ex.toString());
							}
						}
					};
					t.start();
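					// Wait until the child thread is writing, then close the channel
					// asynchronously while the child may be inside s.write().
					while(!ready)
						Thread.yield();
					s.close();
					// Wake SensorClient, which then connects a new socket that may be
					// allocated the descriptor number just released by s.close().
					SensorClient.wakeMe();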
					t.join();
				} catch (IOException ex) {
					System.out.println("Exception in target client parent " + ex.getMessage());
				} catch (InterruptedException ex) {
				}
			}
		}
	}
	
	static abstract class ThreadEx extends Thread {
		public void run() {
			try {
				runEx();
			} catch (Exception ex) {
				ex.printStackTrace();
			}
		}
		
		abstract void runEx() throws Exception;
	}
			
	
	public static void closeIt(Socket s) {
		try {
			if(s != null)
				s.close();
		} catch (IOException ex) {
		}
	}
	
	public static void main(String args[]) {
		for(int i = 0; i < args.length; i++) {
			if(args[i].equals("S"))
				new SensorServer().start();
			
			if(args[i].equals("T"))
				new TargetServer().start();
				
			if(args[i].equals("C")) {
				new SensorClient().start();
				new TargetClient().start();
			}
		}
	}
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
In the case of socket channels, there is a simple solution, which is to use an asynchronous shutdown (Socket.shutdownOutput()) rather than an asynchronous close. The close can then be performed synchronously when the writing thread is no longer writing.
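A minimal sketch of this workaround, assuming Java 8 or later (lambdas, ServerSocketChannel.bind(), and SocketChannel.shutdownOutput(), which was added in Java 7; on 1.4.2 the equivalent call is s.socket().shutdownOutput()). The class name ShutdownThenClose and the loopback setup are illustrative and not part of the original test. One thread writes single bytes as TargetClient does; the controlling thread shuts down output instead of closing, waits for the writer to stop with an IOException, and only then closes the channel, so the descriptor is never released while a write may still be in flight.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ShutdownThenClose {
    // Records the exception that stopped the writer thread.
    static final AtomicReference<IOException> writerFailure = new AtomicReference<>();

    public static void main(String[] args) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel peer = server.accept();

            // Drain everything the client sends so its writes never block for long.
            Thread drainer = new Thread(() -> {
                ByteBuffer buf = ByteBuffer.allocate(4096);
                try {
                    while (true) {
                        buf.clear();
                        if (peer.read(buf) == -1) break; // EOF after shutdownOutput()
                    }
                } catch (IOException ignored) {
                }
            });
            drainer.start();

            CountDownLatch wrote = new CountDownLatch(1);
            Thread writer = new Thread(() -> {
                ByteBuffer b = ByteBuffer.allocate(1);
                try {
                    for (;;) {
                        b.clear();
                        b.put((byte) 'A');
                        b.flip();
                        client.write(b);
                        wrote.countDown();
                    }
                } catch (IOException ex) {
                    writerFailure.set(ex); // writes fail once output is shut down
                }
            });
            writer.start();
            wrote.await();           // make sure the writer is actively writing

            client.shutdownOutput(); // asynchronous part: fd stays allocated
            writer.join();           // writer can no longer touch the descriptor
            client.close();          // synchronous close: safe to release the fd now

            drainer.join();
            peer.close();
        }
        System.out.println("writer stopped with "
                + writerFailure.get().getClass().getSimpleName());
    }
}
```

The drainer exists only so that the writer never blocks on a full send buffer; the sketch therefore does not depend on shutdownOutput() waking a writer that is already blocked inside write().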

For other types of channel the solution seems more difficult.

The underlying problem is the assumption that once a native socket is closed, it is no longer possible to write to its file descriptor. Since a file descriptor is just a number, this holds only as long as that number is not reused for some other object.

Since the client side program is constantly opening new sockets, there is a distinct possibility that this sequence of events occurs:

Thread 1

Reach the point where it is committed to entering a native write with a given file descriptor.

Thread 2

Close the file descriptor.

Thread 3

Open an unrelated connection, and get allocated the same descriptor that thread 1 is committed to writing to.

Thread 1

Continue with the native write operation, which writes to the socket opened by thread 3.
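The interleaving above can be replayed deterministically in a single thread. The sketch below is a hypothetical simulation, not JDK code: FdTable models a process descriptor table that, like POSIX open() and socket(), hands out the lowest-numbered free descriptor, and each comment names the thread from the sequence that the step stands for.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

public class FdReuseDemo {
    // Hypothetical model of a descriptor table; each "socket" is just a byte buffer.
    static final class FdTable {
        private final Map<Integer, ByteArrayOutputStream> open = new HashMap<>();

        int open(ByteArrayOutputStream target) {
            int fd = 0;
            while (open.containsKey(fd)) fd++; // lowest free descriptor number
            open.put(fd, target);
            return fd;
        }

        void close(int fd) {
            open.remove(fd);
        }

        void write(int fd, byte b) {
            ByteArrayOutputStream target = open.get(fd);
            if (target == null) throw new IllegalStateException("EBADF: " + fd);
            target.write(b);
        }
    }

    public static void main(String[] args) {
        FdTable table = new FdTable();
        ByteArrayOutputStream targetSocket = new ByteArrayOutputStream();
        ByteArrayOutputStream sensorSocket = new ByteArrayOutputStream();

        int fd = table.open(targetSocket);       // the channel thread 1 writes to
        int committedFd = fd;                    // thread 1: committed to write(fd)
        table.close(fd);                         // thread 2: asynchronous close
        int sensorFd = table.open(sensorSocket); // thread 3: new connection
        table.write(committedFd, (byte) 'A');    // thread 1: resumes its write

        System.out.println("same fd reused: " + (sensorFd == committedFd));
        System.out.println("sensor socket received: " + sensorSocket.toString());
        System.out.println("target socket received " + targetSocket.size() + " bytes");
    }
}
```

Running it prints that the descriptor was reused, that the sensor socket received "A", and that the target socket received 0 bytes, which is the "Oops - read a character: A" symptom seen in the test.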

On Unix-type systems that support dup2, a possible solution (which I have not tested) is to delay closing the descriptor and instead reopen it onto something like /dev/null. The descriptor would be closed only when there is no longer any possibility that a thread will write to it. Whether all Unix/Linux implementations can be trusted to handle this correctly while another thread is blocked in a write on the descriptor is a matter for conjecture.
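The dup2 idea can be illustrated with a single-threaded simulation. This is a hypothetical model, not an implementation: FdTable hands out the lowest free descriptor number, and its redirect() method stands in for calling dup2() to copy a descriptor that refers to /dev/null onto the descriptor being closed (the sink is a buffer here, so the swallowed byte can be counted). Because the number stays allocated until the final close(), the new connection is given a different descriptor and the stray write lands in the sink.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

public class DeferredCloseDemo {
    // Hypothetical model of a descriptor table; each "file" is just a byte buffer.
    static final class FdTable {
        private final Map<Integer, ByteArrayOutputStream> open = new HashMap<>();

        int open(ByteArrayOutputStream target) {
            int fd = 0;
            while (open.containsKey(fd)) fd++; // lowest free descriptor number
            open.put(fd, target);
            return fd;
        }

        // Stands in for dup2(devNullFd, fd): fd stays allocated but now refers to the sink.
        void redirect(int fd, ByteArrayOutputStream sink) {
            open.put(fd, sink);
        }

        void close(int fd) {
            open.remove(fd);
        }

        void write(int fd, byte b) {
            ByteArrayOutputStream target = open.get(fd);
            if (target == null) throw new IllegalStateException("EBADF: " + fd);
            target.write(b);
        }
    }

    public static void main(String[] args) {
        FdTable table = new FdTable();
        ByteArrayOutputStream targetSocket = new ByteArrayOutputStream();
        ByteArrayOutputStream sensorSocket = new ByteArrayOutputStream();
        ByteArrayOutputStream devNull = new ByteArrayOutputStream();

        int fd = table.open(targetSocket);
        int committedFd = fd;                    // thread 1: committed to write(fd)
        table.redirect(fd, devNull);             // thread 2: redirect instead of close
        int sensorFd = table.open(sensorSocket); // thread 3: gets a different number
        table.write(committedFd, (byte) 'A');    // thread 1: stray write hits the sink
        table.close(fd);                         // real close once no writer can use fd

        System.out.println("sensor fd differs: " + (sensorFd != committedFd));
        System.out.println("sensor socket received " + sensorSocket.size() + " bytes");
        System.out.println("sink swallowed " + devNull.size() + " byte(s)");
    }
}
```

The model leaves open the question raised above: it says nothing about how a real kernel treats a write that is already blocked on the descriptor when dup2() replaces it.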

The error also occurs under MS Windows systems.
###@###.### 2005-06-15 12:10:21 GMT

Comments
EVALUATION See #6429043 for the Windows coverage of this CR. Under this CR we are fixing the issue for the Solaris and Linux platforms first.
23-05-2006

EVALUATION There is a timing bug in the close mechanism whereby an async close can happen after the writer has performed the isOpen check but before it commences the write on the socket. If, in that window, the preClose, signal and close all complete and the fd is recycled, then the thread will write to the "wrong" socket. This scenario is demonstrated by the excellent test provided by the submitter. To fix this we need the close to be done by the reader or writer (whichever is last to reset the tid to 0). Careful testing will be required to ensure the solution does not introduce any deadlocks or other side effects.
16-03-2006
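A minimal model of the fix described in this evaluation, under stated assumptions: the class LastOutCloses and its methods are hypothetical and are not the JDK's SocketChannelImpl. Thread references stand in for the native thread ids ("tid"), and the nativeClose callback stands in for closing the descriptor. close() only marks the channel as closing; the descriptor is released by whichever party (close(), the reader, or the writer) is last to leave. Because the open check and the registration of the current thread happen under the same lock, no thread can sit between its open check and its native call when the number is recycled.

```java
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.atomic.AtomicInteger;

public class LastOutCloses {
    private final Object stateLock = new Object();
    private final Runnable nativeClose; // hypothetical stand-in for close(fd)
    private Thread readerThread;        // non-null while a read is in progress
    private Thread writerThread;        // non-null while a write is in progress
    private boolean closing;            // close() has been requested
    private boolean released;           // the descriptor has actually been closed

    LastOutCloses(Runnable nativeClose) {
        this.nativeClose = nativeClose;
    }

    void beginRead() throws ClosedChannelException {
        synchronized (stateLock) {
            if (closing) throw new ClosedChannelException();
            readerThread = Thread.currentThread();
        }
    }

    void endRead() {
        synchronized (stateLock) {
            readerThread = null; // "reset the tid to 0"
            releaseIfLast();
        }
    }

    void beginWrite() throws ClosedChannelException {
        synchronized (stateLock) {
            // Open check and registration are atomic, so close() cannot slip between them.
            if (closing) throw new ClosedChannelException();
            writerThread = Thread.currentThread();
        }
    }

    void endWrite() {
        synchronized (stateLock) {
            writerThread = null;
            releaseIfLast();
        }
    }

    void close() {
        synchronized (stateLock) {
            if (closing) return;
            closing = true;
            // A real channel would also pre-close the fd and signal any blocked
            // reader or writer here so that it returns promptly.
            releaseIfLast();
        }
    }

    boolean isReleased() {
        synchronized (stateLock) {
            return released;
        }
    }

    // Caller must hold stateLock.
    private void releaseIfLast() {
        if (closing && !released && readerThread == null && writerThread == null) {
            released = true;
            nativeClose.run();
        }
    }

    public static void main(String[] args) throws ClosedChannelException {
        AtomicInteger nativeCloses = new AtomicInteger();
        LastOutCloses ch = new LastOutCloses(nativeCloses::incrementAndGet);

        ch.beginWrite(); // writer has passed its open check
        ch.close();      // asynchronous close arrives mid-write
        System.out.println("released after close(): " + ch.isReleased());
        ch.endWrite();   // writer is last out, so it performs the close
        System.out.println("released after endWrite(): " + ch.isReleased());
        try {
            ch.beginWrite();
        } catch (ClosedChannelException expected) {
            System.out.println("late writer rejected");
        }
        System.out.println("native closes: " + nativeCloses.get());
    }
}
```

Running main prints false, then true, then "late writer rejected", then "native closes: 1". Calling nativeClose while holding stateLock keeps the model simple; a real implementation would have to check that doing so cannot deadlock, which is the concern the evaluation raises.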