JDK-8075484 : SocketInputStream.socketRead0 can hang even with soTimeout set
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8u40
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2015-03-18
  • Updated: 2019-12-18
  • Resolved: 2016-09-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7 JDK 8 JDK 9 Other
7u131Fixed 8u131Fixed 9 b137Fixed openjdk7uFixed
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Description
FULL PRODUCT VERSION :


A DESCRIPTION OF THE PROBLEM :
As noted at JDK-8049846, the implementation of Java_java_net_SocketInputStream_socketRead0 assumes that read() won't block after poll() reports that a read is possible. This assumption does not hold, as noted on the man page for select (referenced by the man page for poll): Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks.  This could for example happen when data  has  arrived  but upon  examination  has  wrong checksum and is discarded.  There may be other circumstances in which a file descriptor is spuriously reported as ready.  Thus it may be safer to use O_NONBLOCK on sockets that should not block.

In production, we hit this about once a day. For testing and reproduction purposes, we can use fault injection to get spurious poll() results on demand.

[This report is probably more appropriate as a comment on JDK-8049846, but commenting requires an account, and obtaining an account does not appear to be an easy task] 

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create file called poll.c with contents:
-----
#define _GNU_SOURCE
#include <poll.h>
#include <dlfcn.h>
#include <stdio.h>

int poll(struct pollfd *fds, nfds_t nfds, int timeout) {
  static int n;
  if ((++n & 0x3) == 0) {
    // Fault injection: perhaps we should report a spurious readiness notification
    int i;
    for (i = 0; i < nfds; ++i) {
      if (fds[i].events & POLLIN) {
        fds[i].revents |= POLLIN;
	return 1;
      }
    }
  }
  return ((int(*)(struct pollfd*,nfds_t,int))dlsym(RTLD_NEXT, "poll"))(fds, nfds, timeout);
}
-----

2. Create file called OneReaderThread.java with contents:
-----
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class OneReaderThread {
	public static void main(String[] args) throws IOException {
		@SuppressWarnings("resource")
		ServerSocket serverSocket = new ServerSocket(17291);
		new ReadingThreadA().start();
		Socket client = serverSocket.accept();
		OutputStream os = client.getOutputStream();
		byte[] writeData = new byte[2];
		for (;;) {
			waitingForA = true;
			long start = System.currentTimeMillis();
			while (waitingForA) {
				long now = System.currentTimeMillis();
				if (now > start + 500) {
					// 500ms have passed, which is 10x the read timeout 
					System.out.println("Should never happen: A is unresponsive");
					os.write(writeData);
					break;
				}
			}
		}
	}
	
	private static volatile boolean waitingForA;
	
	private static final class ReadingThreadA extends Thread {
		@Override
		public void run() {
			try {
				@SuppressWarnings("resource")
				Socket s = new Socket("localhost", 17291);
				s.setSoTimeout(50); // SO_TIMEOUT is set, meaning that reads should not block for more than 50ms
				final InputStream is = s.getInputStream();
				byte[] readDataA = new byte[2];
				for (;;) {
					int n = 0;
					try {
						n = is.read(readDataA);
					} catch (IOException e) {
						// Ignore
					}
					System.out.println("A tick (" + n + ")");
					waitingForA = false; // This assignment should happen at least once every 50ms
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}
-----

3. Compile the C code: gcc -o poll.so -shared poll.c -ldl -fPIC
4. Compile the Java code: javac OneReaderThread.java
5. Run the Java code with the C library preloaded: LD_PRELOAD=./poll.so java -cp . OneReaderThread

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expect an output stream consisting solely of:
A tick (0)
A tick (0)
A tick (0)
A tick (0)
A tick (0)
A tick (0)
A tick (0)
A tick (0)
ACTUAL -
Actual output stream is repetitions of:
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
A tick (0)
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
A tick (0)

REPRODUCIBILITY :
This bug can be reproduced occasionally.


Comments
Critical request as a) this was already in 17_04 and appears to have been pushed out for some reason and b) we have already committed to providing this fix in Oct in a number of escalations.
11-07-2017

I will do the prototype, as I am suspecting that JDK-8134601 if failing intermittently because of the same root cause.
09-08-2016

The connection between 'poll' and 'select' is tenuous, but I accept that there may be an issue, albeit rare. Some testing should be done to see if the issue can be reproduced in-house. It seems reasonable to prototype a version of socketRead0 that flips the non-blocking bit before and after, to see what other implications arise from doing that.
09-08-2016

1) Attaching the testcase (poll.c, OneReaderThread.java). 2) On a Linux 32-bit system, checked this for JDK 7u80, 8u31, 8u40, 8u60 b06, 9 ea b54 and could reproduce this issue. **************************************************************************************** 7u80: FAIL 8: FAIL 8u31 : FAIL 8u40: FAIL 8u60 ea b05: FAIL 9 ea b54: FAIL ****************************************************************************************** Output of the testcase in 8u40: $ LD_PRELOAD=./poll.so /net/jre.us.oracle.com/onestop/j2sdk/1.8.0_40/latest/binaries/linux-i586/bin/java -cp . OneReaderThread Should never happen: A is unresponsive A tick (0) A tick (2) Should never happen: A is unresponsive A tick (2) A tick (0) A tick (0) A tick (0) Should never happen: A is unresponsive A tick (2) A tick (0) A tick (0) ................... Output of the testcase in 7u80: $ LD_PRELOAD=./poll.so /net/scanas412.us.oracle.com/export/java_re/jdk/7u80/latest/binaries/linux-i586/bin/java -cp . OneReaderThread Should never happen: A is unresponsive A tick (0) A tick (2) Should never happen: A is unresponsive A tick (2) A tick (0) A tick (0) A tick (0) Should never happen: A is unresponsive A tick (2) A tick (0) A tick (0) A tick (0) Should never happen: A is unresponsive ..............................
19-03-2015