JDK-8238579 : HttpsURLConnection drops the timeout and hangs forever in read
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8,11,14,15
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: linux_ubuntu
  • CPU: x86_64
  • Submitted: 2020-01-28
  • Updated: 2021-03-01
  • Resolved: 2020-03-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 8 Other
11.0.10Fixed 8u271Fixed openjdk8u292Fixed
Description
ADDITIONAL SYSTEM INFORMATION :
Confirmed in openjdk version "11.0.5" 2019-10-15 and openjdk version "1.8.0_232"

A DESCRIPTION OF THE PROBLEM :
HttpUrlConnection.writeRequests contains logic that retries on write errors. In that code path setNewClient or setProxiedClient are called. AbstractDelegateHttpsURLConnection (subclassed by DelegateHttpsURLConnection) overrides both setNewClient and setProxiedClient. Contrary to the implementation in HttpUrlConnection, these implementations do not call NetworkClient.setReadTimeout. Consequently, the new client does not have the timeout set. 
This is a similar pattern to https://bugs.openjdk.java.net/browse/JDK-4646885

This leads to infinitely hung applications in scenarios, where the retry code path in HttpUrlConnection.writeRequests executes and the server silently drops the TCP connection.

The bug can be reproduced every time with appropriate failure injection.

Below are the stack traces from hung applications.

JDK 11

at java.net.SocketInputStream.socketRead0(java.base@11.0.5/Native Method)
at java.net.SocketInputStream.socketRead(java.base@11.0.5/SocketInputStream.java:115)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:168)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:140)
at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.5/SSLSocketInputRecord.java:448)
at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(java.base@11.0.5/SSLSocketInputRecord.java:68)
at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base@11.0.5/SSLSocketImpl.java:1104)
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.5/SSLSocketImpl.java:823)
- locked <0x0000000630a20790> (a sun.security.ssl.SSLSocketImpl$AppInputStream)
at java.io.BufferedInputStream.fill(java.base@11.0.5/BufferedInputStream.java:252)
at java.io.BufferedInputStream.read1(java.base@11.0.5/BufferedInputStream.java:292)
at java.io.BufferedInputStream.read(java.base@11.0.5/BufferedInputStream.java:351)
- locked <0x0000000630ae0ac0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.5/HttpClient.java:746)
at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.5/HttpClient.java:689)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.5/HttpURLConnection.java:1610)
- locked <0x0000000630a08718> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.5/HttpURLConnection.java:1515)
- locked <0x0000000630a08718> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(java.base@11.0.5/HttpURLConnection.java:527)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(java.base@11.0.5/HttpsURLConnectionImpl.java:334)

Thread 18 (Thread 0x7fd07fcfb700 (LWP 5340)):
#0  0x00007fd0f74b4a66 in __libc_recv (fd=10, buf=0x7fd07fce9830, len=5, flags=0) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  0x00007fd0b840f10d in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so
#2  0x00007fd0b840ea4e in Java_java_net_SocketInputStream_socketRead0 () from /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so
#3  0x00007fd0d853f990 in ?? ()
#4  0x0000000600000000 in ?? ()
#5  0x0000000630ae6230 in ?? ()
#6  0x0000000630eac9f0 in ?? ()
#7  0x00007fd07fcf9888 in ?? ()
#8  0x00007fd0b9b685cd in ?? ()
#9  0x00007fd07fcf98f8 in ?? ()
#10 0x00007fd07fcf98a0 in ?? ()
#11 0x0000000000000000 in ?? ()


JDK 8:

java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
sun.security.ssl.InputRecord.read(InputRecord.java:503)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) => holding Monitor(java.lang.Object@438851338})
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
sun.security.ssl.AppInputStream.read(AppInputStream.java:105) => holding Monitor(sun.security.ssl.AppInputStream@2010412364})
java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
java.io.BufferedInputStream.read(BufferedInputStream.java:345) => holding Monitor(java.io.BufferedInputStream@1680899557})
sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) => holding Monitor(sun.net.www.protocol.https.DelegateHttpsURLConnection@1729312334})
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) => holding Monitor(sun.net.www.protocol.https.DelegateHttpsURLConnection@1729312334})
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)

Thread 23 (Thread 0x7f99f814c700 (LWP 14213)):
#0  0x00007f9a1a3ffa66 in __libc_recv (fd=25, buf=0x7f99f813a8b0, len=5, flags=0) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  0x00007f99f868916d in NET_Read () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
#2  0x00007f99f8688af2 in Java_java_net_SocketInputStream_socketRead0 () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
#3  0x00007f9a0418f427 in ?? ()
#4  0x00007f9a00000000 in ?? ()
#5  0x00007f99f814a948 in ?? ()
#6  0x00007f99f8264d38 in ?? ()
#7  0xa08c80390f95f200 in ?? ()
#8  0x00007f9a00b8e158 in ?? ()
#9  0x00007f9a00a7bb88 in ?? ()
#10 0x00007f99f814a930 in ?? ()
#11 0x0000000000000000 in ?? ()




Comments
8u RFR: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-January/013344.html
22-01-2021

Fix request (8u): Requesting backport of this bugfix for parity with Oracle. Original patch applies clean, except for copyright header and source file location. Relevant regression tests are passed.
22-01-2021

Fix request (13u): Should be fixed in 13u, too. Patch applies clean, all relevant regtests pass.
24-11-2020

Fix request (11u): Requesting backport of this bugfix on request of a customer who runs into the issue. Patch applies clean, except for copyright header and passes SAP's regression testing.
19-11-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/ca2dcaf37fad User: vtewari Date: 2020-03-07 13:06:35 +0000
07-03-2020

Below patch resolves this issue at my end(i tested it on JDK8u, i was not able to reproduce this issue with latest JDK). I fixed the issue with ConnectTimeout as well. diff -r 74ecb602ffb0 src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java --- a/src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java Wed Jan 29 23:53:16 2020 -0500 +++ b/src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java Fri Feb 07 11:53:39 2020 +0530 @@ -87,10 +87,15 @@ */ public void setNewClient (URL url, boolean useCache) throws IOException { + int readTimeout = getReadTimeout(); http = HttpsClient.New (getSSLSocketFactory(), url, getHostnameVerifier(), - useCache, this); + (String)null, + -1, + useCache, + getConnectTimeout(), this); + http.setReadTimeout(readTimeout); ((HttpsClient)http).afterConnect(); } @@ -132,10 +137,14 @@ boolean useCache) throws IOException { if (connected) return; - http = HttpsClient.New (getSSLSocketFactory(), - url, - getHostnameVerifier(), - proxyHost, proxyPort, useCache, this); + int readTimeout = getReadTimeout(); + http = HttpsClient.New(getSSLSocketFactory(), + url, + getHostnameVerifier(), + proxyHost, proxyPort, + useCache, + getConnectTimeout(), this); + http.setReadTimeout(readTimeout); connected = true; }
07-02-2020

The submitter also prepared an alternative version of the reproducer on the client side. It is implemented in the attached file. It always reproduce in the 5th invocation of send(). It should work across different JDK versions.
07-02-2020

Tried JDK 8 and 11, the HttpsURLConnection drops the timeout and hangs as described by the submitter.
06-02-2020

Further information from the submitter: If you do not see the "return -1" line in the output, it means that the injection of intermittent network failure didn't work on the client side and the issue will not reproduce. The output that you sent is expected in this case. As mentioned earlier, you need to tune and recompile hooks.c to inject failure into the right send() invocation. Based on the logs that you sent, it looks like you need the following code in hooks.c: JDK 1.8.0_241: if (port == 20111 && nbytes == 229 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } For JDK 11.0.6: if (port == 20111 && nbytes == 181 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } After applying the above change, the same problem is reproducible with JDK 1.8.0_241: root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk1.8.0_241/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=190 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=101 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=229 HOOK_SEND[3637]: return -1 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=85 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=190 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=101 HOOK_SEND[3637]: send to=0.0.0.0:20111 nbytes=229
05-02-2020

Test result with Oracle Linux and JDK 1.8.0_241 [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk1.8.0_241/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=190 HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=101 HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=229 HOOK_SEND[6973]: send to=0.0.0.0:20111 nbytes=85 Exception Read timed out
04-02-2020

After modifying hook.c, the program hangs forever in read: [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk-11.0.6/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=408 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=45 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=181 HOOK_SEND[3408]: return -1 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=31 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=408 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=45 HOOK_SEND[3408]: send to=0.0.0.0:20111 nbytes=181
01-02-2020

Further instructions from the submitter: As mentioned in the README, the number of bytes sent in each invocation of the send() call is very sensitive to JDK version. It seems that the failure injection should come at the moment, where 181 bytes are sent. By modifying hooks.c to capture that one send, it would look like this: if (port == 20111 && nbytes == 181 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } The client should print "HOOK_SEND[%d]: return -1" right after "HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=181". Otherwise the failure injection didn't work.
01-02-2020

Tested on Oralce Linux with JDK 11.0.6: [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk-11.0.6/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=408 HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=75 HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=6 HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=45 HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=181 HOOK_SEND[16238]: send to=0.0.0.0:20111 nbytes=31 Exception Read timed out
31-01-2020

Requested more details information of reproducing the problem.
30-01-2020