JDK-8238579 : HttpsURLConnection drops the timeout and hangs forever in read
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8,11,14,15
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: linux_ubuntu
  • CPU: x86_64
  • Submitted: 2020-01-28
  • Updated: 2021-03-01
  • Resolved: 2020-03-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 8 Other
11.0.10Fixed 8u271Fixed openjdk8u292Fixed
Confirmed in openjdk version "11.0.5" 2019-10-15 and openjdk version "1.8.0_232"

HttpUrlConnection.writeRequests contains logic that retries on write errors. In that code path setNewClient or setProxiedClient are called. AbstractDelegateHttpsURLConnection (subclassed by DelegateHttpsURLConnection) overrides both setNewClient and setProxiedClient. Contrary to the implementation in HttpUrlConnection, these implementations do not call NetworkClient.setReadTimeout. Consequently, the new client does not have the timeout set. 
This is a similar pattern to https://bugs.openjdk.java.net/browse/JDK-4646885

This leads to infinitely hung applications in scenarios, where the retry code path in HttpUrlConnection.writeRequests executes and the server silently drops the TCP connection.

The bug can be reproduced every time with appropriate failure injection.

Below are the stack traces from hung applications.

JDK 11

at java.net.SocketInputStream.socketRead0(java.base@11.0.5/Native Method)
at java.net.SocketInputStream.socketRead(java.base@11.0.5/SocketInputStream.java:115)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:168)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:140)
at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.5/SSLSocketInputRecord.java:448)
at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(java.base@11.0.5/SSLSocketInputRecord.java:68)
at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base@11.0.5/SSLSocketImpl.java:1104)
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.5/SSLSocketImpl.java:823)
- locked <0x0000000630a20790> (a sun.security.ssl.SSLSocketImpl$AppInputStream)
at java.io.BufferedInputStream.fill(java.base@11.0.5/BufferedInputStream.java:252)
at java.io.BufferedInputStream.read1(java.base@11.0.5/BufferedInputStream.java:292)
at java.io.BufferedInputStream.read(java.base@11.0.5/BufferedInputStream.java:351)
- locked <0x0000000630ae0ac0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.5/HttpClient.java:746)
at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.5/HttpClient.java:689)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.5/HttpURLConnection.java:1610)
- locked <0x0000000630a08718> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.5/HttpURLConnection.java:1515)
- locked <0x0000000630a08718> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(java.base@11.0.5/HttpURLConnection.java:527)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(java.base@11.0.5/HttpsURLConnectionImpl.java:334)

Thread 18 (Thread 0x7fd07fcfb700 (LWP 5340)):
#0  0x00007fd0f74b4a66 in __libc_recv (fd=10, buf=0x7fd07fce9830, len=5, flags=0) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  0x00007fd0b840f10d in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so
#2  0x00007fd0b840ea4e in Java_java_net_SocketInputStream_socketRead0 () from /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so
#3  0x00007fd0d853f990 in ?? ()
#4  0x0000000600000000 in ?? ()
#5  0x0000000630ae6230 in ?? ()
#6  0x0000000630eac9f0 in ?? ()
#7  0x00007fd07fcf9888 in ?? ()
#8  0x00007fd0b9b685cd in ?? ()
#9  0x00007fd07fcf98f8 in ?? ()
#10 0x00007fd07fcf98a0 in ?? ()
#11 0x0000000000000000 in ?? ()

JDK 8:

java.net.SocketInputStream.socketRead0(Native Method)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) => holding Monitor(java.lang.Object@438851338})
sun.security.ssl.AppInputStream.read(AppInputStream.java:105) => holding Monitor(sun.security.ssl.AppInputStream@2010412364})
java.io.BufferedInputStream.read(BufferedInputStream.java:345) => holding Monitor(java.io.BufferedInputStream@1680899557})
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) => holding Monitor(sun.net.www.protocol.https.DelegateHttpsURLConnection@1729312334})
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) => holding Monitor(sun.net.www.protocol.https.DelegateHttpsURLConnection@1729312334})

Thread 23 (Thread 0x7f99f814c700 (LWP 14213)):
#0  0x00007f9a1a3ffa66 in __libc_recv (fd=25, buf=0x7f99f813a8b0, len=5, flags=0) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  0x00007f99f868916d in NET_Read () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
#2  0x00007f99f8688af2 in Java_java_net_SocketInputStream_socketRead0 () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
#3  0x00007f9a0418f427 in ?? ()
#4  0x00007f9a00000000 in ?? ()
#5  0x00007f99f814a948 in ?? ()
#6  0x00007f99f8264d38 in ?? ()
#7  0xa08c80390f95f200 in ?? ()
#8  0x00007f9a00b8e158 in ?? ()
#9  0x00007f9a00a7bb88 in ?? ()
#10 0x00007f99f814a930 in ?? ()
#11 0x0000000000000000 in ?? ()

8u RFR: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-January/013344.html

Fix request (8u): Requesting backport of this bugfix for parity with Oracle. Original patch applies clean, except for copyright header and source file location. Relevant regression tests are passed.

Fix request (13u): Should be fixed in 13u, too. Patch applies clean, all relevant regtests pass.

Fix request (11u): Requesting backport of this bugfix on request of a customer who runs into the issue. Patch applies clean, except for copyright header and passes SAP's regression testing.

URL: https://hg.openjdk.java.net/jdk/jdk/rev/ca2dcaf37fad User: vtewari Date: 2020-03-07 13:06:35 +0000

Below patch resolves this issue at my end(i tested it on JDK8u, i was not able to reproduce this issue with latest JDK). I fixed the issue with ConnectTimeout as well. diff -r 74ecb602ffb0 src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java --- a/src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java Wed Jan 29 23:53:16 2020 -0500 +++ b/src/java.base/share/classes/sun/net/www/protocol/https/AbstractDelegateHttpsURLConnection.java Fri Feb 07 11:53:39 2020 +0530 @@ -87,10 +87,15 @@ */ public void setNewClient (URL url, boolean useCache) throws IOException { + int readTimeout = getReadTimeout(); http = HttpsClient.New (getSSLSocketFactory(), url, getHostnameVerifier(), - useCache, this); + (String)null, + -1, + useCache, + getConnectTimeout(), this); + http.setReadTimeout(readTimeout); ((HttpsClient)http).afterConnect(); } @@ -132,10 +137,14 @@ boolean useCache) throws IOException { if (connected) return; - http = HttpsClient.New (getSSLSocketFactory(), - url, - getHostnameVerifier(), - proxyHost, proxyPort, useCache, this); + int readTimeout = getReadTimeout(); + http = HttpsClient.New(getSSLSocketFactory(), + url, + getHostnameVerifier(), + proxyHost, proxyPort, + useCache, + getConnectTimeout(), this); + http.setReadTimeout(readTimeout); connected = true; }

The submitter also prepared an alternative version of the reproducer on the client side. It is implemented in the attached file. It always reproduce in the 5th invocation of send(). It should work across different JDK versions.

Tried JDK 8 and 11, the HttpsURLConnection drops the timeout and hangs as described by the submitter.

Further information from the submitter: If you do not see the "return -1" line in the output, it means that the injection of intermittent network failure didn't work on the client side and the issue will not reproduce. The output that you sent is expected in this case. As mentioned earlier, you need to tune and recompile hooks.c to inject failure into the right send() invocation. Based on the logs that you sent, it looks like you need the following code in hooks.c: JDK 1.8.0_241: if (port == 20111 && nbytes == 229 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } For JDK 11.0.6: if (port == 20111 && nbytes == 181 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } After applying the above change, the same problem is reproducible with JDK 1.8.0_241: root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk1.8.0_241/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[3637]: send to= nbytes=190 HOOK_SEND[3637]: send to= nbytes=75 HOOK_SEND[3637]: send to= nbytes=6 HOOK_SEND[3637]: send to= nbytes=101 HOOK_SEND[3637]: send to= nbytes=229 HOOK_SEND[3637]: return -1 HOOK_SEND[3637]: send to= nbytes=85 HOOK_SEND[3637]: send to= nbytes=190 HOOK_SEND[3637]: send to= nbytes=75 HOOK_SEND[3637]: send to= nbytes=6 HOOK_SEND[3637]: send to= nbytes=101 HOOK_SEND[3637]: send to= nbytes=229

Test result with Oracle Linux and JDK 1.8.0_241 [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk1.8.0_241/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[6973]: send to= nbytes=190 HOOK_SEND[6973]: send to= nbytes=75 HOOK_SEND[6973]: send to= nbytes=6 HOOK_SEND[6973]: send to= nbytes=101 HOOK_SEND[6973]: send to= nbytes=229 HOOK_SEND[6973]: send to= nbytes=85 Exception Read timed out

After modifying hook.c, the program hangs forever in read: [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk-11.0.6/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[3408]: send to= nbytes=408 HOOK_SEND[3408]: send to= nbytes=75 HOOK_SEND[3408]: send to= nbytes=6 HOOK_SEND[3408]: send to= nbytes=45 HOOK_SEND[3408]: send to= nbytes=181 HOOK_SEND[3408]: return -1 HOOK_SEND[3408]: send to= nbytes=31 HOOK_SEND[3408]: send to= nbytes=408 HOOK_SEND[3408]: send to= nbytes=75 HOOK_SEND[3408]: send to= nbytes=6 HOOK_SEND[3408]: send to= nbytes=45 HOOK_SEND[3408]: send to= nbytes=181

Further instructions from the submitter: As mentioned in the README, the number of bytes sent in each invocation of the send() call is very sensitive to JDK version. It seems that the failure injection should come at the moment, where 181 bytes are sent. By modifying hooks.c to capture that one send, it would look like this: if (port == 20111 && nbytes == 181 && !failed_once) { printf("HOOK_SEND[%d]: return -1\n", pid); failed_once = 1; return -1; } The client should print "HOOK_SEND[%d]: return -1" right after "HOOK_SEND[16238]: send to= nbytes=181". Otherwise the failure injection didn't work.

Tested on Oralce Linux with JDK 11.0.6: [root@localhost JDK_hang_repro]# LD_PRELOAD=$PWD/hooks.so ../jdk-11.0.6/bin/java -cp target/JDK_hang_repro-1.0-SNAPSHOT.jar JDKHangRepro HOOK_SEND[16238]: send to= nbytes=408 HOOK_SEND[16238]: send to= nbytes=75 HOOK_SEND[16238]: send to= nbytes=6 HOOK_SEND[16238]: send to= nbytes=45 HOOK_SEND[16238]: send to= nbytes=181 HOOK_SEND[16238]: send to= nbytes=31 Exception Read timed out

Requested more details information of reproducing the problem.