JDK-4266357 : HttpURLConnection.connect() fails with <1 minute of invocations on WinNT
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.3.0
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_nt
  • CPU: generic
  • Submitted: 1999-08-27
  • Updated: 2000-01-04
  • Resolved: 1999-12-20
Other
1.3.0 kestrel: Fixed
Related Reports
Relates :  
Description
daniel.daugherty@Eng 1999-08-27

The HttpVersionTest test application fails in less than one minute on
WinNT 4.0 with JDK-Kestrel Build D. I ran the same test for an hour on
Win98 and on Solaris 2.6 without a problem. There appears to be some
sort of resource leak on WinNT that does not happen on the other
platforms.

I modified the HttpRequestDriver to keep statistics about the number of
connect() calls made during the test run. On a freshly rebooted 400 MHz
Pentium II with 64MB of memory, the stats are:

    WinNT% java HttpVersionTest -l 1m 1.1 owjones
    connTotalCount=8608
    connFailCount=1164
    connPassCount=7444
    connFirstFail=3978
    FINALSTATUS:HttpVersionTest:EXIT_ERROR:2:Number of ERRORS:2328:TEST INCOMPLETE

There appears to be some resource reclaiming going on because the first
failure occurs on the 3978th call to connect(), but after that point,
only about 25% of the calls fail.
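
The counters in the output above, and the 25% figure, can be checked with a small sketch. The real HttpRequestDriver is only in the attached archive, so the class below is hypothetical: it assumes connFirstFail is the 1-based index of the first failed connect() call and that the failure rate is measured from that call onward.

```java
/** Hypothetical counter mirroring the four statistics printed in the report. */
final class ConnectStats {
    long connTotalCount;
    long connFailCount;
    long connPassCount;
    long connFirstFail = -1; // assumed: 1-based index of the first failed call, -1 if none

    /** Records the outcome of one connect() call. */
    void record(boolean succeeded) {
        connTotalCount++;
        if (succeeded) {
            connPassCount++;
        } else {
            connFailCount++;
            if (connFirstFail < 0) {
                connFirstFail = connTotalCount;
            }
        }
    }

    /** Fraction of calls that failed, counting from the first failure onward. */
    double failRateSinceFirstFail() {
        if (connFirstFail < 0) {
            return 0.0;
        }
        long callsSinceFirstFail = connTotalCount - connFirstFail + 1;
        return (double) connFailCount / callsSinceFirstFail;
    }

    /**
     * Rebuilds the counters from reported totals. All failures are placed
     * contiguously starting at firstFail; the real interleaving is unknown
     * and does not affect the totals or the rate.
     */
    static ConnectStats replay(long total, long failures, long firstFail) {
        ConnectStats stats = new ConnectStats();
        for (long call = 1; call <= total; call++) {
            boolean fails = call >= firstFail && call < firstFail + failures;
            stats.record(!fails);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Totals taken from the two WinNT runs in this report.
        ConnectStats mixed = replay(8608, 1164, 3978);
        ConnectStats interpreted = replay(6912, 735, 3978);
        System.out.println("mixed mode fail rate: " + mixed.failRateSinceFirstFail());
        System.out.println("-Xint fail rate:      " + interpreted.failRateSinceFirstFail());
    }
}
```

Under those assumptions the mixed-mode run gives 1164 failures in 4631 calls and the -Xint run gives 735 failures in 2935 calls, both roughly 25%.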

On a freshly rebooted 400 MHz Pentium II with 64MB of memory, running
with the -Xint option, the stats are:

    WinNT% java -Xint HttpVersionTest -l 1m 1.1 owjones
    connTotalCount=6912
    connFailCount=735
    connPassCount=6177
    connFirstFail=3978
    FINALSTATUS:HttpVersionTest:EXIT_ERROR:2:Number of ERRORS:1470:TEST INCOMPLETE

Running the client on my Solaris 2.6 machine (same machine on which the
server is running):

    S2.6% java HttpVersionTest -l 1m 1.1 owjones
    connTotalCount=4768
    connFailCount=0
    connPassCount=4768
    connFirstFail=-1
    FINALSTATUS:HttpVersionTest:EXIT_PASS:0:TEST PASSED

Running the client on my Win98 machine (450 MHz Pentium III with 64 MB
of memory):

    Win98% java HttpVersionTest -l 1m 1.1 owjones
    connTotalCount=8192
    connFailCount=0
    connPassCount=8192
    connFirstFail=-1
    FINALSTATUS:HttpVersionTest:EXIT_PASS:0:TEST PASSED

This failure happens with most of the new HTTP tests when they are run
for 1 minute instead of 1 iteration on WinNT. The ability to run for a
specific amount of time is a recent addition to this part of the
java.net test suite, which is why this problem was not seen before.

I have attached a compressed tar archive of files that can be used to
reproduce the problem:

WinNT logs:
    client-int.log	// -Xint option specified
    client-mixed.log	// default mixed mode

Server application files:
    HttpEcho.class
    HttpEcho.java
    HttpEchoConstants.class
    HttpEchoConstants.java

Client application files:
    HttpRequestDriver.class
    HttpRequestDriver.java
    HttpVersionTest.class
    HttpVersionTest.java

Helper classes:
    ParseOptions.class		// command line option parser
    ParseTestOptions.class	// test application option parser
    TestDriver.class		// TestDriver
    TestDriverConstants.class	// TestDriver constants

On the server machine:

    S2.6% java HttpEcho -l 0	// loops forever

On the client machine:

    //
    // loops for 1 minute, testing for HTTP protocol version 1.1 with
    // the server owjones
    //
    WinNT% java HttpVersionTest -l 1m 1.1 owjones

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: generic
FIXED IN: kestrel
INTEGRATED IN: kestrel
VERIFIED IN: kestrel
14-06-2004

EVALUATION

Does this problem happen with Cricket, JDK1.2 or JDK1.1.8 or is this
Kestrel-specific?
jeff.nisewanger@Eng 1999-08-27

There are two likely scenarios for the failure. One, somewhere in the
code, in either the JDK native code or the test code, we are not closing
the socket properly and thus not releasing the file descriptor. Two, the
native OS is not releasing the system resource fast enough. Since the
failure occurs only on the Windows platform, it is likely to be the
latter case. Further, I found some pointers on the web about this:

    "At 09:13 AM 5/19/97 -0700, EWANG.US.ORACLE.COM wrote:
    >We are running a stess test program on the NT 4.0 workstation.
    >We would like to allow the client to open 10000 connections to a
    >given server but we run into the limit of less than 4000 connections
    >can be made.
    >
    >We tried the following two scenarios and get two different error codes.
    >
    >1. Allow Microsoft stack to assign local port on the connect().
    >   We got WSAEADDRINUSE around 3986 connections.
    >
    >2. Issue bind() to sequentially assign local port starting from
    >   port 5001.
    >   We got WSAENOBUFS around 3900 connections.
    >s to try pinpoint the problem."

    From http://www.stardust.com/cgi-bin/wa?A2=ind9705&L=WINSOC&P=R3453

    "You can reduce the time that a Socket is in TIME_WAIT state to reuse
    it before the classic "2MSL" time, in Windows NT 4 the default is
    4 minutes. Here is the Key that you can touch to reduce the time :

        HKEY_LOCAL_MACHINE\CurrenControlSet\Services
        Key: Tcpip\Parameters\TcpTimedWaitDelay
        Value Type: REG_DWORD - Time in seconds
        Valid Range: 30-300 (decimal)
        Default: 0xF0 (240 decimal)

    Note that if the key is missing NT set the time out in 4 minutes.
    Bye.
    Ricardo D. Albano

    >A question for the floor...
    >
    >If I make very high rate connections, Even with SO_REUSEADDR I start
    >getting sockets under netstat in the TIME_WAIT state, and get WSAEADDRINUSE
    >errors. If I slow the rate down, delaying before the close, no problem.
    >Once this problem starts, I have to wait the 4 minutes or whetever for the
    >timeout to clean these guys up !!
    >
    >Is there a work around?"

    From http://www.stardust.com/cgi-bin/wa?A2=ind9902&L=WINSOC&P=R5906

We did the following tests to try to confirm the above hypothesis.

1. Dan added a delay and retry when we see a connection failure due to
   WSAEADDRINUSE. This greatly reduces the number of failures we saw
   earlier.
2. We rewrote the test to use a bare Socket instead of HttpURLConnection,
   and set the SO_LINGER option to 0. Although we saw far fewer connection
   failures (14 vs. 735 previously), this is not conclusive because by
   using Socket we could have inadvertently varied other parameters.

We tentatively conclude that the failure is in WinNT as opposed to the
JDK, but we will continue to investigate. This evaluation is added here
for the record.
yingxian.wang@eng 1999-10-04

daniel.daugherty@Eng 1999-12-20

A workaround for this bug was created for Kestrel FCS-I. The workaround
retries the HttpURLConnection.connect() call when TCP runs out of
sockets. The underlying problem is due to sockets being in the TIME_WAIT
state, which is inherent in TCP. However, since WinNT only supports ~4K
sockets, the problem is aggravated. See the following bug for a more
detailed analysis of a similar problem on a Solaris machine:

    4294599 3/2 socket creation fails with EADDRNOTAVAIL on Solaris 7
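
The delay-and-retry approach described in the evaluation can be sketched as follows. This is not the Kestrel fix itself, whose code is not shown in this report. It is a minimal illustration in present-day Java, and it assumes that local port exhaustion surfaces as java.net.BindException ("address already in use"), which is how current JDKs report it; the exception seen in Kestrel is not stated here. The names ConnectRetry, withRetry, and openWithRetry, and the attempt count and delay, are hypothetical.

```java
import java.net.BindException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not the code used in the actual workaround. */
final class ConnectRetry {

    /**
     * Runs the attempt, retrying after a pause whenever it fails with
     * BindException. Any other exception is propagated immediately.
     */
    static <T> T withRetry(Callable<T> attempt, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        BindException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (BindException e) {
                last = e;
                if (i < maxAttempts) {
                    // Give the OS time to release ports held in TIME_WAIT.
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    /** Opens a fresh connection per attempt rather than reusing a failed one. */
    static HttpURLConnection openWithRetry(URL url) throws Exception {
        return withRetry(() -> {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.connect();
            return conn;
        }, 5, 200L); // illustrative values, not taken from the report
    }
}
```

Retrying only hides the symptom: with a 240-second TIME_WAIT and roughly 4K usable local ports, a client that connects faster than ports are released will still stall, just without reporting an error.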
11-06-2004
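
The second test in the evaluation used a bare Socket with SO_LINGER set to 0. The rewritten test is not attached, so the following is only a sketch of that setting using the standard java.net.Socket API. It connects to a throwaway loopback listener instead of the HttpEcho server, and the class and method names are hypothetical. With linger enabled and a timeout of 0, close() resets the connection instead of performing the normal shutdown, so the closing side does not leave the socket in TIME_WAIT.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo of closing a client socket with SO_LINGER set to 0. */
final class LingerZeroDemo {

    /**
     * Connects to a temporary loopback listener, enables SO_LINGER with a
     * timeout of 0, and returns the linger value read back from the socket.
     * Both sockets are closed when the try block exits.
     */
    static int connectAndAbort() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {
            client.setSoLinger(true, 0); // close() will now reset the connection
            return client.getSoLinger();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_LINGER seconds: " + connectAndAbort());
    }
}
```

An abortive close discards any unsent data, which is one reason the evaluation treats the result of that test as inconclusive rather than as a fix.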