JDK-6215050 : (so) SocketChannel created in CLOSE_WAIT and never cleaned up.. File Descriptor leak
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.2,5.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris,solaris_8
  • CPU: sparc
  • Submitted: 2005-01-07
  • Updated: 2011-02-23
  • Resolved: 2005-04-16
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6
6 b33Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Happens on Solaris:
SunOS sunfile 5.8 Generic_108528-22 sun4u sparc SUNW,Ultra-30
and Linux:
Linux forge.blast.com 2.4.22-1.2174.nptl #1 Web Feb 18 16:38:32 EST 2004 i686 i686 i386 GNU/Linux




EXTRA RELEVANT SYSTEM CONFIGURATION :
The sample code I have included needed a server to talk to so I have it connect to an smtp host and then send the QUIT command.  You just need to set the hostname to make it work.  Otherwise you need to modify the code to do other communications.

A DESCRIPTION OF THE PROBLEM :
There is a file descriptor leak when using SocketChannels.  The sockets that get created all end up in CLOSE_WAIT state and not cleaned up until the process exits.

Additionally (on Linux at least) there is a Pipe allocated for each channel used in the sample which is never released.

I have included a sample piece of code that when run will continually leak file descriptors on JDK 1.4.2.  It appears that it happens on all platforms but is easiest to reproduce on Solaris or Linux (for me at least)  I have seen a number of bugs written that  mention

The appears to occur only in to following conditions:

1. When SocketChannel.socket.setSoTimeout is called with a non-zero value
2. When a SocketChannel.socket.getInputStream().read() is called
3. When different threads make the calls
4. When the server side closes the socket sometime AFTER the first read returns.

I have been looking at the JDK source and have an understanding of what is happening.

a SocketChannel uses a SocketAdapter to emulate a Socket.  the getInputStream
method returns the SocketAdapter.SocketInputStream inner class.  the read(ByteBuffer)
method of SocketInputStream delegates to the channel UNLESS the timeout is set to nonzero.

In this case it uses a  temportary selector to block until the channel is readable.  This selector is stored in a SoftReference in ThreadLocal memory.   After the read is complete or the timer expires the SelectionKey assocatied with the Selelector and the Channel is cancelled in a finally block.

The causes two problem.

The first one is that even the the SelectionKey is cancelled, the selectNow method in not called on the selector.  So the channel stays registered until the same thread comes thru and does a similar read which causes the selector to be reused.  (That's why having a different thead each time aggravates this problem)  If the Selector is garbage collected before the next similar read by this thread then the select was never called and the channel remains registered until the Selector is closed.  In 1.4.1 the Selector had a finalizer that would close it and finally deregister the channel but the finilizer was removed 1.4.2 therefore these channels never get closed.  By adding the selectNow to the finally block here the channel will get deregistered immediately and the close of the channel should work as expected.

You can see the need to the selectNow even in 1.4.1 because the file descriptors don't get closed for a long time until the Selectors are finalized.  (There may be another bug on this)

The second problem that is caused here is that because the 1.4.2 finalizer was removed in the SelectorImpl the close method for these temporary Selectors never gets called.  This means that resources allocated by these Selectors never gets freed even if the object gets garbage collected which is definitely a problem.

I understand that the finalizers were removed because it caused slowness in the garbage collector so if it is desirable not to have finallizers here then a different mechanism to close these selectors when they get garbage collected needs to be created.  These unfinalized selectors are causing the accumulation of PIPEs (FIFOs) that you see in an lsof listing of the sample program.

Please feel free to contact me if any of this is unclear.  And I'm more than happy to help in any way I can.

Matt Brozowski

PS If you put the selectNow code in... there are 3 or 4 other places that you should consider putting it in as well as the code for those blocks is nearly Identical to the one in SocketinputStream.read


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1.  Paste the included source into the file TestClose.java
2. Modify the SMTP_HOSTNAME to be the name or address of a valid SMTP server that you can access
3. javac TestClose.java
4. java -classpath . TestClose

While the program is running run
    netstat -an

many times and take note of the number of sockets open and watch how they
change over time

Also run
   lsof -p <processid of java process>
if you have it and take note if it


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The program only opens one socket at a time.  I expect the number of open file descriptors used by the program to remain very low (under 10)

ACTUAL -
Instead the number increases until the program begins to throw Too Many Open Files exceptions.

In 1.4.1 in would use them up until the finalizers ran.. then they would all be cleaned up at once.  This may have been the cause for slowness of the finalizers and the reason they were removed.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SocketChannel;
import java.util.StringTokenizer;

// Test case code for file descriptor leak.
// The following should produce somewhere near 150 sockets in CLOSE_WAIT state.
//
// The problem appears to be in sun.nio.ch.SocketAdapter.SocketInputStream.read(ByteBuffer);
//

public class TestClose implements Runnable {
    
    public static final String SMTP_HOSTNAME = "myhandy.smtphost.com";

 	public void run()
	{
        InetSocketAddress sockAddr = new InetSocketAddress(SMTP_HOSTNAME, 25);
		SocketChannel sChannel = null;
		Socket socket = null;
		try
		{
			sChannel = SocketChannel.open();
			sChannel.connect(sockAddr);
            
            //  I've noticed that if I remove this line the problem no longer happens
			sChannel.socket().setSoTimeout(5000);
			
			socket = sChannel.socket();
			
			BufferedReader lineRdr = new BufferedReader(new InputStreamReader(socket.getInputStream()));
			
			
			String result = null;
			do
			{
                // before performing the first readline the channel is unregistered
				System.err.println("before first readline: isOpen = "+sChannel.isOpen()+" isRegistered="+sChannel.isRegistered());
                
				result = lineRdr.readLine();
				System.err.println("<- "+result);
                
                // after performing it is registered.
				System.err.println("after first readline: isOpen = "+sChannel.isOpen()+" isRegistered="+sChannel.isRegistered());
				
			} while(result != null && result.length() > 0 && result.matches("^[1-5][0-9]{2}-"));
			
			if(result == null || result.length() == 0)
			{
				System.err.println("Received truncated response from SMTP server " + sockAddr.getHostName());
				return;
			}
			
			// Tokenize the last line result
			//
			StringTokenizer t = new StringTokenizer(result);
			int rc = Integer.parseInt(t.nextToken());
			if(rc != 220) return;
			
			//
			// Send the QUIT command causing the server side to close its end of the connection
			//
			String cmd = "QUIT\r\n";
			socket.getOutputStream().write(cmd.getBytes());
			System.err.println("-> "+cmd);
			do
			{
				result = lineRdr.readLine();
				System.err.println("<- "+result);
			} while(result != null && result.length() > 0 && result.matches("^[1-5][0-9]{2}-"));
			
			if(result == null || result.length() == 0)
			{
				System.err.println("Received truncated response from SMTP server " + sockAddr.getHostName());
				return;
			}
			
		}
        catch (Exception e) {
            e.printStackTrace();
        }
		finally
		{
            try {
                
            	System.err.println("before close: isOpen = "+sChannel.isOpen()+" isRegistered="+sChannel.isRegistered());
			
            	System.err.println("Closing SMTP socket channel "+sChannel);
                System.err.println("channel.socket().isConnected = "+ sChannel.socket().isConnected());
            	if (sChannel != null) {
            		sChannel.close();
            		System.err.println("Closed SMTP socket channel "+sChannel);
                    
                    // The socket is still connected here.
            		System.err.println("channel.socket().isConnected = "+sChannel.socket().isConnected());
            	}
            }
            catch (Exception e) {
               System.err.println("Exception on close:");
               e.printStackTrace();
            }
		}
		
		return;
	}
    

	public static void main(String[] args) {
        TestClose test = new TestClose();
        for(int i = 0; i < 150; i++) {
         // this bug seems only to appear if different threads are reading the channels
        	Thread thread = new Thread(test);
            thread.start();
            try {thread.join(); } catch(InterruptedException e) {   }
        }
        
        System.err.println("Going to sleep.... run netstat -an | grep CLOSE_WAIT ");
        while(true) {
        	try { Thread.sleep(10000); } catch(InterruptedException e) {}
        }
	}
}



---------- END SOURCE ----------
###@###.### 2005-1-07 06:48:46 GMT

Comments
EVALUATION The latest sdn comment (3 oct) says "In a relatively idle system that repeatedly polls a deployment server for new artifacts, eventually I'll drain the avialable file descriptors for the process leaving the VM wedged and unable to respond to new requests or make further polling efforts. However, if I put the system under load such that full GCs are taking place, the FDs are cleaned during the full GC." Most likely this application creates a new thread everytime when need to "poll and process" the request and then throws away the thread later. Yes, such a use scenario will cause a temporary fd "leakage", before a GC kicks in. Part of fix of this bug is to have a "phantom-reference-based cleaner" to clean/release up the fds used by the app thread (mostly if the processing involves a "timeout" setting) when the thread is dead. So, yes, you need the GC to start up the cleanup. In such a scenario, a thread poll is strongly recommended. Not only the fd, the "thread" itself is also an expensive system resource, you might end up of exhausting the thread resource before you hid the fd limit. Let us know if this is not your case. And as mentioned above, it would be easy for us to follow up if you can leave your contact info (email address).
04-10-2006

EVALUATION -- Aug 18, 2006. A number of developers on the JDC have posted comments suggesting that his bug is not fixed in 5.0 update 7 on Linux. If you believe there is still a leak then please post a test case or leave a mail address so that we can follow up. So far all of the cases that we have examined have been cases where the closing of the file descriptor was delayed because the Selector is idle (meaning there aren't any events to wake it up). In those cases, the connection is closed but the file descriptor isn't released until the Selector wake ups.
18-08-2006

EVALUATION Note: The fix has been backported into 1.5u7.
09-02-2006

EVALUATION The same problems described in 5083450, 4726957 and 4960962. ###@###.### 2005-1-07 07:14:08 GMT
07-01-2005