JDK-4946367 : JDK 1.4.2 LDAP Library leaves worker threads orphaned
  • Type: Bug
  • Component: core-libs
  • Sub-Component: javax.naming
  • Affected Version: 1.4.2
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,linux_redhat_7.2
  • CPU: generic,x86
  • Submitted: 2003-10-30
  • Updated: 2004-03-30
  • Resolved: 2003-12-19
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
1.4.2_05 05Fixed
Description
FULL PRODUCT VERSION :
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28) Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

FULL OS VERSION :
Linux RedHat 7.2, Not necessarily limited to a particular OS.

EXTRA RELEVANT SYSTEM CONFIGURATION :
LDAP Server Timeout: 10 Minutes

A DESCRIPTION OF THE PROBLEM :
We are reporting an issue against the 1.4.2 JDK LDAP Libraries.  When we run our test script that does 200 near-simultaneous LDAP connections, we see that about 40 threads never exit, and around 20 sockets never get closed. This pattern happens whether or not the LDAP connections are made over SSL, but the problem is significantly worse when connecting over SSL. The second batch of threads appears to be hitting the LDAP server's connection timeout limit (10 minutes), based on the error output in the logs for the bad threads.  We're not sure why those 40 threads are taking so long to process.  Then, when the LDAP server's socket endpoint closes, apparently the JDK's LDAP library leaves the worker thread orphaned and the socket partially open.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
  To validate that the issue is in fact with the JDK LDAP library and not with our application, we have written a LdapTest class that follows our authentication logic but doesn't use any of our application code.

There are three files attached to this ticket:
- sample authentication.properties
- LdapTest.java
- compile-and-run-ldaptest.sh

These were the test steps we implemented:
- edit authentication.properties file to point to an LDAP server URL, privileged username & password, etc.
- edit compile-and-run-testldap.sh to point JAVA_HOME at the 1.4.2 JDK
- run compile-and-run-testldap.sh
-- all threads start
-- first batch of threads stop
-- write down last thread number for first batch
-- wait 10 minutes
-- second batch of threads stop
-- capture ldaptest.out.NUM output for second bunch of threads


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Generally I would expect the LDAP service provider to handle the case when the LDAP server closes the connection by releasing the worker thread and the socket. ACTUAL - The LDAP service provider is not releasing the worker thread or socket.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
(1) java.net.SocketException: Connection reset

Occurs 24 times in the log.  The "Connection reset" error in the tomcat-jvm-stderr.txt log may correspond to the "Resource not available" error in the LDAP error log.

I'm wondering if this error is caused by the connection-timeout configuration for  their LDAP server.

javax.naming.CommunicationException: simple bind failed: neuldap2.neu.edu:636 [Root exception is java.net.SocketException: Connection reset]
        at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:198)
        at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2640)
        at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:290)
        at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:175)
        at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:193)
        at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:136)
        at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:66)
        at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:662)
        at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:243)
        at javax.naming.InitialContext.init(InitialContext.java:219)
        at javax.naming.InitialContext.<init>(InitialContext.java:195)
        at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:80)
        at blackboard.platform.security.authentication.LDAPAuthContext.bindWithUser(LDAPAuthContext.java:110)
        at blackboard.platform.security.authentication.LDAPAuthModule.authenticate(LDAPAuthModule.java:168)
        at blackboard.platform.security.authentication.BaseAuthenticationModule.doAuthenticate(BaseAuthenticationModule.java:352)
        at blackboard.platform.security.authentication.BaseAuthenticationModule.doAuthenticate(BaseAuthenticationModule.java:322)
        at blackboard.platform.security.authentication.HttpAuthManager.validateSession(HttpAuthManager.java:260)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.doLogin(LoginBrokerServlet.java:245)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.doLogin(LoginBrokerServlet.java:204)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.service(LoginBrokerServlet.java:143)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java)
        at org.apache.tomcat.facade.ServletHandler.doService(ServletHandler.java:574)
        at org.apache.tomcat.core.Handler.invoke(Handler.java:322)
        at org.apache.tomcat.core.Handler.service(Handler.java:235)
        at org.apache.tomcat.facade.ServletHandler.service(ServletHandler.java:485)
        at org.apache.tomcat.core.ContextManager.internalService(ContextManager.java:917)
        at org.apache.tomcat.core.ContextManager.service(ContextManager.java:833)
        at org.apache.tomcat.modules.server.Ajp12Interceptor.processConnection(Ajp12Interceptor.java:221)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:494)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:516)
        at java.lang.Thread.run(Thread.java:534)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at com.sun.net.ssl.internal.ssl.InputRecord.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.InputRecord.read(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.j(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.AppOutputStream.write(DashoA6275)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:66)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
        at com.sun.jndi.ldap.Connection.writeRequest(Connection.java:390)
        at com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:334)
        at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:193)
        ... 30 more


(2) javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Occurs 3 times in the log. 

Comments
CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: 1.4.2_05 generic tiger-beta FIXED IN: 1.4.2_05 tiger-beta INTEGRATED IN: 1.4.2_05 tiger-b32 tiger-beta
02-09-2004

WORK AROUND There is no complete workaround. Using a smaller connection idle timeout (com.sun.jndi.ldap.connect.pool.timeout) might help. The sample test application provided here uses 5 minutes, which I thinl is fairly large.
02-09-2004

EVALUATION First of all, the sample test application is not cleaning-up the resources at every exit point. The method authenticateUsingContext() returns without closing the contexts/NamingEnumeration at certain return points. With this correction made to the sample test application, there are still connections that are left open after full clean up by the application. To isolate the problem, I ran the test by disabling the connection pooling, and all the connections are closed properly. I will continue investigation with cleaning up of resources when connection pooling is enabled. Also, my testing has been with SSL enabled, I haven't tried with plain connection. ###@###.### 2003-11-02 ---------------------------------- There seems to be a problem with the cleaning up of pooled resources in the LDAP provider. This problem is intermittant and mostly won't be seen when the number of connections are fewer. Another factor that has a direct influence on it is the idle timeout of pooled connections (set through com.sun.jndi.ldap.connect.pool.timeout system property). If the idle timeout is larger (in the sample test program, it's 5 minutes), the problem is likely to occur. The large number of connections and the longer idle timeout for connections leads to accumulation of resource forcing the GC to reclaim the resources, which affects the way the cleaning is done by the LDAP provider pooling mechanism (ideally it shouldn't, and that is the fix to be added). Using a smaller idle timeout (might) help in reducing the number of open connections. However, it may not totally eradicate the problem. A smaller idle timeout enables frequent closing out of the idle connections that have been idle beyond the threshold time limit, there by keeping the overall resources under control. The fix for this will be provided in tiger release. For the fix to be made available in 1.4.2, the bug needs to be escalated. ###@###.### 2003-11-04
04-11-2003