United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-4946367 JDK 1.4.2 LDAP Library leaves worker threads orphaned
JDK-4946367 : JDK 1.4.2 LDAP Library leaves worker threads orphaned

Details
Type:
Bug
Submit Date:
2003-10-30
Status:
Resolved
Updated Date:
2004-03-30
Project Name:
JDK
Resolved Date:
2003-12-19
Component:
core-libs
OS:
linux_redhat_7.2,generic
Sub-Component:
javax.naming
CPU:
x86,generic
Priority:
P1
Resolution:
Fixed
Affected Versions:
1.4.2
Fixed Versions:
1.4.2_05 (05)

Related Reports
Backport:

Sub Tasks

Description
FULL PRODUCT VERSION :
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28) Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

FULL OS VERSION :
Linux RedHat 7.2, Not necessarily limited to a particular OS.

EXTRA RELEVANT SYSTEM CONFIGURATION :
LDAP Server Timeout: 10 Minutes

A DESCRIPTION OF THE PROBLEM :
We are reporting an issue against the 1.4.2 JDK LDAP Libraries.  When we run our test script that does 200 near-simultaneous LDAP connections, we see that about 40 threads never exit, and around 20 sockets never get closed. This pattern happens whether or not the LDAP connections are made over SSL, but the problem is significantly worse when connecting over SSL. The second batch of threads appears to be hitting the LDAP server's connection timeout limit (10 minutes), based on the error output in the logs for the bad threads.  We're not sure why those 40 threads are taking so long to process.  Then, when the LDAP server's socket endpoint closes, apparently the JDK's LDAP library leaves the worker thread orphaned and the socket partially open.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
  To validate that the issue is in fact with the JDK LDAP library and not with our application, we have written a LdapTest class that follows our authentication logic but doesn't use any of our application code.

There are three files attached to this ticket:
- sample authentication.properties
- LdapTest.java
- compile-and-run-ldaptest.sh

These were the test steps we implemented:
- edit authentication.properties file to point to an LDAP server URL, privileged username & password, etc.
- edit compile-and-run-testldap.sh to point JAVA_HOME at the 1.4.2 JDK
- run compile-and-run-testldap.sh
-- all threads start
-- first batch of threads stop
-- write down last thread number for first batch
-- wait 10 minutes
-- second batch of threads stop
-- capture ldaptest.out.NUM output for second bunch of threads


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Generally I would expect the LDAP service provider to handle the case when the LDAP server closes the connection by releasing the worker thread and the socket. ACTUAL - The LDAP service provider is not releasing the worker thread or socket.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
(1) java.net.SocketException: Connection reset

Occurs 24 times in the log.  The "Connection reset" error in the tomcat-jvm-stderr.txt log may correspond to the "Resource not available" error in the LDAP error log.

I'm wondering if this error is caused by the connection-timeout configuration for  their LDAP server.

javax.naming.CommunicationException: simple bind failed: neuldap2.neu.edu:636 [Root exception is java.net.SocketException: Connection reset]
        at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:198)
        at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2640)
        at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:290)
        at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:175)
        at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:193)
        at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:136)
        at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:66)
        at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:662)
        at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:243)
        at javax.naming.InitialContext.init(InitialContext.java:219)
        at javax.naming.InitialContext.<init>(InitialContext.java:195)
        at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:80)
        at blackboard.platform.security.authentication.LDAPAuthContext.bindWithUser(LDAPAuthContext.java:110)
        at blackboard.platform.security.authentication.LDAPAuthModule.authenticate(LDAPAuthModule.java:168)
        at blackboard.platform.security.authentication.BaseAuthenticationModule.doAuthenticate(BaseAuthenticationModule.java:352)
        at blackboard.platform.security.authentication.BaseAuthenticationModule.doAuthenticate(BaseAuthenticationModule.java:322)
        at blackboard.platform.security.authentication.HttpAuthManager.validateSession(HttpAuthManager.java:260)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.doLogin(LoginBrokerServlet.java:245)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.doLogin(LoginBrokerServlet.java:204)
        at blackboard.platform.security.authentication.servlet.LoginBrokerServlet.service(LoginBrokerServlet.java:143)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java)
        at org.apache.tomcat.facade.ServletHandler.doService(ServletHandler.java:574)
        at org.apache.tomcat.core.Handler.invoke(Handler.java:322)
        at org.apache.tomcat.core.Handler.service(Handler.java:235)
        at org.apache.tomcat.facade.ServletHandler.service(ServletHandler.java:485)
        at org.apache.tomcat.core.ContextManager.internalService(ContextManager.java:917)
        at org.apache.tomcat.core.ContextManager.service(ContextManager.java:833)
        at org.apache.tomcat.modules.server.Ajp12Interceptor.processConnection(Ajp12Interceptor.java:221)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:494)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:516)
        at java.lang.Thread.run(Thread.java:534)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at com.sun.net.ssl.internal.ssl.InputRecord.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.InputRecord.read(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.j(DashoA6275)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.a(DashoA6275)
        at com.sun.net.ssl.internal.ssl.AppOutputStream.write(DashoA6275)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:66)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
        at com.sun.jndi.ldap.Connection.writeRequest(Connection.java:390)
        at com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:334)
        at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:193)
        ... 30 more


(2) javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

Occurs 3 times in the log. 

                                    

Comments
EVALUATION

First of all, the sample test application is not cleaning-up the resources at every exit point.
The method authenticateUsingContext() returns without closing the contexts/NamingEnumeration at certain return points.

With this correction made to the sample test application, there are still connections that are left open after full clean up by the application.
To isolate the problem, I ran the test by disabling the connection pooling, and all the connections are closed properly.

I will continue investigation with cleaning up of resources when connection pooling is enabled.
 
Also, my testing has been with SSL enabled, I haven't tried with plain connection.

###@###.### 2003-11-02

----------------------------------

There seems to be a problem with the cleaning up of pooled
resources in the LDAP provider. This problem is intermittant 
and mostly won't be seen when the number of connections are fewer.
Another factor that has a direct influence on it
is the idle timeout of pooled connections (set through com.sun.jndi.ldap.connect.pool.timeout system property). If the idle
timeout is larger (in the sample test program, it's 5 minutes),
the problem is likely to occur. 

The large number of connections and the longer idle timeout for
connections leads to accumulation of resource forcing the
GC to reclaim the resources, which affects the way the
cleaning is done by the LDAP provider pooling mechanism (ideally it shouldn't,
and that is the fix to be added).

Using a smaller idle timeout (might) help in reducing the
number of open connections. However, it may not totally eradicate
the problem. A smaller idle timeout enables frequent closing
out of the idle connections that have been idle beyond the threshold
time limit, there by keeping the overall resources under control.

The fix for this will be provided in tiger release. For the fix to be made available in 1.4.2, the bug needs to be escalated.

###@###.### 2003-11-04
                                     
2003-11-04
WORK AROUND

There is no complete workaround. Using a smaller connection idle timeout (com.sun.jndi.ldap.connect.pool.timeout) might help. The sample test application provided here uses 5 minutes, which I thinl is fairly large.

                                     
2004-09-02
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_05
generic
tiger-beta

FIXED IN:
1.4.2_05
tiger-beta

INTEGRATED IN:
1.4.2_05
tiger-b32
tiger-beta


                                     
2004-09-02



Hardware and Software, Engineered to Work Together