JDK-4359598 : NullPointerException in com.sun.jndi.ldap.Connection.run(Connection.java:567)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: javax.naming
  • Affected Version: 1.2.2
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2000-08-04
  • Updated: 2001-03-08
  • Resolved: 2000-08-25
Version table:
  Other   1.3.1 ladybird   Fixed
  Other   1.4.0            Fixed
Description
When running JNDI on a multi-processor machine, we get intermittent instances
of the following exception:
  java.lang.NullPointerException
        at com.sun.jndi.ldap.Connection.run(Connection.java:567)
        at java.lang.Thread.run(Thread.java:484)
This exception is not trapped by our try/catch block.
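The stack trace above shows the exception being raised on the thread that executes Connection.run(), not on the thread that called into JNDI, which would explain why a try/catch around the calling code never sees it. The following is a minimal sketch of that behaviour only. The class and method names are hypothetical, and Thread.setUncaughtExceptionHandler is a Java 5 API that was not available in the 1.2.2 release this report was filed against.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo class; not part of JNDI or of the reporter's test. */
final class BackgroundFailureDemo {

    /** Starts a worker that fails with an NPE and returns what the worker's handler saw. */
    static Throwable runAndCapture() throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            Object missing = null;
            missing.toString(); // NullPointerException is thrown on the worker thread
        });
        // Only a handler installed on the worker thread observes the failure.
        worker.setUncaughtExceptionHandler((thread, error) -> seen.set(error));

        try {
            worker.start();
            worker.join(); // returns normally even though the worker died
        } catch (RuntimeException e) {
            // Not reached for the worker's NPE: it never propagates to this thread.
            throw new AssertionError("unexpected propagation", e);
        }
        return seen.get();
    }
}
```

Calling BackgroundFailureDemo.runAndCapture() returns the NullPointerException captured by the handler, while the try/catch on the calling thread is never entered.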

Once the exception has occurred, we get one instance of the following exception
on a call to InitialDirContext.getAttributes.
javax.naming.ServiceUnavailableException: roonadan.eng.sun.com:389; remaining name 'OU=prjC, OU=divA, OU=Comp, O=sun, C=us'
        at com.sun.jndi.ldap.Connection.readReply(Connection.java:221)
        at com.sun.jndi.ldap.Connection.readReply(Connection.java:243)
        at com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:539)
        at com.sun.jndi.ldap.LdapClient.search(LdapClient.java:503)
        at com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:1720)
        at com.sun.jndi.ldap.LdapCtx.doSearchOnce(LdapCtx.java:1670)
        at com.sun.jndi.ldap.LdapCtx.c_getAttributes(LdapCtx.java:1074)
        at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:216)
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:124)
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:112)
        at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:124)
        at DirAccessTest2.runTest(DirAccessTest2.java:67)
        at DirAccessTest2.main(DirAccessTest2.java:31)

If we continue calls to InitialDirContext.getAttributes following this
exception we get

javax.naming.CommunicationException: Socket closed.  Root exception is java.net.SocketException: Socket closed
        at java.net.SocketOutputStream.socketWrite(Native Method)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:83)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:72)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:116)
        at com.sun.jndi.ldap.Connection.writeRequest(Connection.java:209)
        at com.sun.jndi.ldap.LdapClient.search(LdapClient.java:497)
        at com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:1720)
        at com.sun.jndi.ldap.LdapCtx.doSearchOnce(LdapCtx.java:1670)
        at com.sun.jndi.ldap.LdapCtx.c_getAttributes(LdapCtx.java:1074)
        at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:216)
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:124)
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:112)
        at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:124)
        at DirAccessTest2.runTest(DirAccessTest2.java:59)
        at DirAccessTest2.main(DirAccessTest2.java:27)

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: ladybird, merlin-beta
FIXED IN: ladybird, merlin-beta
INTEGRATED IN: ladybird, merlin-beta
02-09-2004

WORK AROUND
If, after getting a javax.naming.ServiceUnavailableException on an InitialDirContext.getAttributes call, we create a new InitialDirContext, subsequent getAttributes calls do not fail (at least until the next NullPointerException and ServiceUnavailableException). This recovery does not fit very easily into typical program flow, however.
02-09-2004
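One way the workaround above might be folded into program flow is a small wrapper that discards the context and retries once on a fresh one. This is only a sketch under stated assumptions: ReconnectingLookup, ContextFactory and Operation are hypothetical names, not JNDI types, and retrying on CommunicationException as well as ServiceUnavailableException is a choice made here because the report shows both on a dead connection.

```java
import javax.naming.CommunicationException;
import javax.naming.NamingException;
import javax.naming.ServiceUnavailableException;

/** Hypothetical helper: recreates the context once when the connection has died. */
final class ReconnectingLookup<C> {

    /** Hypothetical factory, e.g. () -> new InitialDirContext(env) in real code. */
    interface ContextFactory<C> {
        C create() throws NamingException;
    }

    /** Hypothetical operation, e.g. ctx -> ctx.getAttributes(name) in real code. */
    interface Operation<C, R> {
        R apply(C ctx) throws NamingException;
    }

    private final ContextFactory<C> factory;
    private C ctx; // created lazily, replaced after a connection failure

    ReconnectingLookup(ContextFactory<C> factory) {
        this.factory = factory;
    }

    synchronized <R> R call(Operation<C, R> op) throws NamingException {
        if (ctx == null) {
            ctx = factory.create();
        }
        try {
            return op.apply(ctx);
        } catch (ServiceUnavailableException | CommunicationException e) {
            // The old context is unusable; real code would also close() it here.
            ctx = factory.create();
            return op.apply(ctx); // single retry; a second failure propagates
        }
    }
}
```

The retry is limited to one attempt so that a server that is really down still surfaces as an exception to the caller.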

SUGGESTED FIX
Affected files are: Connection, LdapRequest, LdapClient

Fix description:
The simple fix is to make sure that the LdapRequest does not change between the time the Connection thread finds the request and when it attempts to notify the (possibly blocked) reader. For example, in Connection.run(), add the 'synchronized (this)' block.

    synchronized (this) {
        LdapRequest ldr = findRequest(inMsgId);
        if (ldr != null) {
            ldr.addReplyBer(retBer);
            synchronized (ldr.ber) {
                ldr.ber.notify();
            }
        } else {
            // System.err.println("Cannot find LdapRequest for " + inMsgId);
        }
    }

This does indeed fix the problem but it is unsatisfactory for a few reasons.

1. Related to this is another sync problem in readReply().

    BerDecoder readReply(int msgId, BerEncoder requestBer)
        throws IOException, NamingException {
        BerDecoder rber;
        while ((rber = readReply(msgId)) == null) {
            try {
                synchronized (requestBer) {
                    requestBer.wait(15 * 1000); // 15 second timeout
                }
            } catch (InterruptedException ex) {
                throw new InterruptedNamingException(
                    "Interrupted during LDAP operation");
            }
        }
        return rber;
    }

   Novera had reported a "spiking" effect in their performance tests, indicating occasional 15 second pauses, and provided strong evidence that this method was the culprit: they changed the 15 to 2 and the spike moved from 15 sec to 2 sec.

2. Using LdapRequest.ber (the outgoing BER request buffer) as a lock is not clean. It makes the code convoluted and forces the VM to keep the BER buffer longer than it needs to.

3. Although the pendingRequests queue is probably usually very short, the constant use of msgId to find the request in it is inefficient.

The bigger fix is to remove the dependency on the outgoing BER buffer (and msgId) and make the dependency be on LdapRequest.

LdapRequest:
- Removed the ber field. No need to keep it around because we're not using it as a lock anymore.
- Added a boolean 'cancelled' field that gets marked when the LdapRequest is removed. Provides a quick check for getReplyBer() to use to prevent waiting indefinitely for an abandoned/cancelled request (see Connection.readReply()).
- Added notify() call to addReply() and also made it not do anything if the request has already been cancelled.
- Added a constructor to make the code cleaner. msgId is still protected so that it can be read (with no "get" method) but it can now be treated as a read-only field.

Connection:
- writeRequest() returns LdapRequest so that the caller of writeRequest() can use it as a lock to wait for the reply.
- readReply() now accepts LdapRequest instead of msgId and outgoing BER buffer. readReply() used to use msgId to find the LdapRequest and, if it had no replies yet, wait for 15 sec before looking again. In the updated version, look in LdapRequest directly for the reply and, if none is found, wait for 15 sec before looking again.

  The spiking problem arose because once we grabbed the lock, we immediately did a wait. If the reply had arrived already, we would not have known and would still wait (because we missed the notify). To fix the spiking problem, after grabbing the lock, we check the request's replies. If there is a reply, we return it immediately; otherwise, we wait. We could not have done this had we kept using the outgoing BER buffer as the lock because it would lead to intermittent deadlocks (finding the request once inside the sync block is problematic).

  readReply() leaves the wait when one of the following occurs:
  - The 15 secs are up.
  - Connection read a reply for that request and invoked notify().
  - The connection is closed (either by Connection or another thread) and the request is dequeued from pendingRequests.
  - Someone abandoned the request.

  After the wait, we make 3 checks:
  - Check if a reply is available; if so, return.
  - Check whether 'sock' has been closed; if not, wait.
  - Check whether the request has been abandoned or dequeued; if not, wait (check is made inside LdapRequest.getReplyBer()).

  If we don't do these last two checks, then the reader might block forever, even if the underlying connection has already been closed by the server (the Connection would have noticed and closed 'sock').
- Removed private readReply() utility method; checking for null 'sock' now happens inside main readReply().
- removeRequest() identifies the item to be removed by using LdapRequest instead of msgId. Use ldr.cancel() instead of explicit synchronized block/notify(). No need to free ldr.ber.
- abandonRequest() identifies the item to be removed by using LdapRequest instead of msgId. Moved removeRequest() earlier. Synchronize only access to outStream instead of whole method.
- ldapUnbind() synchronizes only access to outStream instead of whole method.
- run() calls ldr.addReply(), which does the notify already.

LdapClient:
- All calls to conn.writeRequest() now obtain a 'req' and then pass that to conn.readReply() to fetch the result.
- Most calls to conn.removeRequest() use an LdapRequest instead of msgId. Similar changes to getSearchReply().
- In clearSearchReply(), do a findRequest() first using msgId, and then pass the resulting LdapRequest to removeRequest() or abandonRequest(). OK if LdapRequest changes before remove/abandon because these calls are idempotent.
02-09-2004
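The "bigger fix" described above (the request object as the lock, a check before waiting, and a cancelled flag) can be sketched roughly as follows. This is not the JDK's LdapRequest: PendingReply and its method names are hypothetical, byte[] stands in for the BER reply, notifyAll() is used where the report mentions notify(), and throwing CancellationException on a cancelled request is an assumption about how the caller would be told.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CancellationException;

/** Hypothetical stand-in for a request that doubles as the monitor for its replies. */
final class PendingReply {
    private final Queue<byte[]> replies = new ArrayDeque<>();
    private boolean cancelled;

    /** Reader-thread side: store the reply and wake any waiter. */
    synchronized void addReply(byte[] reply) {
        if (cancelled) {
            return; // nobody is interested any more; drop the reply
        }
        replies.add(reply);
        notifyAll();
    }

    /** Called when the request is removed or abandoned. */
    synchronized void cancel() {
        cancelled = true;
        notifyAll(); // release a waiter so it does not block until the timeout
    }

    /** Caller side: check for a reply BEFORE waiting, so an early reply is never missed. */
    synchronized byte[] awaitReply(long timeoutMillis) throws InterruptedException {
        while (replies.isEmpty()) {
            if (cancelled) {
                throw new CancellationException("request cancelled");
            }
            wait(timeoutMillis); // re-check state after notify or timeout
        }
        return replies.remove();
    }
}
```

Because the emptiness check and the wait happen under the same monitor that addReply() uses, a reply that arrives before the caller starts waiting is returned immediately instead of costing a full timeout, which is the spiking behaviour the report describes.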

PUBLIC COMMENTS
While using JNDI with LDAP on a multi-processor SPARC machine, we occasionally get a NullPointerException at "com.sun.jndi.ldap.Connection.run". After this exception occurs, the next call to InitialDirContext.getAttributes fails with "javax.naming.ServiceUnavailableException". If we continue to make InitialDirContext.getAttributes calls with the same InitialDirContext, we then get "javax.naming.CommunicationException: Socket closed. Root exception is java.net.SocketException: Socket closed". If, after getting the ServiceUnavailableException, we create a new InitialDirContext before making the next getAttributes call, the calls work (at least until the next NullPointerException).

10 Jan 2001, kevin.ryan@eng -- added "jdcinclude" keyword, based on an external user needing to see this report.
02-09-2004

EVALUATION
There is a race condition between the Connection thread and the main thread. The Connection thread fetches the request from the pendingRequests queue and tries to notify the reader (main thread). However, in the meantime, the main thread might have already removed/dequeued the request. If that happens before the Connection thread notifies, the Connection thread will get a null pointer exception. The fix is to make the two steps (checking the pendingRequests queue and notifying) atomic.
rosanna.lee@eng 2000-08-04
04-08-2000
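The evaluation's point, that lookup and notification must form one atomic step, can be illustrated with a small sketch. It is not the JDK's Connection code: PendingTable, Request and the 'lock' field are hypothetical, and the assumption that removal clears a field which the reader thread later dereferences is made here only to show how a null pointer could arise when the two steps are not atomic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the pendingRequests queue. */
final class PendingTable {

    /** Hypothetical request record; 'lock' is cleared when the request is removed. */
    static final class Request {
        Object lock = new Object();
        byte[] reply;
    }

    private final Map<Integer, Request> pending = new HashMap<>();

    synchronized Request add(int msgId) {
        Request r = new Request();
        pending.put(msgId, r);
        return r;
    }

    /** Caller side: dequeue the request and release what it held. */
    synchronized void remove(int msgId) {
        Request r = pending.remove(msgId);
        if (r != null) {
            r.lock = null;
        }
    }

    /**
     * Reader side: find the request and notify its waiter while holding the
     * table's monitor, so remove() cannot run between the two steps.
     * Returns false when the request was already removed.
     */
    synchronized boolean deliver(int msgId, byte[] reply) {
        Request r = pending.get(msgId);
        if (r == null) {
            return false; // already dequeued; nothing to notify
        }
        r.reply = reply;
        synchronized (r.lock) { // safe: r.lock cannot be nulled while we hold 'this'
            r.lock.notify();
        }
        return true;
    }
}
```

Without the outer synchronization on deliver(), remove() could clear r.lock after the lookup but before the inner synchronized block, and the reader thread would fail with a NullPointerException.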