Jacobs Rimell is providing a solution for Comcast. They have run into some issues, want us to be aware of them, and have suggested some changes that they feel would enhance the interface.

Background Info
===============

Jacobs Rimell (JR) have a product called APS. The current version (APS4) is a J2EE application that requires LDAP access to multiple LDAP directories from within an app-server. At Comcast there is only one directory (the GDS); however, some of this email reflects APS requirements as well as Comcast's requirements on APS. Given that Comcast contacted you originally, I have deliberately indicated where the issue/current implementation has a direct impact on the customer.

JR Issues with LDAP connection pooling
======================================

Originally we tried to use the inbuilt connection pooling in recent releases of Sun's LDAP provider. This had the advantage of near-transparency.

First problem (JR only): configuration of pool sizes etc. is on a per-VM basis. The only choice available when making a connection to a specific directory is whether or not to pool connections. Since our product may need to work with several directories with radically different characteristics, we decided to live with this until we had a direct (new customer) requirement, in order to expedite our product roadmap.

Second problem (JR / Comcast): We implemented a simple round-robin load-balancing algorithm against multiple eTrust DSAs by leveraging the capability of Sun's LDAP service provider to accept a list of URLs in its Context.PROVIDER_URL parameter. For each new InitialDirContext request we rotate the URLs so that each request goes to a different DSA. We believe that the connection pooling creates a pool based on the host/port of the URL, which means that with, for example, 3 URLs (e.g. "ldap://a:389 ldap://a:10389 ldap://a:20389") we would end up with 3 connection pools.

Third problem (JR / Comcast): Lost TCP packets on a customer network. The root cause was a misconfigured VPN, which prevented search results returning to the LDAP client. The effect was that the thread executing the LdapRequest in the application server hung waiting for a response until the TCP socket timed out after 2 hours. This quickly led to all execute threads being stuck. We reproduced this in our lab in London by heavily exercising a DSA (outside of APS) until its response to APS mimicked the misconfigured VPN. While these two scenarios are extreme, APS is required to be always available and hence must survive such operational issues.

Setting search timeouts using SearchControls didn't help, as this mechanism relies on the directory server aborting the search request if it takes too long. Our problem was not the duration of the search itself but the non-arrival of the results.

LDAP socket factories
=====================

To avoid the stuck execute threads we decided we had to implement a timeout on LDAP sockets. The supported mechanism for this is to install a SocketFactory for LDAP requests, and within that SocketFactory to set the socket's SO_TIMEOUT. The name of the socket factory's class is used as a parameter during connection to the directory.
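For reference, a minimal sketch of this mechanism (the class name and timeout value are illustrative, not JR's actual factory; "java.naming.ldap.factory.socket" is the environment property from which the provider reads the class name). The timeout is hard-coded precisely because the provider is given nothing but the class name, which leads to the first problem described below.

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.Socket;
    import javax.net.SocketFactory;

    /* Hypothetical illustration only -- class name and timeout value are assumptions. */
    public class TimeoutLdapSocketFactory extends SocketFactory {

        private static final int SO_TIMEOUT_MS = 30000;   // assumed value

        /* Sun's LDAP provider obtains the factory through its static getDefault(). */
        public static SocketFactory getDefault() {
            return new TimeoutLdapSocketFactory();
        }

        private Socket withTimeout(Socket s) throws IOException {
            s.setSoTimeout(SO_TIMEOUT_MS);   // reads that block longer than this now fail
            return s;
        }

        /* As noted below, this no-argument variant appears to be the only one invoked. */
        public Socket createSocket() throws IOException {
            return withTimeout(new Socket());
        }

        public Socket createSocket(String host, int port) throws IOException {
            return withTimeout(new Socket(host, port));
        }

        public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
                throws IOException {
            return withTimeout(new Socket(host, port, localHost, localPort));
        }

        public Socket createSocket(InetAddress host, int port) throws IOException {
            return withTimeout(new Socket(host, port));
        }

        public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort)
                throws IOException {
            return withTimeout(new Socket(address, port, localAddress, localPort));
        }
    }

The factory is then selected purely by class name when building the context environment, e.g.

    env.put("java.naming.ldap.factory.socket", TimeoutLdapSocketFactory.class.getName());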
This worked inasmuch as it caused the socket to fail when it waited too long for a response -- which we could then detect and apply a suitable retry strategy to -- but it had other problems.

Firstly (JR only): there is no easy way to configure different timeouts for different directories. Because the only parameter available is the name of the socket factory, it is necessary to implement a distinct physical class for each directory. Within the factory it seems that only the createSocket() method is used -- so we cannot even use the destination host/port to decide on the timeout to use.

Secondly (JR / Comcast): the use of a custom LDAP socket factory explicitly disables Sun's connection pooling!

Thirdly (JR / Comcast): the timeout appears to cause the socket to fail when idle. We assume that the service provider is listening for unsolicited notifications and timing out. Once a socket fails, the connection is no longer usable.

Our Current solution
====================

We have stopped using the connection pooling in Sun's LDAP service provider and implemented our own context pool instead. We are still using the socket timeout mechanism. Our PoolableLdapContext implements DirContext and delegates requests to a real InitialDirContext. It identifies timeout-related failures of LDAP operations and retries an appropriate number of times. When idle, it "pings" the directory by searching for a nonexistent entry to stop the idle socket from timing out. It recovers from failures by making a new InitialDirContext (because we found that after the socket times out it remains unusable).
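In outline, a much-simplified sketch of that approach (illustrative only: the real PoolableLdapContext implements the full DirContext interface and delegates every operation; the retry count, the ping entry name, and the assumption that a socket timeout surfaces as a CommunicationException are assumptions made for the example):

    import java.util.Hashtable;
    import javax.naming.CommunicationException;
    import javax.naming.NamingEnumeration;
    import javax.naming.NamingException;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class PoolableLdapContextSketch {

        private static final int MAX_RETRIES = 3;        // assumed value
        private final Hashtable<String, Object> env;
        private DirContext delegate;

        public PoolableLdapContextSketch(Hashtable<String, Object> env) throws NamingException {
            this.env = env;
            this.delegate = new InitialDirContext(env);
        }

        public NamingEnumeration<SearchResult> search(String base, String filter, SearchControls controls)
                throws NamingException {
            CommunicationException last = null;
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                try {
                    return delegate.search(base, filter, controls);
                } catch (CommunicationException e) {
                    // A timed-out socket is not reusable: discard the context and build a new one.
                    last = e;
                    reconnect();
                }
            }
            throw last;
        }

        /* Called by the pool while a context is idle (scheduling not shown): search for an
           entry that cannot exist, purely to generate traffic so the idle socket does not
           hit its timeout. */
        public void ping() {
            try {
                SearchControls sc = new SearchControls();
                sc.setSearchScope(SearchControls.OBJECT_SCOPE);
                delegate.search("cn=__aps_ping__", "(objectClass=*)", sc).close();
            } catch (NamingException ignored) {
                // A "no such object" error is the expected outcome of the ping.
            }
        }

        private void reconnect() throws NamingException {
            try { delegate.close(); } catch (NamingException ignored) { }
            delegate = new InitialDirContext(env);
        }
    }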
JR's Wishlist
=============

(1) Allow com.sun.jndi.ldap.connect.pool.* parameters passed in to the InitialDirContext environment to override the System Properties, so that each directory server can have its own pooling characteristics.

(2) Make it easier to specify socket parameters. One way would be to make the InitialDirContext's environment available as a parameter to the createSocket calls. An alternative would be to pass an instance of the socket factory in the environment rather than just its name.

(3) Don't disable connection pooling when a custom socket factory is used.

(4) Provide a direct mechanism for specifying the timeout (without requiring a socket factory).

(5) Is it necessary that unused sockets time out? We are not explicitly using UnsolicitedNotifications, so why is there a read() on the socket in the first place?

(6) Load balancing / round-robining. Where a pooled context is created with multiple LDAP URLs in PROVIDER_URL, allow the choice of using them (a) as a sequence of fallback directories, (b) as a set of peer directory agents to be cycled through (round robin), or (c) as a set of peer directory agents to be load-balanced (i.e. pass requests to the one with the fewest outstanding requests).

###@###.### 10/11/04 07:18 GMT