I have discovered a problem in sun.rmi.transport.DGCClient while running on our platform. I encountered it using Java 1.3.1, but the relevant code is identical in Java 1.4.2, 5.0, and 6.0, so I expect the problem exists there as well.

We run one instance of the JBoss application server that uses RMI to connect to a second JBoss instance. If the second instance terminates abruptly, the first instance starts attempting to reconnect to it on the RMI port. These reconnection attempts happen in rapid succession, in a tight loop.

After analyzing the problem, I determined that the cause is in sun.rmi.transport.DGCClient. The method makeDirtyCall is called periodically to renew the lease on remote objects so that they are not garbage collected in the remote JVM. If the socket connect to the RMI port fails, makeDirtyCall enters a retry algorithm. The wait time between retries is computed from the difference between the timestamps taken before and after the last connect attempt. The granularity of System.currentTimeMillis() may be coarser than one millisecond (many operating systems measure time in units of tens of milliseconds), and a refused connection typically fails quickly, so it is not at all unusual for this difference to be zero. Zero is then used as the wait time. The wait time is doubled after each retry, but doubling zero still yields zero. These retries continue until the lease expires. The typical lease time is 10 minutes, so the tight loop can run for quite a while.

I would like to suggest the following fix: [see Suggested Fix]

###@###.### 2005-2-22 03:14:44 GMT
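A minimal sketch of why the retry loop never slows down. This is not the DGCClient source and not the suggested fix referenced above. The class BackoffDemo, the methods flawedWaits and clampedWaits, and the MIN_WAIT_MS floor are hypothetical names and values chosen only for illustration. The sketch assumes the first wait equals the measured duration of the failed connect and doubles on each retry. The clamped variant shows one common mitigation, a minimum starting delay, for contrast.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of the retry-delay defect described in this report.
 * NOT the sun.rmi.transport.DGCClient source; names and MIN_WAIT_MS are assumptions.
 */
public class BackoffDemo {

    // Assumed floor for the clamped variant; not a value taken from the JDK.
    static final long MIN_WAIT_MS = 100;

    /** Flawed scheme: first wait = measured connect duration, doubled after each retry. */
    static long[] flawedWaits(long measuredConnectMillis, int retries) {
        long[] waits = new long[retries];
        long wait = measuredConnectMillis; // frequently 0 with a coarse clock
        for (int i = 0; i < retries; i++) {
            waits[i] = wait;
            wait *= 2; // 0 * 2 == 0, so the delay never grows
        }
        return waits;
    }

    /** Same doubling scheme, but the starting wait is clamped to a minimum. */
    static long[] clampedWaits(long measuredConnectMillis, int retries) {
        long[] waits = new long[retries];
        long wait = Math.max(measuredConnectMillis, MIN_WAIT_MS);
        for (int i = 0; i < retries; i++) {
            waits[i] = wait;
            wait *= 2;
        }
        return waits;
    }

    public static void main(String[] args) {
        // Measure a (simulated) failed connect the way the report describes.
        long start = System.currentTimeMillis();
        // ... a refused connect would normally return here almost immediately ...
        long elapsed = System.currentTimeMillis() - start; // often 0

        System.out.println("measured: " + elapsed + " ms");
        System.out.println("flawed:  " + Arrays.toString(flawedWaits(0, 5)));  // [0, 0, 0, 0, 0]
        System.out.println("clamped: " + Arrays.toString(clampedWaits(0, 5))); // [100, 200, 400, 800, 1600]
    }
}
```

With a zero measurement, the flawed scheme produces a delay of zero on every retry, which matches the tight reconnect loop observed until the lease expires. Any nonzero starting value, whether measured or clamped, lets the doubling actually back off.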