JDK-4229801 : (1.1.x) RMI: a stopped client causes the server to stop handling other clients
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.rmi
  • Affected Version: 1.1.6_007,1.1.7,1.1.8
  • Priority: P1
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic,solaris_2.6,solaris_7
  • CPU: generic,sparc
  • Submitted: 1999-04-14
  • Updated: 2001-01-09
  • Resolved: 2001-01-09
Related Reports
Duplicate :  
Duplicate :  
Description

Name: mf23781			Date: 04/14/99


This problem was previously addressed by Sun bug ID 4137568
and fixed in JDK 1.2. We would like it to be fixed in JDK 1.1.7 as well.

Description of the problem:
===========================


I have 2 remote objects (call them R1 and R2), each of which extends UnicastRemoteObject and
implements an interface that extends Remote.  R1 is started and binds itself in the registry.
R2 is started, looks up R1 in the registry, and sends a reference to itself to R1.  R1 stores the reference
to R2, but does not try to contact R2.  Once R2 has successfully sent its reference to R1, I do a
CTRL-Z on R2 and leave R2 in that suspended state.  Shortly after that CTRL-Z, new remote objects
(of the same type as R2) are no longer able to contact R1.  I don't receive any exceptions; the program
that is trying to contact R1 simply hangs.
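
The setup described above can be sketched roughly as follows. This is only an
illustration, not the submitter's testcase: the class names HelloImpl and
TestObject are borrowed from the thread dump below, while the interfaces Hello
and TestRemote, the methods register() and ping(), the ReproSketch driver, and
the registry URL are all assumptions. It is written in current Java syntax
rather than for JDK 1.1.

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface for R2; the report does not show its methods.
interface TestRemote extends Remote {
    String ping() throws RemoteException;
}

// Hypothetical remote interface for R1.
interface Hello extends Remote {
    void register(TestRemote client) throws RemoteException;
}

// R1: stores every reference it receives but never calls back into R2.
class HelloImpl extends UnicastRemoteObject implements Hello {
    private final List<TestRemote> clients = new ArrayList<>();

    HelloImpl() throws RemoteException { super(); }

    @Override
    public synchronized void register(TestRemote client) {
        clients.add(client);
    }
}

// R2: hands a reference to itself to R1.
class TestObject extends UnicastRemoteObject implements TestRemote {
    TestObject() throws RemoteException { super(); }

    @Override
    public String ping() { return "alive"; }
}

class ReproSketch {
    // Assumed registry location and binding name.
    static final String URL = "rmi://localhost:1099/HelloServer";

    // Strong references so the exported objects are not collected.
    static Hello server;
    static TestRemote client;

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("server")) {
            LocateRegistry.createRegistry(1099);   // in-process registry
            server = new HelloImpl();
            Naming.rebind(URL, server);            // R1 binds itself
        } else {
            Hello r1 = (Hello) Naming.lookup(URL); // R2 looks up R1
            client = new TestObject();
            r1.register(client);                   // R2 sends a reference to itself
            System.out.println("registered; suspend this process with CTRL-Z");
        }
    }
}
```

Under these assumptions, one would run "java ReproSketch server" in one
terminal, start a client with "java ReproSketch" and suspend it with CTRL-Z,
then start a second client and watch whether its register() call returns.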

Comments by Peter Jones - JavaSoft East <###@###.###>
====================================================================

Date:         Thu, 4 Mar 1999 09:44:11 -0500
Reply-To:     Peter Jones - JavaSoft East <###@###.###>
From:         Peter Jones - JavaSoft East <###@###.###>
Subject:      Re: Hang condition when an RMI object is suspended
======================================================================

The following thread dump from 1.1.7B java shows
the problem. The first TestObject process is suspended.
Ten minutes later, the thread dump is obtained from the RMI server process
"HelloServer", after a 2nd TestObject has handed a copy of itself
to the HelloServer. The HelloServer tries to ack the copy by
calling DGCClient.referenced() (TCP Accept-6 thread)
but hangs there, because the lock is already held by the LeaseRenewer thread,
which is itself hanging because the first TestObject process is not responding.

Full thread dump:
    "TCP Accept-8" (TID:0xee305e98, sys_thread_t:0xef0d1db8, state:CW) prio=5
	java.net.PlainSocketImpl.accept(PlainSocketImpl.java:379)
	java.net.ServerSocket.implAccept(ServerSocket.java:198)
	java.net.ServerSocket.accept(ServerSocket.java:181)
	sun.rmi.transport.proxy.HttpAwareServerSocket.accept(HttpAwareServerSocket.java:70)
	sun.rmi.transport.tcp.TCPTransport.run(TCPTransport.java:358)
	java.lang.Thread.run(Thread.java)
    "TCP Accept-6" (TID:0xee305400, sys_thread_t:0xef101db8, state:MW) prio=5
	sun.rmi.transport.DGCClient.referenced(DGCClient.java:118)
	sun.rmi.transport.ConnectionInputStream.registerRefs(ConnectionInputStream.java:101)
	sun.rmi.transport.StreamRemoteCall.releaseInputStream(StreamRemoteCall.java:137)
	HelloImpl_Skel.dispatch(HelloImpl_Skel.java:37)
	sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:164)
	sun.rmi.transport.Transport.serviceCall(Transport.java:154)
	sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:600)
	sun.rmi.transport.tcp.TCPTransport.run(TCPTransport.java:506)
	java.lang.Thread.run(Thread.java)
    "Cleaner" (TID:0xee305250, sys_thread_t:0xef131db8, state:CW) prio=5
	sun.rmi.transport.DGCClient.run(DGCClient.java:621)
	java.lang.Thread.run(Thread.java)
    "LeaseRenewer" (TID:0xee305168, sys_thread_t:0xef161db8, state:CW) prio=5
	java.net.SocketInputStream.read(SocketInputStream.java:84)
	java.io.BufferedInputStream.fill(BufferedInputStream.java)
	java.io.BufferedInputStream.read(BufferedInputStream.java)
	java.io.DataInputStream.readByte(DataInputStream.java)
	sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:140)
	sun.rmi.server.UnicastRef.newCall(UnicastRef.java:67)
	sun.rmi.transport.DGCImpl_Stub.dirty(DGCImpl_Stub.java:58)
	sun.rmi.transport.DGCClient.renewLeases(DGCClient.java:773)
	sun.rmi.transport.DGCClient.doRenewal(DGCClient.java:705)
	sun.rmi.transport.DGCClient$LeaseRenewer.run(DGCClient.java:936)
	java.lang.Thread.run(Thread.java)
    "LeaseChecker" (TID:0xee300290, sys_thread_t:0xef191db8, state:CW) prio=5
	sun.rmi.transport.DGCImpl$LeaseChecker.run(DGCImpl.java:303)
	java.lang.Thread.run(Thread.java)
    "TCP Accept-2" (TID:0xee305ec8, sys_thread_t:0xef1c1db8, state:CW) prio=5
	java.net.SocketInputStream.read(SocketInputStream.java:84)
	java.io.BufferedInputStream.fill(BufferedInputStream.java)
	java.io.BufferedInputStream.read(BufferedInputStream.java)
	java.io.BufferedInputStream.fill(BufferedInputStream.java)
	java.io.BufferedInputStream.read(BufferedInputStream.java)
	java.io.DataInputStream.readByte(DataInputStream.java)
	sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:589)
	sun.rmi.transport.tcp.TCPTransport.run(TCPTransport.java:506)
	java.lang.Thread.run(Thread.java)
    "KeepAlive" (TID:0xee304e40, sys_thread_t:0xef241db8, state:CW) prio=5
	sun.rmi.transport.KeepAlive.run(ObjectTable.java:182)
	java.lang.Thread.run(Thread.java)
    "Reaper" (TID:0xee304ea0, sys_thread_t:0xef271db8, state:CW) prio=5
	sun.rmi.transport.Reaper.run(ObjectTable.java:199)
	java.lang.Thread.run(Thread.java)
    "Finalizer thread" (TID:0xee300210, sys_thread_t:0xef341db8, state:CW) prio=1
    "Async Garbage Collector" (TID:0xee300258, sys_thread_t:0xef371db8, state:CW) prio=1
    "Idle thread" (TID:0xee3002a0, sys_thread_t:0xef471db8, state:R) prio=0 *current thread*
    "Clock" (TID:0xee300088, sys_thread_t:0xef561db8, state:CW) prio=12
    "main" (TID:0xee3000b0, sys_thread_t:0x2eaf8, state:CW) prio=5
Monitor Cache Dump:
    java.net.PlainSocketImpl@EE304DB0/EE3587C8: owner "TCP Accept-8" (0xef0d1db8, 1 entry)
    <unknown key> (0xef241db8): <unowned>
	Waiting to be notified:
	    "KeepAlive" (0xef241db8)
    java.io.BufferedInputStream@EE3059F8/EE35BE88: owner "LeaseRenewer" (0xef161db8, 1 entry)
    java.io.BufferedInputStream@EE305DD0/EE358728: owner "TCP Accept-2" (0xef1c1db8, 1 entry)
    <unknown key> (0xef191db8): <unowned>
	Waiting to be notified:
	    "LeaseChecker" (0xef191db8)
    <unknown key> (0xef371db8): <unowned>
	Waiting to be notified:
	    "Async Garbage Collector" (0xef371db8)
    <unknown key> (0xef271db8): <unowned>
	Waiting to be notified:
	    "Reaper" (0xef271db8)
    java.lang.Object@EE305030/EE358E88: owner "LeaseRenewer" (0xef161db8, 1 entry)
	Waiting to enter:
	    "TCP Accept-6" (0xef101db8)
    java.util.Vector@EE305068/EE359040: <unowned>
	Waiting to be notified:
	    "Cleaner" (0xef131db8)
    java.io.BufferedInputStream@EE303628/EE358A68: owner "TCP Accept-2" (0xef1c1db8, 1 entry)
Registered Monitor Dump:
    Thread queue lock: <unowned>
	Waiting to be notified:
	    "main" (0x2eaf8)
    Name and type hash table lock: <unowned>
    String intern lock: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class loading lock: <unowned>
    Java stack lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Has finalization queue lock: <unowned>
    Finalize me queue lock: <unowned>
	Waiting to be notified:
	    "Finalizer thread" (0xef341db8)
    Dynamic loading lock: <unowned>
    Monitor IO lock: <unowned>
    Child death monitor: <unowned>
    Event monitor: <unowned>
    I/O monitor: <unowned>
    Alarm monitor: <unowned>
	Waiting to be notified:
	    "Clock" (0xef561db8)
    Sbrk lock: <unowned>
    Monitor registry: owner "Idle thread" (0xef471db8, 1 entry)
Thread Alarm Q:
    sys_thread_t 0xef371db8   [Timeout in 923 ms]
    sys_thread_t 0xef271db8   [Timeout in 2794 ms]
    sys_thread_t 0xef131db8   [Timeout in 49594 ms]
    sys_thread_t 0xef191db8   [Timeout in 199197 ms]
    sys_thread_t 0xef241db8   [Timeout in 2146481018 ms]


patrick.ong@Eng 1999-05-10

Comments
WORK AROUND

Name: mf23781			Date: 04/14/99

We have a suggested fix for evaluation. The diffs for the fix and a pre-prepared "jar" file with the testcase will be emailed to ###@###.###
======================================================================
11-06-2004

EVALUATION

This bug, namely that the client-side DGC synchronization problem of 4137568 has not been fixed in a 1.1.x release, has already been filed as 4226268. The same problem also appears to be part of the cause of the recently filed 4228651, which was also submitted by IBM and escalated. Patrick.Ong@eng is currently investigating 4228651, including possibly backporting the 1.2 fix for the client-side DGC synchronization problem to 1.1.x. It seems like we should mark one or two of these bugids as duplicates of each other, but I'll put that off for a bit until we get a little more information. At any rate, because several users who cannot yet upgrade to JDK 1.2 have been running into this bug recently, 4226268 is currently committed to be fixed in "dino", which looks to be 1.1.9.

peter.jones@East 1999-04-15

Based on the thread dump in the description, this testcase definitely shows the same bug as 4226268. I'm developing a fix for 1.1.7B and 1.1.8 patches. As Peter stated, the 1.1.7B fix should try to synchronize on individual server endpoint instances instead of the whole leaseTable. In the thread dump, DGCClient.doRenewal(DGCClient.java:705) acquires the global lock "lock" before calling renewLeases() but hangs on a socket read because the 1st TestObject is suspended by CTRL-Z. A newly started TestObject process will not get an ack from the HelloServer, because the HelloServer process hangs in DGCClient.referenced(), which waits for "lock" to be released before it can update the leaseTable. With the 1.2 fix, the deadlock would not happen, since:

1. HelloServer blocks when renewing leases with the 1st R2 (1st server endpoint lock acquired).
2. HelloServer tries to add the 2nd R2 to the leaseTable and proceeds successfully, since this uses a new server endpoint lock.

patrick.ong@Eng 1999-05-10

At this moment, our Java CTE sustaining team is overloaded with escalations and there is no bandwidth to backport the complete DGC client code architecture to a 1.1.8 official patch. However, if there is a customer that really needs this fix in 1.1.8, please submit an escalation, and we can help by supporting a set of binaries that will fix this problem in a surgical fashion. If resources permit in the future, we may fix this in a 1.1.8 patch with the full-blown DGC changes from 1.2.2.

patrick.ong@Eng 2000-02-16

This bug is a closed escalation. It is fixed in 1.2 and later releases. Since the escalation is closed, I am closing it as "will not fix". CTE may reopen this bug if they have resources to fold the fix into a 1.1.8 patch release.

ann.wollrath@East 2001-01-09
09-01-2001
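
The locking difference described in the evaluation can be illustrated with a
small self-contained sketch. This is not the sun.rmi.transport.DGCClient
source: the class LeaseLockSketch, its methods, and the endpoint names are
hypothetical, and the stuck socket read is simulated with a latch. It only
shows why a single global lock stalls referenced() for an unrelated endpoint
while per-endpoint locks do not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the two locking schemes; not the real DGCClient.
class LeaseLockSketch {
    private final boolean perEndpoint;
    private final Object globalLock = new Object();
    private final Map<String, Object> endpointLocks = new ConcurrentHashMap<>();

    LeaseLockSketch(boolean perEndpoint) { this.perEndpoint = perEndpoint; }

    // One lock for everything (1.1.x behaviour) or one lock per endpoint (1.2 fix).
    private Object lockFor(String endpoint) {
        return perEndpoint
                ? endpointLocks.computeIfAbsent(endpoint, k -> new Object())
                : globalLock;
    }

    // Stands in for lease renewal: holds the lock while the remote peer is silent.
    void renew(String endpoint, CountDownLatch lockHeld, CountDownLatch peerResponds) {
        synchronized (lockFor(endpoint)) {
            lockHeld.countDown();
            try {
                peerResponds.await();   // simulates the blocked socket read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Stands in for registering a newly received remote reference.
    void referenced(String endpoint) {
        synchronized (lockFor(endpoint)) {
            // the lease table entry for this endpoint would be updated here
        }
    }

    // True if a second endpoint can be registered while renewal of the first is stuck.
    static boolean secondClientProceeds(boolean perEndpoint) {
        LeaseLockSketch dgc = new LeaseLockSketch(perEndpoint);
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch peerResponds = new CountDownLatch(1);
        CountDownLatch registered = new CountDownLatch(1);

        Thread renewer = new Thread(
                () -> dgc.renew("first-R2", lockHeld, peerResponds), "LeaseRenewer");
        Thread dispatcher = new Thread(() -> {
            dgc.referenced("second-R2");
            registered.countDown();
        }, "TCP Accept");
        renewer.setDaemon(true);
        dispatcher.setDaemon(true);

        try {
            renewer.start();
            lockHeld.await();           // renewal now holds its lock and is stuck
            dispatcher.start();
            boolean proceeded = registered.await(500, TimeUnit.MILLISECONDS);
            peerResponds.countDown();   // let the stuck renewal finish
            renewer.join();
            dispatcher.join();
            return proceeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("global lock: " + secondClientProceeds(false));
        System.out.println("per-endpoint locks: " + secondClientProceeds(true));
    }
}
```

In this model the global-lock run should report false (the second
registration waits until the stuck renewal is released) and the per-endpoint
run should report true, which mirrors steps 1 and 2 of the evaluation. The
500 ms wait is an arbitrary choice for the sketch.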