JDK-4228651 : Distributed gc threads hang
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.rmi
  • Affected Version: 1.1.7
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_nt
  • CPU: x86
  • Submitted: 1999-04-12
  • Updated: 1999-05-21
  • Resolved: 1999-05-21
Related Reports
Duplicate :  
Description
File descriptors for sockets seem to be propogated
by Runtime.exec() calls such that if the parent process
terminates, the child will continue to hold open it's OS file
descriptors (visible in netstat command output). Since the
child has no thread listening on the socket any attempts to
connect to it succeed from a client's POV, but appear to be
piped to the bit bucket. This hangs RMI's DGC threads, as well
as any thread attempting to communicate with any remote objects
from the, now deceased, parent process. (see below for our
initial diagnosis.)


The customer has a process which spawns multiple child processes
then exits. The inherited descriptors result in the DGC
threads on all servers that have interacted with the parent
process to hang, as well as our internal client termination
thread. The deadlock of these threads (esp. the DGC's LeaseRenewer)
seems to prevent any additional RMI activity from taking
place in the process, further deadlocking the process and
the customer's entire application. (we suspect this has something
to do with a lock on the object table held by the LeaseRenwer thread.)
This problem does not exist on Solaris.


====================================================
Here is an example of the hung process with jdk 1.1.6
====================================================
Full thread dump:
    "Cleaner" (TID:0xf66090, sys_thread_t:0x8aa8d0, Win32ID:0x125,state:CW) prio=5
        sun.rmi.transport.DGCClient.run(Compiled Code)
        java.lang.Thread.run(Thread.java:474)
    "LeaseRenewer" (TID:0xf660a8, sys_thread_t:0x8aa2e0, Win32ID:0x78,state:R) prio=5
        java.net.SocketInputStream.read(Compiled Code)
        java.io.BufferedInputStream.fill(Compiled Code)
        java.io.BufferedInputStream.read(Compiled Code)
        java.io.DataInputStream.readByte(Compiled Code)
        sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:147)
        sun.rmi.server.UnicastRef.newCall(UnicastRef.java:73)
        sun.rmi.transport.DGCImpl_Stub.dirty(DGCImpl_Stub.java:55)
        sun.rmi.transport.DGCClient.renewLeases(Compiled Code)
        sun.rmi.transport.DGCClient.doRenewal(Compiled Code)
        sun.rmi.transport.DGCClient$LeaseRenewer.run(Compiled Code)
        java.lang.Thread.run(Thread.java:474)
    "Reaper" (TID:0xf660e0, sys_thread_t:0x8aa200, Win32ID:0x133, state:CW) prio=5
        sun.rmi.transport.Reaper.run(Compiled Code)
        java.lang.Thread.run(Thread.java:474)
    "Finalizer thread" (TID:0xf60088, sys_thread_t:0x89d560, Win32ID:0x93,state:CW) prio=2
    "main" (TID:0xf600b0, sys_thread_t:0x89a7d0, Win32ID:0x132, state:R) prio=5
        java.net.SocketInputStream.read(Compiled Code)
        java.io.BufferedInputStream.fill(Compiled Code)
        java.io.BufferedInputStream.read(Compiled Code)
        java.io.DataInputStream.readByte(Compiled Code)
        sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:147)
        sun.rmi.server.UnicastRef.newCall(UnicastRef.java:73)
        RMIWidget_Stub.pingMe(RMIWidget_Stub.java:27)
        hanger.main(Compiled Code)
Monitor Cache Dump:
    <unknown key> (0x8aa200): <unowned>
        Waiters: 1
    java.io.BufferedInputStream@F6B8C8/FEACD0: owner "main" (0x89a7d0, 1 entry)
    java.io.BufferedInputStream@F6BC80/FEC170: owner "LeaseRenewer" (0x8aa2e0, 1 entry)
    java.lang.Object@F65E78/FBC208: owner "LeaseRenewer" (0x8aa2e0, 1 entry)
    java.util.Vector@F65E58/FBC258: <unowned>
        Waiters: 1
Registered Monitor Dump:
    SymcJIT Method Monitor: <unowned>
    SymcJIT Method Monitor: <unowned>
    SymcJIT Method List Monitor: <unowned>
    SymcJIT Fixups Allocation: <unowned>
    SymcJIT Code Allocation: <unowned>
    SymcJIT Data Allocation: <unowned>
    Thread queue lock: <unowned>
    Name and type hash table lock: <unowned>
    String intern lock: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class loading lock: <unowned>
    Java stack lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Has finalization queue lock: <unowned>
    Finalize me queue lock: <unowned>
        Waiters: 1
    Monitor registry: <unowned>



====================================================
COMPILATION INSTRUCTIONS
====================================================
"javac" Compile all java files
"rmic" Compile RMIWidget



====================================================
TESTCASE INSTRUCTIONS
====================================================
how to run this test case:
(note: all lines output by the test case are marked with => herein to
distinguish them...
this flag will not really display, it's just to get around the non-marked
up nature of a
text file :)


unzip the testcase to a directory, for reference I'll call it d:\9310

open three command prompts to d:\9310
run the rmiregistry in one window, this window may be minimized and
ignored...

run "java parent yes" in another window, you should see in this "parent"
window...
=>RMIWidget::RMIWidget(0) begin
=>RMIWidget::RMIWidget(0) end
this indicates that the rmiwidget has been successfully exported and
registered with
naming.

now run "java hanger" in the third, it will say:
=>Ping begining

then the parent will say:
=>RMIWidget-0 has been pinged...

then the hanger will say:
=>Ping completed
and then start to draw this line:
=>|----|----|----.
one character per second...

after the fifteen seconds are up "hanger" will repeat this (endless) loop
of pinging
the object exported from parent. You will see that after some time the
parent will
exec the child window and which is followed by a short nap:

=>exec'ing...
=>naping...

you will see the new window start up and just sit there... it shouldn't
report
anything, it's only point in life is to inherit the file descriptors used
by the
parent to export an object, and thereby display the bug. at this point the
parent
is done and it will report that:

=>done.

I've put in then another delay before the parent exits so that we may see
the hanger
thread continue to ping the object on the parent process while both parent
and child
are up. After this delay the parent will exit and return to a command
prompt. At
this point the Hanger will finish the current 15 second delay it's on then
report
that it is begining a new ping. this is as far as it will ever get. the
last thing
it says is:

=>Ping begining

taking a thread break of the hanger at this point will show that the main
thread is
now deadlocked, let it sit a while and the DGC will wake up and attempt to
renew
leases, at this point a thread break will show that it is deadlocked....



====================================================
SOURCE CODE:
====================================================



====================================================
RMIWidgetInterface.java
====================================================
public interface RMIWidgetInterface extends java.rmi.Remote {
   public void pingMe() throws java.rmi.RemoteException;
}

====================================================
RMIWidget.java
====================================================
public class RMIWidget
extends java.rmi.server.UnicastRemoteObject
implements RMIWidgetInterface {
   private static final boolean trace = true;
   int number;
   public RMIWidget(int p) throws java.rmi.RemoteException {
      super();
      if (trace) System.out.println("RMIWidget::RMIWidget("+p+") begin");
      number = p;
      try {
         java.rmi.Naming.bind("RMIWidget-"+p, this);
      } catch (java.rmi.AlreadyBoundException abe) {
         try {
            java.rmi.Naming.rebind("RMIWidget-"+p, this);
         } catch (java.rmi.AccessException ae) {
            System.out.println("-=RESET THE REGISTRY=-");
         } catch (java.net.MalformedURLException mue) {
            mue.printStackTrace();
         }
      } catch (java.net.MalformedURLException mue) {
         mue.printStackTrace();
      }
      if (trace) System.out.println("RMIWidget::RMIWidget("+p+") end");
   }
   public void pingMe() throws java.rmi.RemoteException {
      if (trace) System.out.println("RMIWidget-"+this.number+" has been
pinged...");
   }
}

====================================================
parent.java
====================================================
public class parent {
   public static void main (String[] args) {
      if (args.length != 1) {
         System.out.println("usage: java parent [yes|no]");
         System.exit(0);
      }
      boolean exit = (args[0].toLowerCase().charAt(0) == 'y');
      try {
         RMIWidgetInterface target = new RMIWidget(0);
         Thread.sleep(120*1000); //2 minutes to start hanger, get some
pings done
         System.out.println("exec'ing...");
          //2 hours should be enough ;)
//       Runtime.getRuntime().exec("java child 120"); //generic
         Runtime.getRuntime().exec("cmd /c start java child 120"); //NT so
we get a window
         System.out.println("naping...");
         Thread.sleep(20*1000); //some platforms (solaris) need this for
the exec to work
         System.out.println("done.");
         if (exit) {
            Thread.sleep(120*1000); //2 minutes to verify hanger still up
and good
            System.out.println("exiting");
            System.exit(0);
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

====================================================
child.java
====================================================
public class child {
   public static void main(String[] args) {
      try {
         Thread.sleep(Integer.parseInt(args[0])*60*1000);
      } catch (InterruptedException ie) {
         ie.printStackTrace();
      }
   }
}

====================================================
hanger.java
====================================================
public class hanger {
   public static void main (String[] args) {
      try {
         RMIWidgetInterface target = (RMIWidgetInterface)
java.rmi.Naming.lookup("RMIWidget-0");
         while (true) {
            System.out.println("Ping begining");
            target.pingMe();
            System.out.println("Ping completed");
            for (int lcv = 15; lcv > 0; --lcv) {
               Thread.sleep(1000);
               System.out.print((lcv%5==0)?('|'):('-'));
            }
            System.out.println('.');
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

---------------------------------------------

Here is a thread dump from 1.2.2 V build hung process:

Full thread dump Classic VM (JDK-1.2-V, native threads):
    "GC Daemon" (TID:0x18f3230, sys_thread_t:0x859ec0, state:CW, native ID:0xd6) prio=2
        at java.lang.Object.wait(Native Method)
        at sun.misc.GC$Daemon.run(Compiled Code)
    "RMI RenewClean-[129.144.171.109:1102]" (TID:0x18f2fe8, sys_thread_t:0x857ae0, state:R, native ID:0xb8) prio=5
        at java.net.SocketInputStream.socketRead(Native Method)
        at java.net.SocketInputStream.read(Compiled Code)
        at java.io.BufferedInputStream.fill(Compiled Code)
        at java.io.BufferedInputStream.read(Compiled Code)
        at java.io.DataInputStream.readByte(Compiled Code)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:210)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(Compiled Code)
        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
        at sun.rmi.transport.DGCImpl_Stub.dirty(Unknown Source)
        at sun.rmi.transport.DGCClient$EndpointEntry.makeDirtyCall(Compiled Code)
        at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(Compiled Code)
        at java.lang.Thread.run(Thread.java:479)
    "Finalizer" (TID:0x18e9320, sys_thread_t:0x81b460, state:CW, native ID:0xc4) prio=8
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(Compiled Code)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:174)
    "Reference Handler" (TID:0x18e93b0, sys_thread_t:0x819390, state:CW, native
ID:0xc2) prio=10
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:424)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:114)
    "Signal dispatcher" (TID:0x18e93e0, sys_thread_t:0x819770, state:R, native ID:0xbf) prio=5
    "main" (TID:0x18e91e0, sys_thread_t:0x7d0c00, state:R, native ID:0xcc) prio=5
        at java.net.SocketInputStream.socketRead(Native Method)
        at java.net.SocketInputStream.read(Compiled Code)
        at java.io.BufferedInputStream.fill(Compiled Code)
        at java.io.BufferedInputStream.read(Compiled Code)
        at sun.rmi.transport.tcp.TCPConnection.isDead(TCPConnection.java:173)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(Compiled Code)
        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
        at RMIWidget_Stub.pingMe(RMIWidget_Stub.java:29)
        at hanger.main(Compiled Code)
Monitor Cache Dump:
    java.lang.ref.ReferenceQueue$Lock@18E9338/191ED60: <unowned>
        Waiting to be notified:
            "Finalizer" (0x81b460)
    java.lang.ref.Reference$Lock@18E93C0/191E890: <unowned>
        Waiting to be notified:
            "Reference Handler" (0x819390)
    sun.misc.GC$LatencyLock@18F2F90/1950550: <unowned>
        Waiting to be notified:
            "GC Daemon" (0x859ec0)
    java.io.BufferedInputStream@18F3578/1951430: owner "main" (0x7d0c00) 1 entry
    java.io.BufferedInputStream@18F5378/19540D0: owner "RMI RenewClean-[129.144.171.109:1102]" (0x857ae0) 1 entry
Registered Monitor Dump:
    SymcJIT Method Monitor: <unowned>
    SymcJIT Method Monitor: <unowned>
    SymcJIT Method List Monitor: <unowned>
    SymcJIT Lock: <unowned>
    utf8 hash table: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class linking lock: <unowned>
    System class loader lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Monitor cache lock: owner "Signal dispatcher" (0x819770) 1 entry
    Thread queue lock: owner "Signal dispatcher" (0x819770) 1 entry
    Monitor registry: owner "Signal dispatcher" (0x819770) 1 entry


    SymcJIT Method List Monitor: <unowned>
    SymcJIT Lock: <unowned>
    utf8 hash table: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class linking lock: <unowned>
    System class loader lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Monitor cache lock: owner "Signal dispatcher" (0x819770) 1 entry
    Thread queue lock: owner "Signal dispatcher" (0x819770) 1 entry
    Monitor registry: owner "Signal dispatcher" (0x819770) 1 entry




Comments
EVALUATION It is possible that this bug is caused by bug 4226268 (Synchronizing on DGC causes deadlock) and I am working on verifying it. The fix to 4226268 is in 1.2.2 but because it use java.lang.ref.PhantomReferences, it cannot be backported to 1.1.x without introducing public API changes. So if 4226268 is the rootcause, another approach to fix this in 1.1.x must be developed. Note that IBM has reported that the fix to 4197666 does not resolve this testcase. patrick.ong@Eng 1999-04-15 After applying the fix to 4197666, the testcase will no longer hang the "hanger" process when the parent process exits but throw the following exception: java.rmi.ConnectException: Connection refused to host ... This is expected behaviour since the child process no longer inherits the socket from the parent process and the hanger process will not be able to connect to the old socket. While it is true that the 1.1.7B code suffers from bug 4226268, I see the failure reported in this testcase to be similar to that reported in bug 4197666. I will mark this bug as a duplicate as soon as IBM agrees with the assessment. patrick.ong@Eng 1999-05-10
10-05-1999