problem is observed with JRE 1.5.0_09 on RHEL4u3 32-bit A customer has a JMX manager software which uses Notifications from agents as well as JMX query. The manager software encountered two unexpected situations (A and B below). The manager deals with three JMX agents, thus the logs often confuse us to tell what was really happing. (A) ClientNotifFowarder thread died, but RMIConnector did not catch error. Later, JMX query failed, but restart operation ended up with IOException("Illegal state."). // ClientNotifForwarder.java public synchronized ClientListenerInfo[] preReconnection() throws IOException { if (state == TERMINATED || beingReconnected) { // should never throw new IOException("Illegal state."); } 2008/04/12 20:06:13 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.io.IOException: Not connected 2008/04/13 9:07:24 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.io.IOException: Illegal state. 2008/04/13 9:07:24 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.io.IOException: The ClientNotifForwarder has been terminated. 2008/04/13 17:37:17 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.rmi.NoSuchObjectException: no such object in table 2008/04/13 17:37:17 RMIConnector RMIClientCommunicatorAdmin-doStop WARNING: Failed to call the method close():java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketException: Connection reset There was a message which tells ClientNotifForwarder thread was stopped on 2008/04/12. 2008/03/14 10:01:45 RMIConnector RMIClientCommunicatorAdmin-doStop WARNING: Failed to call the method close():java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketException: Connection reset 2008/04/12 20:06:13 ClientNotifForwarder NotifFetcher-run SEVERE: Failed to fetch notification, stopping thread. Error is: java.rmi.ConnectException: Connection refused to host: 10.251.200.61; nested exception is: java.net.ConnectException: Connection refused java.rmi.ConnectException: Connection refused to host: 10.251.200.61; nested exception is: java.net.ConnectException: Connection refused at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185) * We initially thought there was a short network problem which resulted in ClientNotifFowarder thread to terminate. 2008/04/13 9:07:24 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.io.IOException: The ClientNotifForwarder This seems to have caused "Illegal state" much later when ClientCommunicatorAdmin tried to restart (we guess there may have been another netowrk problem at that time...) 2008/04/13 9:07:24 ClientCommunicatorAdmin restart WARNING: Failed to restart: java.io.IOException: Illegal state. (B) It looks like both JMX query operation and ClientNotifForwarder thread got IOException almost at the same time. In that case, these two threads cause a kind of dead lock. Thread in JMX query: "Thread-3039643" prio=1 tid=0x39a005e8 nid=0x3f0a in Object.wait() [0x34b8e000..0x34b8f130] ��������at java.lang.Object.wait(Native Method) ��������- waiting on <0x532a7958> (a javax.management.remote.rmi.RMIConnector$RMINotifClient) ��������at java.lang.Object.wait(Object.java:474) ��������at com.sun.jmx.remote.internal.ClientNotifForwarder.postReconnection(ClientNotifForwarder.java:314) ��������- locked <0x532a7958> (a javax.management.remote.rmi.RMIConnector$RMINotifClient) ��������at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.reconnectNotificationListeners(RMIConnector.java:1512) ��������at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.doStart(RMIConnector.java:1577) ��������at com.sun.jmx.remote.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:106) ��������at com.sun.jmx.remote.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34) ��������at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.gotIOException(RMIConnector.java:1413) ��������at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getMBeanInfo(RMIConnector.java:1033) ��������at jp.co.ntt.lab.soa.crane.manager.base.mbs.ProxyMBean$GetMBeanInfoThread.run(ProxyMBean.java:476) ClientNotifFowarder thread: "ClientNotifForwarder-2" daemon prio=1 tid=0x08684a98 nid=0x4e61 in Object.wait() [0x3e5da000..0x3e5db0b0] ��������at java.lang.Object.wait(Native Method) ��������- waiting on <0x532e02e8> (a [I) ��������at java.lang.Object.wait(Object.java:474) ��������at com.sun.jmx.remote.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:82) ��������- locked <0x532e02e8> (a [I) ��������at com.sun.jmx.remote.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34) ��������at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.gotIOException(RMIConnector.java:1413) ��������at javax.management.remote.rmi.RMIConnector$RMINotifClient.fetchNotifs(RMIConnector.java:1293) ��������at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.fetchNotifs(ClientNotifForwarder.java:508) ��������at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:399) ��������at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:83) * It looks like these thread prevent each other to complete restart operations. It looks like current JMX implementation does not handle these cases well. The customer is asking for a workaround or a fix for these. The customer investigated more and guess that this was not caused by network unreliability. Unstable state between the finish of ClientNotifForwarder and the finish of RMIConnector in server. When client communicate with the server during this period, NoSuchObjectException (extends IOException) occured in earlier timing and ConnectException (connection refused) occured in later timing. Full logs as well as a test case for the Illegal State with reproduction instructions in (README.txt) have been attached to this bug.
|