Running:
- s10 + patches
- sc31u4 + 120500-04 & 120489-01
- cacao 1.1 + 120675-01
- odyssey R2 2/21/06 nightly

Problem:
I was starting geo cluster on 2 clusters that have a partnership defined between them, and odyssey failed to start because cacao went down. The console messages from each node are below. The corresponding cacao logs are attached.

Note that failover of the odyssey infrastructure to the backup node succeeded. Some time later I switched the geo-infrastructure rg to the nodes where the failure occurred, and odyssey started up fine at that time.

***On phys-sabre-1 (1st cluster) -

# geoadm start
... checking for management agent ...
... management agent check done ....
... starting product infrastructure ... please wait ...
# [thread 144 also had an error]
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGBUS (0xa) at pc=0xf03d8428, pid=27182, tid=45
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
# Problematic frame:
# C [libscrgadm.so.1+0x8428]
#
# An error report file with more information is saved as hs_err_pid27182.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Feb 23 15:41:06 phys-sabre-1 cacao[27180]: SUNWcacao launcher : cacao exited abnormaly
Feb 23 15:41:06 phys-sabre-1 cacao[27180]: SUNWcacao launcher : no retries available, stop monitoring of cacao
Feb 23, 2006 3:41:06 PM GenericConenctor RequestHandler-connectionException
WARNING: java.io.EOFException
java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2502)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1267)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:339)
    at com.sun.jmx.remote.socket.SocketConnection.readMessage(SocketConnection.java:211)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:391)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23, 2006 3:41:06 PM ClientCommunicatorAdmin restart
WARNING: Failed to restart: java.net.ConnectException: Connection refused
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at $Proxy0.startFailoverGroup(Unknown Source)
    at ServiceControl.main(ServiceControl.java:96)
Caused by: javax.management.remote.generic.ConnectionClosedException: The connection has been closed by the server.
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.close(ClientSynchroMessageConnectionImpl.java:338)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:276)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
    at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
    at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23, 2006 3:41:08 PM ClientIntermediary close
INFO: java.io.IOException: The connection is not currently established.
java.io.IOException: The connection is not currently established.
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.checkState(ClientSynchroMessageConnectionImpl.java:567)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendOneWay(ClientSynchroMessageConnectionImpl.java:161)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:260)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
    at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
    at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23 15:41:08 phys-sabre-1 SC[SUNW.scmasa,geo-infrastructure,geo-failovercontrol,scmasa_svc_start]: Failed to start /usr/cluster/lib/rgm/rt/hamasa/cmas_service_ctrl_start geo-infrastructure.
Feb 23 15:41:08 phys-sabre-1 Cluster.RGM.rgmd: Method <scmasa_svc_start> failed on resource <geo-failovercontrol> in resource group <geo-infrastructure> [exit code <1>, time used: 6% of timeout <600 seconds>]
Feb 23, 2006 3:41:10 PM ServiceControl main
WARNING: Unable to connect to the CACAO agent. The agent may be down or restarting
Feb 23 15:41:19 phys-sabre-1 ip: TCP_IOC_ABORT_CONN: local = 010.006.173.091:0, remote = 000.000.000.000:0, start = -2, end = 6
Feb 23 15:41:19 phys-sabre-1 ip: TCP_IOC_ABORT_CONN: aborted 0 connection
Registering resource type <SUNW.HBmonitor>...done.
Resource type <SUNW.scmasa> has been registered already
Creating failover resource group <geo-clusterstate>...done.
Creating failover resource group <geo-infrastructure>...done.
Creating logical host resource <geo-clustername>...
Logical host resource created successfully ....
Creating resource <geo-hbmonitor> ...done.
Creating resource <geo-failovercontrol> ...done.
Bringing RG <geo-infrastructure> to managed state ...done.
Enabling resource <geo-clustername> ...done.
Enabling resource <geo-hbmonitor> ...done.
Enabling resource <geo-failovercontrol> ...done.
Node phys-sabre-1: Bringing resource group <geo-infrastructure> online ...scswitch: Resource group geo-infrastructure failed to start on chosen node and may fail over to other node(s)
FAILED: scswitch -z -g geo-infrastructure -h phys-sabre-1
#

***On phys-sabre-3 (2nd cluster) -

# geoadm start
... checking for management agent ...
... management agent check done ....
... starting product infrastructure ... please wait ...
[thread 45 also had an error]#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGBUS (0xa) at pc=0xf04a8298, pid=17957, tid=157
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
# Problematic frame:
# C [libscrgadm.so.1+0x8298]
#
# An error report file with more information is saved as hs_err_pid17957.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Feb 23 15:45:50 phys-sabre-3 cacao[17955]: SUNWcacao launcher : cacao exited abnormaly
Feb 23 15:45:50 phys-sabre-3 cacao[17955]: SUNWcacao launcher : no retries available, stop monitoring of cacao
Feb 23, 2006 3:45:50 PM GenericConenctor RequestHandler-connectionException
WARNING: java.io.EOFException
java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2502)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1267)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:339)
    at com.sun.jmx.remote.socket.SocketConnection.readMessage(SocketConnection.java:211)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:391)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23, 2006 3:45:50 PM ClientCommunicatorAdmin restart
WARNING: Failed to restart: java.net.ConnectException: Connection refused
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at $Proxy0.startFailoverGroup(Unknown Source)
    at ServiceControl.main(ServiceControl.java:96)
Caused by: javax.management.remote.generic.ConnectionClosedException: The connection has been closed by the server.
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.close(ClientSynchroMessageConnectionImpl.java:338)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:276)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
    at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
    at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23, 2006 3:45:52 PM ClientIntermediary close
INFO: java.io.IOException: The connection is not currently established.
java.io.IOException: The connection is not currently established.
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.checkState(ClientSynchroMessageConnectionImpl.java:567)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendOneWay(ClientSynchroMessageConnectionImpl.java:161)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:260)
    at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
    at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
    at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
    at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
    at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
    at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
    at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
Feb 23 15:45:52 phys-sabre-3 SC[SUNW.scmasa,geo-infrastructure,geo-failovercontrol,scmasa_svc_start]: Failed to start /usr/cluster/lib/rgm/rt/hamasa/cmas_service_ctrl_start geo-infrastructure.
Feb 23 15:45:52 phys-sabre-3 Cluster.RGM.rgmd: Method <scmasa_svc_start> failed on resource <geo-failovercontrol> in resource group <geo-infrastructure> [exit code <1>, time used: 15% of timeout <600 seconds>]
Feb 23, 2006 3:45:54 PM ServiceControl main
WARNING: Unable to connect to the CACAO agent. The agent may be down or restarting
Feb 23 15:46:03 phys-sabre-3 ip: TCP_IOC_ABORT_CONN: local = 010.006.173.096:0, remote = 000.000.000.000:0, start = -2, end = 6
Feb 23 15:46:03 phys-sabre-3 ip: TCP_IOC_ABORT_CONN: aborted 0 connection
Registering resource type <SUNW.HBmonitor>...done.
Resource type <SUNW.scmasa> has been registered already
Creating failover resource group <geo-clusterstate>...done.
Creating failover resource group <geo-infrastructure>...done.
Creating logical host resource <geo-clustername>...
Logical host resource created successfully ....
Creating resource <geo-hbmonitor> ...done.
Creating resource <geo-failovercontrol> ...done.
Bringing RG <geo-infrastructure> to managed state ...done.
Enabling resource <geo-clustername> ...done.
Enabling resource <geo-hbmonitor> ...done.
Enabling resource <geo-failovercontrol> ...done.
Node phys-sabre-3: Bringing resource group <geo-infrastructure> online ...scswitch: Resource group geo-infrastructure failed to start on chosen node and may fail over to other node(s)
FAILED: scswitch -z -g geo-infrastructure -h phys-sabre-3
#
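
For triage, the two HotSpot crash banners in the console output can be compared mechanically. The following is only a rough sketch, not part of any Sun tooling: the helper name parse_crash and both regular expressions are hypothetical, the banner text is copied from the console output in this report with one line per banner entry, and the "load base" is computed under the assumption that the offset in the problematic frame is relative to the address where libscrgadm.so.1 was mapped.

```python
import re

# Hypothetical patterns for two lines of the HotSpot crash banner quoted above.
SIGNAL_RE = re.compile(
    r"(?P<signal>SIG[A-Z]+) \(0x[0-9a-f]+\) at pc=(?P<pc>0x[0-9a-f]+), "
    r"pid=(?P<pid>\d+), tid=(?P<tid>\d+)"
)
FRAME_RE = re.compile(
    r"Problematic frame:\s*#\s*C\s+\[(?P<lib>[\w.]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


def parse_crash(banner):
    """Return signal, pc, pid, tid, library and offset, or None if not found."""
    sig = SIGNAL_RE.search(banner)
    frame = FRAME_RE.search(banner)
    if sig is None or frame is None:
        return None
    pc = int(sig.group("pc"), 16)
    offset = int(frame.group("offset"), 16)
    return {
        "signal": sig.group("signal"),
        "pc": pc,
        "pid": int(sig.group("pid")),
        "tid": int(sig.group("tid")),
        "library": frame.group("lib"),
        "offset": offset,
        # Assumption: the frame offset is relative to the library load address.
        "load_base": pc - offset,
    }


# Banner excerpts copied from the console output of each node.
BANNERS = {
    "phys-sabre-1": """\
# SIGBUS (0xa) at pc=0xf03d8428, pid=27182, tid=45
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
# Problematic frame:
# C [libscrgadm.so.1+0x8428]
""",
    "phys-sabre-3": """\
# SIGBUS (0xa) at pc=0xf04a8298, pid=17957, tid=157
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
# Problematic frame:
# C [libscrgadm.so.1+0x8298]
""",
}

crashes = {node: parse_crash(text) for node, text in BANNERS.items()}

for node, crash in crashes.items():
    print(
        f"{node}: {crash['signal']} in {crash['library']}"
        f"+{crash['offset']:#x} (pid {crash['pid']}, tid {crash['tid']})"
    )
```

Run against the two banners, this shows the same signal and the same native library on both nodes but different offsets within that library (0x8428 and 0x8298), which is worth checking against the hs_err_pid files referenced in the output.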