JDK-6969958 : RMI-IIOP EJB clients still do not fail over after applying the fix from CR 6828768
  • Type: Bug
  • Component: other-libs
  • Sub-Component: corba:orb
  • Affected Version: 5.0u16,5.0u20
  • Priority: P3
  • Status: Resolved
  • Resolution: Won't Fix
  • OS: linux_redhat_4.0,windows
  • CPU: generic,x86
  • Submitted: 2010-07-16
  • Updated: 2014-03-12
  • Resolved: 2014-03-12
Related Reports
Relates :  
Description
Problem is observed with application server cluster (Oracle WebLogic 
Server (WLS) cluster). A stateful EJB application deployed to the cluster. 
The problem is Oracle WebLogic Server IIOP EJB client not getting failover 
to the available server in the cluster. In other words the EJB Client 
calls are load balancing between the managed servers of the cluster as 
long as both servers are running well. The Sun JDK 1.5.x IIOP is not 
doing context failover when ever one of the running managed servers is 
down. The same application is tested on previous WLS version which is 
using jdk 1.4.2 and all works well. JDK 1.5.x environment is throwing 
the below exception.


org.omg.CORBA.COMM_FAILURE:   vmcid: SUN  minor code: 203  completed: No
       at 
com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2259)
       at 
com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2281)
       at 
com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(SocketOrChannelConnectionImpl.java:933)
       at 
com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(BufferManagerWriteStream.java:86)
       at 
com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(BufferManagerWriteStream.java:104)
       at 
com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(CDROutputObject.java:144)
       at 
com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(CorbaMessageMediatorImpl.java:247)
       at 
com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(CorbaClientRequestDispatcherImpl.java:342)
       at 
com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(CorbaClientRequestDispatcherImpl.java:323)
       at 
com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(CorbaClientDelegateImpl.java:129)
       at 
com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(CorbaClientDelegateImpl.java:213)
       at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:112)
       at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
       at 
weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
       at 
weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
       at 
weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
       at 
javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667)
       at 
javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:247)
       at javax.naming.InitialContext.init(InitialContext.java:223)
       at javax.naming.InitialContext.<init>(InitialContext.java:197)
       at MyClient.main(MyClient.java:26) java.lang.NullPointerException
       at MyClient.main(MyClient.java:35)
       

Here are the reproducible Steps:

1. Install WLS 9.2.3 on Windows or Unix machine
2. Create a cluster with two managed servers using the domain 
configuration wizard, you can keep the clustered weblogic server running 
on the same machine.
3. You compile the EJB source and then create the deployable EJB jar 
file using the weblogic.appc utility. You should change the IIOP:// UTL 
in the myclient.java file.
4. Deploy the EJB jar application to the cluster using the admin console
5. Keep Invoking the EJB client and you observe that the EJB IIOP 
request is being load balanced across the JVMs (managed servers 1 & 2)
6. Stop/Kill one of the managed servers and then keep invoking the EJB 
client, you end up seeing the IIOP request load balanced to unavailable 
JVM and request fails with the below error.


Attachments:

EJBTestApp.zip
       
EJB compilation command:

java -classpath .;%CLASSPATH% weblogic.appc SourceApp.jar -output 
finalEJB.jar

http://download.oracle.com/docs/cd/E13222_01/wls/docs92/programming/splitbuild.html#wp1102587

Client commands:

javac -classpath 
.;<PATH>\client.jar;<JDK-HOME>\lib\tools.jar;<JDK-HOME>\lib\rt.jar;<WLS_HOME>\server\lib\wlclient.jar;<WLS-HOME\server\lib\wl-j2ee-client.jar 
MyClient.java

java -classpath 
.;<PATH>\client.jar;<JDK-HOME>\lib\tools.jar;<JDK-HOME>\lib\rt.jar;<WLS_HOME>\server\lib\wlclient.jar;<WLS-HOME\server\lib\wl-j2ee-client.jar 
MyClient

Comments
EVALUATION This code in src/share/classes/com/sun/corba/se/impl/legacy/connection/SocketFactoryConnectionImpl.java seems a little doubtful. try { socket = orb.getORBData().getLegacySocketFactory().createSocket(socketInfo); socketChannel = socket.getChannel(); if (socketChannel != null) { socketChannel.configureBlocking(isBlocking); } else { // IMPORTANT: non-channel-backed sockets must use // dedicated reader threads. setUseSelectThreadToWait(false); } it checks for null channel if socket is established. now from the discussion with Shrihari. we do get a DummySocket for a non existing managed server.so need to verify this with Ken. Also, on other side , monitoring wtih wireshark I can see packets on admin server port but on any of the ports mapping to managed server, where it listens toconnections.
18-08-2010

EVALUATION The problem does not reproduce always like I mentioned,in some cases we do have the request going to the other running managed server and executing properly and the transaction completing successfully. however, i tried to analyse the case when we get the failure and here's what I have analysed testing yesterday fully with the debug build CorbaClientRequestDispatcherImpl(main): .beginRequest: op/_is_a: Using created connection: SocketFactoryConnectionImpl[ Socket[unconnected] OPENING false true] SelectorImpl(main): .registerForEvent: SocketFactoryConnectionImpl[ Socket[unconnected] OPENING false true] SelectorImpl(main): .createReaderThread: SocketFactoryConnectionImpl[ Socket[unconnected] OPENING false true] CorbaOutboundConnectionCacheImpl(main): .put: SocketFactoryContactInfoImpl[IIOP_CLEAR_TEXT 129.158.229.247 8021] -1697947624 SocketFactoryConnectionImpl[ Socket[unconnected] ESTABLISHED false true] CorbaOutboundConnectionCacheImpl(main): .stats: 1/total 0/busy 1/idle (240/100/5) CorbaClientRequestDispatcherImpl(main): .beginRequest: op/_is_a id/5: created message mediator: com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl@175078b SocketFactoryAcceptorImpl(main): initialize: SocketFactoryAcceptorImpl[(not initialized) IIOP_CLEAR_TEXT false false] CorbaInboundConnectionCacheImpl(main): .constructor: cacheType: class com.sun.corba.se.impl.legacy.connection.SocketFactoryAcceptorImpl monitoringName: AcceptedConnections SelectorImpl(main): .registerForEvent: SocketFactoryAcceptorImpl[ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=1825] IIOP_CLEAR_TEXT false true] SelectorImpl(main): .createListenerThread: SocketFactoryAcceptorImpl[ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=1825] IIOP_CLEAR_TEXT false true] CorbaClientRequestDispatcherImpl(main): .beginRequest: op/_is_a id/5: created output object: com.sun.corba.se.impl.encoding.CDROutputObject@f7c31d CorbaResponseWaitingRoomImpl(main): .registerWaiter: op/_is_a id/5 CorbaOutboundConnectionCacheImpl(main): .reclaim->: 1 (240/100/5) CorbaOutboundConnectionCacheImpl(main): .reclaim<-: 1 (ORB).spi.servicecontext.ServiceContexts(main): Writing service contexts to output stream (ORB).spi.servicecontext.ServiceContexts(main): Service context has 4 components (ORB).spi.servicecontext.ServiceContexts(main): Writing service context MaxStreamFormatVersionServiceContext[2] (ORB).spi.servicecontext.ServiceContexts(main): Writing service context ORBVersionServiceContext[ version=ORBVersionImpl[20] ] (ORB).spi.servicecontext.ServiceContexts(main): Writing service context ServiceContext[ id=1111834880 ] (ORB).spi.servicecontext.ServiceContexts(main): Writing service context SendingContexServiceContext[ ior=[com.sun.corba.se.impl.ior.iiop.IIOPProfileImpl@85ceee0d] ] (ORB).spi.servicecontext.ServiceContexts(main): Service context writing complete CorbaClientRequestDispatcherImpl(main): .beginRequest: op/_is_a id/5: initialized message CorbaClientRequestDispatcherImpl(main): .beginRequest<-: op/_is_a CorbaClientRequestDispatcherImpl(main): .marshalingComplete->: op/_is_a id/5 CDROutputObject(main): .writeTo: SocketFactoryConnectionImp Also, the stack trace on the client side is org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 203 completed: No at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2259) at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2281) at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(SocketOrChannelConnectionImpl.java:933) at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(BufferManagerWriteStream.java:86) at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(BufferManagerWriteStream.java:104) at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(CDROutputObject.java:144) at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(CorbaMessageMediatorImpl.java:247) at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(CorbaClientRequestDispatcherImpl.java:342) at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(CorbaClientRequestDispatcherImpl.java:323) at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(CorbaClientDelegateImpl.java:129) at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(CorbaClientDelegateImpl.java:213) at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:112) at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126) at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94) at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31) at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41) at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667) at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:247) at javax.naming.InitialContext.init(InitialContext.java:223) at javax.naming.InitialContext.<init>(InitialContext.java:197) at MyClient.main(MyClient.java:26) java.lang.NullPointerException at MyClient.main(MyClient.java:35) using both these I got reminded of the same problem you had come to me last year with the Citigroup case. this too is not a failover problem as we see the code passing and transaction executing some times. only when the connection itself fails we see this stack trace. This is seen in SelectorImpl(main): .registerForEvent: SocketFactoryAcceptorImpl[ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=1825] IIOP_CLEAR_TEXT false true] SelectorImpl(main): .createListenerThread: SocketFactoryAcceptorImpl[ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=1825] IIOP_CLEAR_TEXT false true] This problem is reflected in bug id 6877173 as I see. both the output traces are matching with 6877173 and I think this should be handled completely as in 6877173. here too when the connection fails we get the writeSendError Also, this bug 6877173 refers to 6832375 I got this in the output as well which agrees to 6832375 content. <Jul 21, 2010 12:49:48 PM GMT+05:30> <Notice> <Security> <BEA-090171> <Loading t he identity certificate and private key stored under the alias DemoIdentity from the jks keystore file C:\bea\WLSERV~1.3\server\lib\DemoIdentity.jks.> <Jul 21, 2010 12:49:48 PM GMT+05:30> <Notice> <Security> <BEA-090169> <Loading t rusted certificates from the jks keystore file C:\bea\WLSERV~1.3\server\lib\Demo Trust.jks.> <Jul 21, 2010 12:49:48 PM GMT+05:30> <Notice> <Security> <BEA-090169> <Loading t rusted certificates from the jks keystore file C:\JAVA\JDK16~1.0_2\jre\lib\secur ity\cacerts.> <Jul 21, 2010 12:49:49 PM GMT+05:30> <Error> <WebLogicServer> <BEA-000297> <Inco nsistent security configuration, java.security.cert.CertificateParsingException: PKIX: Unsupported OID in the AlgorithmIdentifier object: 1.2.840.113549.1.1.11> <Jul 21, 2010 12:49:49 PM GMT+05:30> <Emergency> <Security> <BEA-090034> <Not li stening for SSL, java.io.IOException: PKIX: Unsupported OID in the AlgorithmIden tifier object: 1.2.840.113549.1.1.11.> <Jul 21, 2010 12:49:49 PM GMT+05:30> <Error> <WebLogicServer> <BEA-000297> <Inco nsistent security configuration, java.security.cert.CertificateParsingException: PKIX: Unsupported OID in the AlgorithmIdentifier object: 1.2.840.113549.1.1.11>
27-07-2010