Bug ID: JDK-4916766 CORBA COMM_FAILURE when destroy() takes too long and close() happens

Type: Bug
Component: other-libs
Sub-Component: corba:orb
Affected Version: 1.4.1

Priority: P2
Status: Resolved
Resolution: Fixed
OS: solaris_8
CPU: generic

Submitted: 2003-09-03
Updated: 2004-05-10
Resolved: 2003-12-19

Versions (Unresolved/Resolved/Fixed)

Other	Other
1.4.1_07 Fixed	1.4.2 Fixed

Related Reports

Relates :	JDK-4936203 - ORB threads should have their own unique threadgroup
Relates :	JDK-6660037 - Continued problems with upnext.com applet

Description

Test case and related files are in
/net/cores.east/cores/63693175

BEA logged the case and this is reproducible under their application server.
I have it in this directory as well in case it's needed.
server811_solaris32.bin

Run this installer and choose the default install
location (/usr/local/bea).

After the installation, create a directory named user_projects under
/usr/local/bea/weblogic81/
and then create another directory named domains under user_projects.

Under this domains directory, extract the mydomain.zip file, which is available in the directory mentioned above:
/net/cores.east/cores/63693175

Deploy the application 'fxtransact'.

Open the applet in the browser using one of the following URLs:
http://host:port/fxtransact/applet.html or
http://host:port/fxtransact/iiop-applet.html or
http://host:port/fxtransact/http-applet.html

You will see a submit button in the browser. Hit the submit button and wait for the message that says the applet has started. Then hit the submit button again. You will see CORBA errors in the Java Plug-in console.

Caused by: org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 208 completed: Maybe

at com.sun.corba.se.internal.iiop.IIOPConnection.purge_calls(Unknown Source)
at com.sun.corba.se.internal.iiop.ReaderThread.run(Unknown Source)

On applet refresh/reload, two things happen at the same time. One thread calls destroy(), and another thread handles an event that kills the whole applet context (AppContext) and its corresponding resources.
As part of cleaning up the AppContext, the second thread kills all threads and thread groups, and therefore kills com.sun.corba.se.internal.iiop.ReaderThread. At the same time, the thread executing destroy() still depends on the ReaderThread, which is running processInput() on the IIOPConnection inside its run() method. The ReaderThread therefore gets ThreadDeath, sets a COMM_FAILURE SystemException on the connection, and that exception is finally thrown from the destroy() method.

Based on the various combinations we tried, it looks like the failure occurs only when both of these happen at the same time. It does not fail if destroy() or init() completes entirely before or after ThreadDeath is issued.
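
The effect of thread group membership described above can be sketched in isolation. This is a minimal illustration, not the plug-in or ORB code: the class and group names are hypothetical, and ThreadGroup.interrupt() is used as a stand-in for the group-wide stop performed during AppContext cleanup, because Thread.stop() is deprecated and not usable in current Java releases. It shows that a group-wide operation reaches a reader thread that lives in the applet's group but not one that lives in a separate group.

```java
import java.util.concurrent.CountDownLatch;

final class GroupTeardownSketch {

    /** Stand-in for a connection reader: blocks until it is interrupted. */
    static final class Reader extends Thread {
        final CountDownLatch started = new CountDownLatch(1);
        volatile boolean hit = false;

        Reader(ThreadGroup group, String name) {
            super(group, name);
            setDaemon(true);
        }

        @Override
        public void run() {
            started.countDown();
            try {
                Thread.sleep(60_000); // pretend to wait for input
            } catch (InterruptedException e) {
                hit = true; // the group-wide teardown reached this thread
            }
        }
    }

    /** Returns { reader in applet group was hit, reader in separate group was hit }. */
    static boolean[] demo() {
        ThreadGroup parent = Thread.currentThread().getThreadGroup();
        // Hypothetical groups: one torn down with the applet, one that outlives it.
        ThreadGroup appletGroup = new ThreadGroup(parent, "applet (sketch)");
        ThreadGroup separateGroup = new ThreadGroup(parent, "separate (sketch)");

        Reader inApplet = new Reader(appletGroup, "reader-in-applet-group");
        Reader inSeparate = new Reader(separateGroup, "reader-in-separate-group");
        inApplet.start();
        inSeparate.start();
        try {
            inApplet.started.await();
            inSeparate.started.await();

            // Tear down only the applet's group (interrupt as a stand-in for stop).
            appletGroup.interrupt();

            inApplet.join(5_000);
            inSeparate.join(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
        boolean[] result = { inApplet.hit, inSeparate.hit };
        inSeparate.interrupt(); // clean up the surviving reader
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("reader in applet group hit: " + r[0]);
        System.out.println("reader in separate group hit: " + r[1]);
    }
}
```

In the real failure the reader is stopped rather than interrupted, so the connection it serves is left with a COMM_FAILURE instead of a clean shutdown.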

Comments

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.1_07 1.4.2_04 generic
FIXED IN: 1.4.1_07 1.4.2_04
INTEGRATED IN: 1.4.1_07 1.4.2_04
VERIFIED IN: 1.4.2_04
14-06-2004
WORK AROUND
It appears that the AppContext is closed before destroy() completes, because destroy() takes a long time to finish. As part of closing the AppContext, the ClassLoader is also released, and releasing the ClassLoader goes through all ThreadGroups and stops all threads. As a result, the ReaderThread that belongs to the ORB is stopped. Meanwhile, the applet's destroy() is closing the JMSProducer, and the JMSProducer calls dispatchSync() in close(). By this time the reader thread has already been stopped, hence the COMM_FAILUREs. It works if we put a 3-second delay in the destroy() method.
11-06-2004
SUGGESTED FIX

1.4.1/j2se/src/share/classes/com/sun/corba/se/internal/iiop/ORB.java

*** 76,81 ****
      protected int transientServerId=0;

      // The thread group of the main thread (for applications) or applet.
!     protected ThreadGroup threadGroup;

      protected ServiceContextRegistry scr ;
--- 77,124 ----
      protected int transientServerId=0;

      // The thread group of the main thread (for applications) or applet.
!     // We make it package private from a security perspective.
!     static ThreadGroup threadGroup;

+     // We intend to create new threads in a reliable thread group.
+     // This avoids problems if the application/applet
+     // creates a thread group, makes JavaIDL calls which create a new
+     // connection and ReaderThread, and then destroys the thread
+     // group. If our ReaderThreads were to be part of such destroyed thread
+     // group then it might get killed and cause other invoking threads
+     // sharing the same connection to get a non-restartable
+     // CommunicationFailure. We'd like to avoid that.
+     //
+     // Our solution is to create all of our threads in the highest thread
+     // group that we have access to, given our own security clearance.
+     //
+     static {
+         try {
+             // try to get a thread group that's as high in the threadgroup
+             // parent-child hierarchy, as we can get to.
+             // this will prevent an ORB thread created during applet-init from
+             // being killed when an applet dies.
+             threadGroup = (ThreadGroup) AccessController.doPrivileged(
+                 new PrivilegedAction() {
+                     public Object run() {
+                         ThreadGroup tg, ptg;
+                         tg = ptg = Thread.currentThread().getThreadGroup();
+                         try {
+                             while (ptg != null) {
+                                 tg = ptg;
+                                 ptg = tg.getParent();
+                             }
+                         } catch (SecurityException se) {
+                             // Discontinue going higher on a security exception.
+                         }
+                         return new ThreadGroup(tg, "ORB ThreadGroup");
+                     }
+                 }
+             );
+         } catch (SecurityException e) {
+             // something wrong, we go back to the original code
+             threadGroup = Thread.currentThread().getThreadGroup();
+         }
+     }
+
      protected ServiceContextRegistry scr ;

*** 95,113 ****
      TaggedComponentFactories.registerFactories() ;

-     //
-     // We attempt to create new threads in this thread group, if
-     // possible. This avoids problems if the application/applet
-     // creates a thread group, makes JavaIDL calls which create a new
-     // connection and ReaderThread, and then destroys the thread
-     // group. If our ReaderThread were part of this destroyed thread
-     // group then it might get killed and cause other invoking threads
-     // sharing the same connection to get a non-restartable
-     // CommunicationFailure. We'd like to avoid that.
-     //
-     // Our solution is to create all of our threads in the same
-     // thread group that we were initialized under.
-     //
-     threadGroup = Thread.currentThread().getThreadGroup();
-
      // Compute transientServerId = (milliseconds since Jan 1, 1970)/10.
      // Note: transientServerId will wrap in about 2^32 / 8640000 = 497 days.
--- 135,144 ----

1.4.1/j2se/src/share/classes/com/sun/corba/se/internal/iiop/GIOPImpl.java

*** 239,248 ****
--- 239,249 ----
      try {
          ss = orb.getSocketFactory().createServerSocket(socketType, port);
          lis = (ListenerThread) AccessController.doPrivileged(new PrivilegedAction() {
              public java.lang.Object run() {
                  ListenerThread thread = new ListenerThread(finalTable,
+                                                            orb.threadGroup,
                                                             ss, socketType);
                  thread.setDaemon(true);
                  return thread;
              }
11-06-2004
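
The thread group walk in the suggested fix can be tried out on its own. The following is a minimal sketch under stated assumptions, not the ORB source: the class and method names are hypothetical, and the AccessController.doPrivileged wrapper used in the patch is left out because the security manager API is deprecated in recent Java releases. It climbs the parent chain as far as permitted, creates a dedicated group under the highest group reached, and creates a daemon thread in that group.

```java
final class OrbThreadGroupSketch {

    /** Walks up the parent chain as far as permitted and returns the highest group reached. */
    static ThreadGroup highestAccessibleGroup() {
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        try {
            for (ThreadGroup parent = group.getParent(); parent != null; parent = group.getParent()) {
                group = parent;
            }
        } catch (SecurityException e) {
            // Stop climbing; keep the highest group reached so far.
        }
        return group;
    }

    /** Creates a dedicated group under the highest accessible group, as the patch does. */
    static ThreadGroup newOrbGroup() {
        return new ThreadGroup(highestAccessibleGroup(), "ORB ThreadGroup");
    }

    /** Hypothetical stand-in for creating a ReaderThread or ListenerThread in that group. */
    static Thread newReaderThread(ThreadGroup orbGroup, Runnable body) {
        Thread t = new Thread(orbGroup, body, "ReaderThread (sketch)");
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        ThreadGroup orbGroup = newOrbGroup();
        Thread reader = newReaderThread(orbGroup, () -> { });
        System.out.println("reader group: " + reader.getThreadGroup().getName());
        System.out.println("parent of ORB group: " + orbGroup.getParent().getName());
    }
}
```

Because the new group hangs off the top of the hierarchy rather than off the applet's group, stopping the applet's threads and thread groups does not enumerate threads created in it.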
EVALUATION
The root of the problem is that the Weblogic InitialContext caches and reuses ORB reader threads, as Tao Ma suggested. The Weblogic InitialContext should therefore not be created inside the applet's thread group, whose lifecycle is shorter than that of the reader threads. I developed a workaround that appears to fix the problem by creating the InitialContext in a different thread group.

    import java.util.Properties;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    init() {
        ...
        InitialContext adminContext = null;
        try {
            System.out.println("Getting new initial context");
            // adminContext = new InitialContext(props);
            // Create the context on a helper thread outside the applet's thread group.
            InitialContextThread t = new InitialContextThread(getInitialContextThreadGroup());
            adminContext = t.getInitialContext(props);
        } catch (NamingException ne) {
            ne.printStackTrace();
        }
        ...
    }

    private static final String INITIALCONTEXT_THREADGROUP = "InitialContextThreadGroup";

    // Finds or creates a thread group that is a sibling of the applet's thread group.
    private ThreadGroup getInitialContextThreadGroup() {
        ThreadGroup tg = Thread.currentThread().getThreadGroup().getParent();
        int count = tg.activeGroupCount();
        ThreadGroup[] tgs = new ThreadGroup[count];
        count = tg.enumerate(tgs);
        for (int index = 0; index < count; index++) {
            if (INITIALCONTEXT_THREADGROUP.equals(tgs[index].getName()))
                return tgs[index];
        }
        return new ThreadGroup(tg, INITIALCONTEXT_THREADGROUP);
    }

    // Creates the InitialContext on its own thread, in the given thread group.
    class InitialContextThread extends Thread {
        private Properties props;
        private InitialContext initCtx;
        private NamingException ne;

        public InitialContextThread(ThreadGroup tg) {
            super(tg, "InitialContextCreationThread");
        }

        public void run() {
            try {
                initCtx = new InitialContext(props);
            } catch (NamingException e) {
                this.ne = e;
            }
        }

        // Starts the thread, waits for it to finish, and returns the context or rethrows.
        public InitialContext getInitialContext(Properties props) throws NamingException {
            this.props = props;
            initCtx = null;
            ne = null;
            this.start();
            try {
                this.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            if (this.ne != null)
                throw this.ne;
            return initCtx;
        }
    }

###@###.### 2003-09-18
----------------------------------
###@###.### 2003-11-19
Re-opening, as a new fix was discovered for handling the 548145 escalation from BEA/BofA.
Investigation, with a lot of help from Tao Ma and Ken, identified the accidental, unexpected death of the ReaderThread and the ListenerThread as the root cause of this behaviour. The code changes ensure that these threads are created in a thread group that is more persistent than the thread group associated with the applet's threads, even though it is applet activity that causes the ReaderThread and the ListenerThread to be created. Look at the suggested fix for details. This needs to be fixed in the CORBA code.
###@###.### 2003-11-24
24-11-2003
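
The application-side workaround in the evaluation relies on the fact that a thread created without an explicit group joins the group of the thread that creates it. A generic version of that idea might look like the sketch below. The class name, method name and group name are hypothetical, and a Supplier stands in for the InitialContext creation so that the example runs without a naming provider.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class RunInGroupSketch {

    /** Runs the factory on a short-lived thread in the given group and returns its result. */
    static <T> T supplyInGroup(ThreadGroup group, Supplier<T> factory) {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<RuntimeException> failure = new AtomicReference<>();
        Thread worker = new Thread(group, () -> {
            try {
                result.set(factory.get());
            } catch (RuntimeException e) {
                failure.set(e); // hand the failure back to the caller
            }
        }, "creation-thread (sketch)");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the creation thread", e);
        }
        if (failure.get() != null) {
            throw failure.get();
        }
        return result.get();
    }

    public static void main(String[] args) {
        // Hypothetical long-lived group; the workaround uses a sibling of the applet's group.
        ThreadGroup longLived = new ThreadGroup("long-lived (sketch)");
        // A thread created inside the factory inherits the worker's group.
        Thread child = supplyInGroup(longLived, () -> new Thread(() -> { }));
        System.out.println("child thread group: " + child.getThreadGroup().getName());
    }
}
```

Any threads that the factory creates internally, such as cached reader threads, then live in the chosen group rather than in the caller's group.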