Bug ID: JDK-6666561 Multithreading issue while making a Remote EJB Method Call using JDK 1.6.0

Type: Bug
Component: other-libs
Sub-Component: corba:orb
Affected Version: 6u10

Priority: P1
Status: Closed
Resolution: Duplicate
OS: solaris_1
CPU: sparc

Submitted: 2008-02-22
Updated: 2011-02-16
Resolved: 2009-01-21

JDK 6: 6u12, Resolved

We are trying to do an EJB lookup, create an EJBHome, and then invoke method calls on a remote EJB deployment.
Please see the attached code "WorkflowClient_Caching.java" for details.
The EJB is deployed on Server A, running under Sun Java System Application Server 8.1. The client that we have written looks up the EJB, calls create, and then invokes method calls.
The scenarios are as follows:
1. When we did a lookup multiple times in a single-threaded application, Server A would hang, giving us an error on the client side. As a result, we introduced caching of the EJBHome object.
2. After caching, the EJB lookups work fine. They have been tested on JDK 1.5.0_10 and work on Solaris as well as Linux.
3. The method call to the EJB gives a problem when run under JDK 1.6 in a multi-threaded environment.
We carried out the following test cases:
a. Multithreaded client with JDK 1.5 worked fine.
b. Single-threaded client with JDK 1.5 worked fine.
c. Single-threaded client with JDK 1.6 (the JDK with which the portal code is compiled and bundled on Windows) works.
d. Multithreaded client with JDK 1.6 on Windows DOES NOT WORK; it hangs.

The fact that the EJB Method calls do execute when run under 1.5 seems to remove the possibility that there might be a problem with Server 'A'. Had it been the case, even the calls would have failed under JDK 1.5

The problem seems to be with:

JDK 1.6, irrespective of platform (Solaris or Windows), hangs when an EJB method call is invoked.

1. .lookup
2. create
3. .checkOutTasks

If these three methods are the ones called, and each is uncommented in turn, program execution hangs once .checkOutTasks is uncommented. The same code does not hang under JDK 1.5.

EVALUATION Marking as a duplicate of 6725987.
21-01-2009
EVALUATION The problem appears to be caused by incorrect synchronization on the ORBImpl.destroy method, which should NOT be synchronized. Instead, it needs to access the status variable only while holding the ORB lock, then drop the lock before calling shutdown. The deadlock is caused by the other invocation threads, which are stuck in getLocalHostName after having incremented numInvocations in the ORB. The shutdown therefore blocks waiting for the invocations to complete, which they cannot do, because they cannot get the ORB lock.

This should already be fixed as a result of the fixes for bug 6725987, which was about memory leaks on ORB destroy/shutdown. The changes for that fix also include removing the incorrect synchronization from the destroy method. That fix should be available in 5.0u20 and 6u12, so please check with those versions to see if the problem is fixed. I have no idea what build of 6u12 will contain this fix, so I'll leave this as fix in progress, commit to fix in build 10 (the latest available).
29-10-2008
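The locking change described in the evaluation above can be illustrated with a small, self-contained sketch. This is not the ORBImpl source: ToyOrb, ToyOrbDemo, orbLock, invocationLock and destroyCompletes are hypothetical names chosen for illustration, and the empty synchronized block merely stands in for the step (getLocalHostName in the report) where an in-flight invocation needs the ORB lock. The sketch assumes the shape the evaluation describes: destroy touches the status under the ORB lock only, then releases it before waiting for invocations to drain.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for an ORB; it is not the real ORBImpl. */
final class ToyOrb {
    private final Object orbLock = new Object();         // plays the role of the ORB lock
    private final Object invocationLock = new Object();  // guards numInvocations
    private boolean destroyed = false;                   // guarded by orbLock
    private int numInvocations = 0;                      // guarded by invocationLock

    /** Registers an invocation, runs the caller's work, then needs the ORB lock to finish. */
    void invoke(Runnable work) {
        synchronized (invocationLock) { numInvocations++; }
        try {
            work.run();
            synchronized (orbLock) {
                // Stands in for the mid-invocation step that needs the ORB lock.
            }
        } finally {
            synchronized (invocationLock) {
                numInvocations--;
                invocationLock.notifyAll();
            }
        }
    }

    /** Not synchronized as a whole: the ORB lock covers only the status check. */
    void destroy() {
        synchronized (orbLock) {
            if (destroyed) return;
            destroyed = true;
        }
        shutdown(); // runs after the ORB lock has been released
    }

    /** Waits until every registered invocation has completed. */
    private void shutdown() {
        synchronized (invocationLock) {
            while (numInvocations > 0) {
                try {
                    invocationLock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}

final class ToyOrbDemo {
    /** Returns true if destroy() finishes within timeoutMillis while an invocation is in flight. */
    static boolean destroyCompletes(long timeoutMillis) {
        ToyOrb orb = new ToyOrb();
        CountDownLatch inFlight = new CountDownLatch(1);
        CountDownLatch proceed = new CountDownLatch(1);
        Thread invoker = new Thread(() -> orb.invoke(() -> {
            inFlight.countDown();   // the invocation is now counted
            awaitQuietly(proceed);  // hold it open until destroy() has been started
        }));
        Thread destroyer = new Thread(orb::destroy);
        invoker.setDaemon(true);
        destroyer.setDaemon(true);
        invoker.start();
        awaitQuietly(inFlight);
        destroyer.start();
        proceed.countDown();
        try {
            destroyer.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !destroyer.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("destroy completed: " + destroyCompletes(5000));
    }
}
```

If destroy() in this sketch instead held orbLock for its whole body, including the call to shutdown(), the invoker could block on orbLock while shutdown() waited for numInvocations to reach zero, which is the cycle the evaluation describes.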
EVALUATION In the Solaris jdk1.6_10 trace and Windows trace all application threads are waiting for a response:

    "Thread-2" prio=3 tid=0x08189800 nid=0x11 in Object.wait() [0xb6577000..0xb65778e0]
       java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xf3e884f8> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:485)
        at com.sun.corba.se.impl.transport.CorbaResponseWaitingRoomImpl.waitForResponse(CorbaResponseWaitingRoomImpl.java:140)
        - locked <0xf3e884f8> (a java.lang.Object)

There is no way to tell why the response has not turned up.

The Solaris JDK 1.6.0_01 trace clearly shows a deadlock in the ORB code. A finalizer for a COSNaming object is trying to shut down the ORB and holds a number of locks while doing an Object.wait() on one of them. Meanwhile the application threads are all blocked trying to get that lock when doing a look-up. Presumably the first thread is waiting for these other threads.

This is clearly something for the CORBA/ORB folk to investigate - but I can't find the relevant bugster category. I did find a JNDI CosNaming bug that could be related.
27-02-2008
EVALUATION Thank you, everyone, for your input. As I indicated in Step 3 in my bug report: "3. The method call to the EJB gives a problem when run under JDK 1.6 in a multi-threaded environment." It hangs in "wfs.getTasksForUser("CPina", null)". The call does go through for some executions, after which it hangs. There is also a change in AppServer version: the version on which it was found working was b58g AppServer 9.1, and it stopped working on AppServer 9.1 U1. But as I indicated earlier, it works on AppServer 9.1 U1 with JDK 1.5.
26-02-2008
EVALUATION I too feel that a thread dump will provide more info on this. Can the submitter give more info on where it "hangs"? Is it in the home.create() or in the business method call wfs.getTasksForUser("CPina", null).size()? Also, is this a stateless, stateful, or entity bean? The EJB container doesn't do any locking. It could most likely be an ORB issue as well.
25-02-2008
EVALUATION Taking a quick look at the code the synchronization is not correct in the caching version:

    static WorkflowServiceHome home = null;
    ...
    try {
        if(home == null){
            synchronized (object) {
                if(home == null){
                    Context ctx = new InitialContext(env);
                    System.out.println("home is null..so look up...");
                    Object obj = ctx.lookup("WorkflowService");
                    home = (WorkflowServiceHome) PortableRemoteObject.narrow(obj, WorkflowServiceHome.class);
                    System.out.println("new Home object is: "+ home);
                }
            }
        }else{
            System.out.println("home object exists ...so no need to look up. ");
        }

This is an example of the broken "double-checked locking pattern". For it to be correct in Java (5+) the "home" variable must be declared volatile. Otherwise the "home" object could be seen in one thread in an incompletely initialized state. Whether this is the problem or not I can't say. The rest of the code sheds no light on the potential problem.
25-02-2008
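The volatile requirement noted in the evaluation above can be sketched as follows. This is a minimal illustration rather than the submitter's client: HomeCache and HomeCacheDemo are hypothetical names, and the Supplier is an assumed stand-in for the InitialContext lookup plus PortableRemoteObject.narrow call, so that the sketch runs without an application server.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Caches one lazily created value using double-checked locking (hypothetical helper). */
final class HomeCache<T> {
    private final Object lock = new Object();
    private final Supplier<T> lookup;   // stand-in for the JNDI lookup and narrow
    // volatile is what makes the pattern valid under the Java 5+ memory model.
    private volatile T home;

    HomeCache(Supplier<T> lookup) {
        this.lookup = lookup;
    }

    T get() {
        T local = home;                 // first check, without the lock
        if (local == null) {
            synchronized (lock) {
                local = home;           // second check, under the lock
                if (local == null) {
                    local = lookup.get();
                    home = local;       // publish the fully constructed object
                }
            }
        }
        return local;
    }
}

final class HomeCacheDemo {
    /** Calls get() from threadCount threads and returns how many times the lookup ran. */
    static int countLookups(int threadCount) {
        AtomicInteger lookups = new AtomicInteger();
        HomeCache<Object> cache = new HomeCache<>(() -> {
            lookups.incrementAndGet();
            return new Object();
        });
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(cache::get);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return lookups.get();
    }

    public static void main(String[] args) {
        System.out.println("lookups performed: " + countLookups(8));
    }
}
```

Reading the field into a local variable keeps the fast path to a single volatile read. As the evaluation says, this makes the caching code correct but does not by itself establish that the caching code was the cause of the hang.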
EVALUATION
> The fact that the EJB Method calls do execute when run under 1.5 seems to remove
> the possibility that there might be a problem with Server 'A'. Had it been the
> case, even the calls would have failed under JDK 1.5

This is not a valid assertion. Race conditions can manifest under different circumstances. The code can be incorrect and yet not fail on a particular platform or a particular version of the VM - that doesn't make the code correct. So a failure under JDK 6 that does not manifest under JDK 5 does not in itself indicate a problem with JDK 6.

Can you provide Java-level thread dumps and pstack output for the "hung" JVMs? I'm not in a position to run an EJB server or deploy an EJB (even if I knew how) and I'm pretty certain Martin isn't either. Without basic stack information there is no way to tell whether this is an issue in the application/EJB, the server, the JDK lib or the hotspot VM - and even with basic stack information there's no guarantee the problem will be obvious.
25-02-2008
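For the Java-level thread dump requested above, the usual route is an external tool such as jstack against the hung process. Where that is inconvenient, a client can also capture one from inside the JVM with the standard java.lang.management API. The following is a minimal sketch; HangReport and capture are hypothetical names, and how the client decides to call it (for example from a watchdog thread) is left as an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that renders an in-process thread dump as text. */
final class HangReport {
    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only request lock details the running JVM says it can provide.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                out.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }

        // Reports lock cycles only; threads parked in Object.wait() are not flagged.
        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? "none" : String.valueOf(deadlocked.length))
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

A result of "none" does not rule out a hang: a thread that holds a lock while sitting in Object.wait() does not form a lock cycle, so the stack traces themselves still need to be read.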
EVALUATION I have attached three test files which can be used to test this bug. A simple test case can be:
1. Deploy the EJB on a Server 'A'.
2. Write a stand-alone Java client (single-threaded and multithreaded). See the attached files for an example.
3. Use JDK 1.5 or JDK 1.6 to check results.
4. Create an InitialContext.
5. Perform the lookup.
6. Create the EJB.
7. Make the remote call.
The lookup can be cached as indicated in the example, or it can be left uncached. We can observe that in a multithreaded environment this fails. I hope this is the information you were looking for.
25-02-2008
EVALUATION I'm adding some email addresses from sunone_application_server land to the interest list; perhaps they can help find a better home for this bug.
22-02-2008