Bug ID: JDK-8005646 TEST_BUG: java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup leaves process running

Details
Type: Bug
Submit Date: 2013-01-03
Status: Closed
Updated Date: 2013-12-17
Project Name: JDK
Resolved Date: 2013-01-23
Component: core-libs
Sub-Component: java.rmi
Priority: P4
Resolution: Fixed

Description
Subject: RMI java processes left running on JPRT systems
Date: Sat, 29 Dec 2012 12:51:33 -0800
From: Kelly O'Hair
To: core-libs-dev Libs <core-libs-dev@openjdk.java.net>

FYI...

After shutting down JPRT I found two systems with ActivationGroupInit java processes running.

I am assuming that a test case has fired them up and forgotten about them???

Not sure why JPRT did not kill them automatically...

-kto

jprtadm@sc11136053:~> jps -l -m
11651 sun.tools.jps.Jps -l -m
16530 sun.rmi.server.ActivationGroupInit
jprtadm@sc11136053:~> ps -fel | fgrep java
0 S jprtadm  11669 11603  0  80   0 -   511 pipe_w 12:35 pts/0    00:00:00 fgrep java
0 S jprtadm  16530     1  0  80   0 - 239212 futex_ Dec27 ?       00:01:02 /opt/jprt/T/P1/211212.chhegar/testproduct/linux_i586_2.6-product/jre/bin/java -Djava.security.manager=default -Djava.security.policy=/opt/jprt/T/P1/211212.chhegar/s/jdk/test/java/rmi/activation/ActivationSystem/unregisterGroup/group.security.policy -DunregisterGroup.port=53315 -Dtest.src=/opt/jprt/T/P1/211212.chhegar/s/jdk/test/java/rmi/activation/ActivationSystem/unregisterGroup -Dtest.classes=/opt/jprt/T/P1/211212.chhegar/s/jdk/build/linux-i586/testoutput/jdk_rmi/JTwork/classes/java/rmi/activation/ActivationSystem/unregisterGroup sun.rmi.server.ActivationGroupInit
jprtadm@sc11136053:~> uname -a
Linux sc11136053 2.6.27.25-78.2.56.fc9.i686 #1 SMP Thu Jun 18 12:47:50 EDT 2009 i686 i686 i386 GNU/Linux
jprtadm@sc11136053:~> 

                                    

Comments
The problem occurs because of race conditions in the test.

Besides the JVM running the test, there are two additional JVMs: rmid, and a JVM containing the activated objects (the "group JVM"). The group JVM is the one that's hanging around. Logic added long ago in the fix for JDK-4213186 attempts to deactivate all the objects in this group so that the group JVM will exit. It does this by calling the shutdown() method on each activated object, which in turn calls Activatable.inactive() to make the object inactive. The problem is that an object cannot be made inactive while a call on it is pending or in progress, and the shutdown() call is itself in progress on that object.
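The constraint can be modeled in a few lines. The sketch below is a hypothetical single-JVM stand-in for an activated object. `ActiveObjectModel`, `call()`, and `inactive()` are illustrative names, not the java.rmi.activation API. The model counts in-progress calls and refuses to go inactive while any call is still executing, including the shutdown() call itself.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of an activated object; not the java.rmi.activation API.
class ActiveObjectModel {
    private final AtomicInteger inProgress = new AtomicInteger();
    private volatile boolean active = true;

    // Wraps a "remote call" so the model knows a call is executing on this object.
    <T> T call(Supplier<T> body) {
        inProgress.incrementAndGet();
        try {
            return body.get();
        } finally {
            inProgress.decrementAndGet();
        }
    }

    // Mirrors the rule that an object cannot go inactive while a call is in progress.
    boolean inactive() {
        if (inProgress.get() > 0) {
            return false;
        }
        active = false;
        return true;
    }

    boolean isActive() {
        return active;
    }

    // shutdown() is itself an in-progress call, so inactive() always fails here.
    boolean shutdown() {
        return call(this::inactive);
    }
}
```

In the model, calling inactive() after shutdown() has returned succeeds. This is why the test originally moved the deactivation onto a separate thread.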

So the object attempts to work around this by spawning another thread to do the deactivation, letting the in-progress call return. Since the spawned thread is now running asynchronously from the call to the active object, there is logic to invoke a callback in the test JVM to keep a count of the number of objects that have been deactivated. The test waits a while for the count to reach the expected number of objects.

Unfortunately, the freshly spawned deactivation thread is racing with the shutdown() call. That is, the shutdown() call might not have returned by the time the spawned thread attempts to do the deactivation. That should be OK, since the deactivation is done by ActivationLibrary.deactivate(), which has a backoff and retry algorithm in it.

The shutdown calls all return quickly, so the test immediately proceeds to wait for the count of deactivated objects to reach the desired number. Unfortunately, the callback occurs *before* the deactivation actually occurs, so the right number of callbacks can be made even though there are still threads waiting and retrying in ActivationLibrary.deactivate().
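The ordering problem can be reproduced deterministically with plain java.util.concurrent primitives. This is a hypothetical sketch, and the class and method names are illustrative, not taken from the test.

In the sketch, each spawned thread reports through the callback latch first and only then "deactivates". A second latch stands in for the retry delay inside ActivationLibrary.deactivate(). The waiting side sees the full callback count while no object has actually been deactivated yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the old shutdown() logic: callback first, deactivation second.
class CallbackBeforeDeactivation {
    // Returns {deactivations seen when the callback count was reached, final deactivations}.
    static int[] run(int objects) {
        CountDownLatch callbacks = new CountDownLatch(objects); // the test's callback counter
        CountDownLatch retryDelay = new CountDownLatch(1);      // stands in for backoff and retry
        AtomicInteger deactivated = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < objects; i++) {
            Thread t = new Thread(() -> {
                callbacks.countDown();          // report "deactivated" to the test...
                try {
                    retryDelay.await();         // ...while the real work is still pending
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                deactivated.incrementAndGet();
            });
            threads.add(t);
            t.start();
        }

        try {
            callbacks.await(10, TimeUnit.SECONDS); // the test's "wait for the count" step
            int seenByTest = deactivated.get();    // 0 here: nothing has been deactivated yet
            retryDelay.countDown();                // let the pending deactivations finish
            for (Thread t : threads) {
                t.join();
            }
            return new int[] {seenByTest, deactivated.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In the real test, the delay was ActivationLibrary.deactivate() sleeping between trials rather than a latch. Shutting down rmid during that window is what left the group JVM running.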

(I may be responsible for this as part of the fix for JDK-7186111. That fix moved the object deactivation after the callback, under the assumption that deactivating first would set up a race between the VM exiting and the callback occurring. Had that happened, a callback might have been dropped, and the test would fail. That doesn't seem plausible, though, as the group JVM should continue running as long as there are non-daemon threads alive. But I could be wrong.)

In any case, we now have a case where the threads spawned by shutdown() in the group JVM are waiting and retrying their attempts to inactivate the object. The test, however, has now been told that all the objects have been deactivated, so it proceeds to shut down rmid. If rmid is shut down while the threads in the group JVM are retrying the call to Activatable.inactive(), this call will throw an exception because it can't connect to the activation system (which resides in rmid), and then it will give up. Thus, within the group JVM, the activated object will remain exported, which will cause the group JVM to hang around.
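A retry loop of the kind described might look like the following. It keeps calling the deactivation until it succeeds, but abandons the attempt once the activation system can no longer be reached.

This is a hypothetical sketch, not the actual ActivationLibrary.deactivate() code. `RetryingDeactivator` is an illustrative name, and the trial count and sleep interval are parameters. The printed message mirrors the log line quoted in the comments, which shows a 100-millisecond sleep between trials.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper in the spirit of ActivationLibrary.deactivate(); not its actual code.
class RetryingDeactivator {
    // Returns true once inactive() reports success, false if trials run out or a call throws.
    static boolean deactivate(Callable<Boolean> inactive, int maxTrials, long sleepMillis) {
        for (int trial = 1; trial <= maxTrials; trial++) {
            try {
                if (inactive.call()) {
                    return true;
                }
            } catch (Exception e) {
                // e.g. rmid has already been shut down: the call cannot connect, so give up.
                System.err.println("inactive trial " + trial + " threw " + e + "; giving up");
                return false;
            }
            System.err.println("inactive trial failed. Sleeping " + sleepMillis
                    + " milliseconds before next trial");
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

In the failure scenario above, the exception branch is the one that fires. The object stays exported, and the group JVM keeps running.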

(By the way, this was really hard to debug. The problem is that the group JVM's stderr is consumed by rmid, or maybe by the test. If the group JVM is still running and gets an error after rmid or the test has exited, its stderr goes nowhere. I had to hack it to write its messages to a file, which in turn required adjusting the security policy to allow opening that file. Ugh.)
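One way to do this kind of debugging hack is to point System.err at a file early in the group JVM. The sketch below is an assumption about how such a hack might look; the helper name `StderrToFile` and the file location are illustrative. Under a security manager, the group JVM would also need a java.io.FilePermission grant with the "write" action for that file, which corresponds to the policy adjustment mentioned above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;

// Hypothetical debugging aid: send this JVM's stderr to a file that outlives rmid and the test.
class StderrToFile {
    // Redirects System.err to the given file (appending, autoflush) and returns the previous stream.
    static PrintStream redirect(Path logFile) {
        PrintStream previous = System.err;
        try {
            PrintStream fileStream =
                    new PrintStream(new FileOutputStream(logFile.toFile(), true), true);
            System.setErr(fileStream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return previous;
    }
}
```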
                                     
2013-01-03
The fix is to inactivate the activated object synchronously from within the shutdown() call instead of spawning a thread to do it asynchronously. This is done by first forcibly unexporting the object. That step is necessary because the shutdown() call is itself an in-progress call on the active object, which is why Activatable.inactive() was failing and was moved to another thread in the first place. Forcibly unexporting the object gets around this problem. Only then is the deactivation done, to notify rmid. All of this happens during the shutdown() call, so the test is blocked while it is happening; thus we know the test won't shut down rmid until after all the activated objects have been deactivated.
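The fixed ordering can be illustrated with a small hypothetical model. First, the object is forcibly unexported, so it no longer counts its own in-progress shutdown() call. Then the deactivation is performed synchronously before shutdown() returns.

In real RMI activation code, the two steps would correspond roughly to Activatable.unexportObject(this, true) followed by Activatable.inactive(id); that mapping is an assumption, not the committed change. The class below (`SyncShutdownModel`, a hypothetical name) only models the behavior and does not use java.rmi.activation, which was removed in JDK 17.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of an activated object; not the java.rmi.activation API.
class SyncShutdownModel {
    private final AtomicInteger inProgress = new AtomicInteger();
    private volatile boolean exported = true;
    private volatile boolean inactiveReported = false; // stands in for notifying rmid

    <T> T call(Supplier<T> body) {
        inProgress.incrementAndGet();
        try {
            return body.get();
        } finally {
            inProgress.decrementAndGet();
        }
    }

    // force == false refuses while calls are in progress; force == true unexports regardless.
    boolean unexport(boolean force) {
        if (!force && inProgress.get() > 0) {
            return false;
        }
        exported = false;
        return true;
    }

    // Fails only if the object is still exported and cannot be unexported non-forcibly.
    boolean inactive() {
        if (exported && !unexport(false)) {
            return false;
        }
        inactiveReported = true;
        return true;
    }

    // The fix: force-unexport, then deactivate synchronously, all before shutdown() returns.
    boolean shutdown() {
        return call(() -> unexport(true) && inactive());
    }

    boolean isExported() { return exported; }
    boolean isInactiveReported() { return inactiveReported; }
}
```

Because both steps finish inside shutdown(), the caller is blocked until rmid has been notified, so the test cannot shut rmid down too early.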

I'm pretty sure the group JVM won't exit until the in-progress call on a just-unexported object has completed. So, the test's call to shutdown() should always succeed. (Well, we'll see.)

Given that the test now deactivates the objects synchronously, we don't need the callback to tell the test how many objects have been deactivated. Thus we can remove the Callback interface, its implementation, and the cleanup/wait code at the end. Since the callback object was registered in a registry created solely for that purpose, we no longer need to create that registry; thus we don't need a unique port for it, nor do we need to pass that port number as a property through rmid to the group JVM. A lot of this extra infrastructure can simply be deleted.

Now, the group JVM is still somewhat "open loop" in that if the test fails unexpectedly somehow, it can still leave the group JVM running. If this happens, something to consider is to create another object in the group JVM and export it as a UnicastRemoteObject (instead of activating it). The test java/rmi/activation/Activatable/inactiveGroup/InactiveGroup.java uses this technique to sense the presence or absence of the group's JVM. For our case, we could use this as a "control channel" that unconditionally exits the group JVM (like justGoAway() would do). In fact, maybe we should consider just doing this instead of going through the machinations to inactivate all the activated objects.

Still another way to establish a control channel is to use a plain socket instead of RMI. This would provide a more positive verification of the group JVM's exit, since sockets are closed automatically when a process exits.
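A minimal sketch of the socket idea, assuming the group JVM would open a listening socket and the test would connect to it. `SocketExitSensor` is a hypothetical name.

The test side blocks on read(), which typically returns -1 once the peer's socket is closed, and the operating system closes a process's sockets when it exits. Here the "group" side runs as a thread in the same JVM and closes its socket explicitly to stand in for process exit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical control-channel sketch; the "group" side is a thread standing in for a separate JVM.
class SocketExitSensor {
    // Returns true if the watcher observed end-of-stream after the group side closed its socket.
    static boolean demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread group = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // A real group JVM would hold this socket open until the process exits.
                } catch (IOException ignored) {
                    // Nothing to report in this sketch.
                }
            });
            group.start();
            try (Socket watcher = new Socket(loopback, server.getLocalPort())) {
                watcher.setSoTimeout(10_000);  // don't hang forever if the peer never closes
                InputStream in = watcher.getInputStream();
                int b = in.read();             // blocks until the peer closes; -1 = end of stream
                group.join();
                return b == -1;
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```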

Other cleanups done for this test:
 - remove unused unregister() and justGoAway() methods from the ActivateMe interface
 - RMI activation creates stubs dynamically, so the checked-in stubs can be removed
 - declare variables shared across threads to be volatile
 - adjust timeouts
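The "volatile" item is about visibility. A field written by one thread, for example an RMI dispatch thread, and polled by another is only guaranteed to be seen if it is volatile or otherwise synchronized. The following is a hypothetical minimal example, not taken from the test; `SharedFlag` is an illustrative name.

```java
// Hypothetical example of a flag shared across threads; not code from the test.
class SharedFlag {
    // Without volatile, the polling loop below is not guaranteed to ever see the update.
    private volatile boolean shutdownCalled;

    boolean awaitFlagSetByAnotherThread() {
        Thread writer = new Thread(() -> shutdownCalled = true); // e.g. a remote call's thread
        writer.start();
        while (!shutdownCalled) {   // reader polls the volatile field
            Thread.onSpinWait();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shutdownCalled;
    }
}
```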

                                     
2013-01-04
Empirically I've discovered that an in-progress call can run for an arbitrary amount of time after unexporting and inactivating itself. However, a thread spawned from such a call will not keep the JVM alive. So, with the old spawned-thread technique, there was a race condition between inactivating an object and the callback to the test to report the inactivation. The JVM will terminate quickly after the last object is inactivated, or as soon as the last call on any formerly active object finishes. This is the usual JVM-exits-after-last-nondaemon-thread-exits behavior.

The RMI service threads (and thus the thread spawned to perform the callback) are all daemon threads. The primary non-daemon thread keeping the JVM alive is the RMI Reaper thread, which stays alive essentially as long as there are objects exported. When the last exported object is unexported, the reaper exits, and so does the JVM. The old callback thread could have avoided the earlier race conditions by being made a non-daemon thread, so its execution would never be terminated by JVM death. This is now moot (so to speak) since we now unexport and inactivate the objects synchronously from within the shutdown() call.
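The daemon-thread behavior described here can be checked directly. A new Thread inherits the daemon status of the thread that creates it, so a thread spawned from an RMI service thread is a daemon unless setDaemon(false) is called before start().

In this minimal sketch, the parent thread stands in for an RMI service thread, and `DaemonInheritance` is a hypothetical name.

```java
// Hypothetical helper showing daemon-status inheritance; the parent stands in for an RMI service thread.
class DaemonInheritance {
    // Returns {child's inherited daemon status, status after setDaemon(false)}.
    static boolean[] check() {
        boolean[] result = new boolean[2];
        Thread parent = new Thread(() -> {
            Thread child = new Thread(() -> { });
            result[0] = child.isDaemon();  // inherited from the daemon parent
            child.setDaemon(false);        // what the old callback thread could have done
            result[1] = child.isDaemon();  // non-daemon: would keep the JVM alive while it runs
        });
        parent.setDaemon(true);
        parent.start();
        try {
            parent.join();                 // join makes the parent's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```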
                                     
2013-01-04
URL:   http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c18f28312c49
User:  smarks
Date:  2013-01-23 02:33:43 +0000

                                     
2013-01-23
URL:   http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/c18f28312c49
User:  lana
Date:  2013-01-29 18:46:15 +0000

                                     
2013-01-29
Verified with the latest JDK 8 build; the issue is fixed.
                                     
2013-08-20
Attached a couple of log files from the test runs that resulted in the stray process. A line that occurs in these files but not in a normal test run is:

Thu Dec 27 13:56:28 PST 2012:ExecGroup-0:err:ACTIVATION_LIBRARY: inactive trial failed. Sleeping 100 milliseconds before next trial

This might indicate that the objects aren't all getting deactivated, which would leave the process hanging around.
                                     
2013-01-03


