JDK-6896297 : rmid randomly crashes with ConcurrentModificationException when running JCK tests concurrently
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.rmi
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-10-29
  • Updated: 2012-03-22
  • Resolved: 2011-07-19
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7 b142Fixed
Related Reports
Relates :  
Description
When running JCK 7 activation tests concurrently on a fast SPARC machine we noticed test failures due to rmid daemon unexpected shutdown caused by the ConcurrentModificationException exception.

Here is rmid output before it crashes:

===========
java.io.IOException: snapshot failed with exception of type: java.util.ConcurrentModificationException, message was: null
	at sun.rmi.log.ReliableLog.snapshot(ReliableLog.java:383)
	at sun.rmi.server.Activation.addLogRecord(Activation.java:1462)
	at sun.rmi.server.Activation.access$700(Activation.java:135)
	at sun.rmi.server.Activation$GroupEntry.registerObject(Activation.java:890)
	at sun.rmi.server.Activation$ActivationSystemImpl.registerObject(Activation.java:469)
	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:613)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:717)
===========

Here is the first failing test stack trace:

=======
java.rmi.activation.ActivationException: log snapshot failed; nested exception is: 
	java.io.IOException: snapshot failed with exception of type: java.util.ConcurrentModificationException, message was: null
	at sun.rmi.server.Activation.addLogRecord(Activation.java:1475)
	at sun.rmi.server.Activation.access$700(Activation.java:135)
	at sun.rmi.server.Activation$GroupEntry.registerObject(Activation.java:890)
	at sun.rmi.server.Activation$ActivationSystemImpl.registerObject(Activation.java:469)
	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:613)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:717)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:273)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:251)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:160)
	at sun.rmi.server.Activation$ActivationSystemImpl_Stub.registerObject(Unknown Source)
	at java.rmi.activation.Activatable.exportObject(Activatable.java:427)
	at java.rmi.activation.Activatable.exportObject(Activatable.java:346)
	at java.rmi.activation.Activatable.<init>(Activatable.java:105)
	at javasoft.sqe.tests.api.java.rmi.activation.TestObject1Impl.<init>(TestObject1Impl.java:46)
	at javasoft.sqe.tests.api.java.rmi.activation.ActivationGroup.inactivate4Tests.ActivationGroup2006(inactivate4Tests.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:613)
	at javasoft.sqe.javatest.lib.MultiTest.invokeTestCase(MultiTest.java:406)
	at javasoft.sqe.tests.api.java.rmi.activation.ActivationGroupTest.invokeTestCase(ActivationGroupTest.java:66)
	at javasoft.sqe.javatest.lib.MultiTest.run(MultiTest.java:195)
	at javasoft.sqe.tests.api.java.rmi.activation.ActivationTest.run(ActivationTest.java:116)
	at javasoft.sqe.javatest.lib.MultiTest.run(MultiTest.java:127)
	at javasoft.sqe.tests.api.java.rmi.activation.ActivationGroup.inactivate4Tests.main(inactivate4Tests.java:38)
Caused by: java.io.IOException: snapshot failed with exception of type: java.util.ConcurrentModificationException, message was: null
	at sun.rmi.log.ReliableLog.snapshot(ReliableLog.java:383)
	at sun.rmi.server.Activation.addLogRecord(Activation.java:1462)
	at sun.rmi.server.Activation.access$700(Activation.java:135)
	at sun.rmi.server.Activation$GroupEntry.registerObject(Activation.java:890)
	at sun.rmi.server.Activation$ActivationSystemImpl.registerObject(Activation.java:469)
	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:613)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:717)
======
.JTR file and archived rmid logging directory are attached.

After the first test failure the rmid daemon crashes that cause other failures.

It's reproducible on mumble-new.sfbay one time per 2 - 10 JCK runs.

Here is a sample command for starting pre-configured JCK 7 test suite in loop:

ag153348@mumble-new$ for i in {0..9}; do echo $i; /java/re/jdk/7/latest/binaries/solaris-sparc/bin/java -jar /java/re/jck/7/qac/latest/binaries/JCK-runtime-7/lib/jtjck.jar -w:work$i api/java_rmi/activation; done
0
Oct 29, 2009 3:28:21 AM Harness starting test run with configuration "jck_runtime_solaris"...
Oct 29, 2009 3:29:05 AM Finished executing all tests, wait for cleanup...
Oct 29, 2009 3:29:05 AM Harness done with cleanup from test run.
Test results: passed: 134; failed: 20
Results written to /home/ag153348/testing/work0.
Error: Some tests did not pass

By default on mumble-new.sfbay it runs tests with concurrency=20.

Comments
EVALUATION The problem occurs because of unsafe sharing of Activation.groupTable across multiple threads. Common stack traces are appended below. Note: these are generated from a modified JDK with some debugging code added, so there will be stack frames that don't exist in the production JDK. However, the stack traces do run through the relevant code. It looks like the situation is as follows. Thread A accepts a request that ends up registering a group. This involves modifying groupTable, a HashMap which is a field of an Activation instance. The modification of groupTable is protected by a synchronization block on groupTable itself. Other accesses to groupTable (but not all, see below) are also protected by synchronization. Thread B accepts a request that registers an object. The act of doing so is logged, and periodically a log snapshot will be taken. A log snapshot involves serializing the Activation instance, which includes the groupTable HashMap. No synchronization is performed here. If the groupTable is modified while it's being serialized, this will result in ConcurrentModificationException. (Aside: not all access to groupTable is synchronized. The put() call in LogRegisterGroup.apply() is not synchronized, nor is the call to remove() in LogUnregisterGroup. These might be bugs as well.) The straightforward fix would be to change groupTable to be a ConcurrentHashMap instead of a HashMap. If this were done it might be possible to remove the external locking entirely. Unfortunately, ConcurrentHashMap has a different serialized representation from an ordinary HashMap, so this change would result in a likely incompatible change to anything that serializes an Activation instance. An alternative fix might be to add a writeObject() method to Activation, which locks groupTable before serializing it. This sounds like it might work, provided proper care is taken not to change the serialized representation of Activation. Stack traces follow. ========== THREAD A ========== java.lang.RuntimeException: potential concurrent modification at java.util.HashMap.checkConcurrentModification(HashMap.java:181) at java.util.HashMap.put(HashMap.java:408) at sun.rmi.server.Activation$ActivationSystemImpl.registerGroup(Activation.java:492) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:613) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ========== THREAD B ========== java.io.IOException: snapshot failed at sun.rmi.log.ReliableLog.snapshot(ReliableLog.java:383) at sun.rmi.server.Activation.addLogRecord(Activation.java:1462) at sun.rmi.server.Activation.access$700(Activation.java:135) at sun.rmi.server.Activation$GroupEntry.registerObject(Activation.java:890) at sun.rmi.server.Activation$ActivationSystemImpl.registerObject(Activation.java:469) at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:613) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:818) at java.util.HashMap$EntryIterator.next(HashMap.java:859) at java.util.HashMap$EntryIterator.next(HashMap.java:857) at java.util.HashMap.writeObject(HashMap.java:1026) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:613) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:962) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346) at sun.rmi.log.LogHandler.snapshot(LogHandler.java:72) at sun.rmi.log.ReliableLog.snapshot(ReliableLog.java:379) ... 18 more
26-03-2011

EVALUATION I was able to reproduce this problem with my own JDK build and b51 of the JCK, on a 16-core Solaris x64 machine. It was somewhat more difficult to reproduce, however, failing in only 3 cases out of 100 JCK runs. Thanks for the command-line suggestion for running the JCK; this made things quite convenient. I'm still investigating the cause of this bug, though, and I'm not moving this to "Fix In Progress" because it's not yet clear whether a fix is feasible.
23-03-2011