Bug ID: JDK-6445411 VMOperationsQueue isn't protected from concurrent modification during oop

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 6

Priority: P3
Status: Resolved
Resolution: Not an Issue
OS: generic
CPU: generic

Submitted: 2006-06-30
Updated: 2019-05-22
Resolved: 2017-09-06

JDK 10
10Resolved

The VMOperationsQueue is protected from modification by the VMOperationsQueue_lock, but during GC iteration of oops, the queue's oops_do method is called to iterate through the queue, without the lock being held. This means the queue could be modified concurrently with the iteration. This could lead to corruption of the queue and/or the newly added operation being missed in the iteration.

The issue is theoretical. The existing VMops may not have a problem, but you could add one that does have a problem and nothing would tell you that. That was the discussion with Ramki looking a policies for VMops to ensure bad things can't happen.
06-09-2017
I don't think this is a bug. I added asserts where these two VM_Operations are enqueued and an assert in VMThread::oops_do() and ran the jvmti tests and tier1 tests with no failures. Closing as NAI. diff --git a/src/share/vm/prims/jvmtiImpl.cpp b/src/share/vm/prims/jvmtiImpl.cpp --- a/src/share/vm/prims/jvmtiImpl.cpp +++ b/src/share/vm/prims/jvmtiImpl.cpp @@ -432,6 +432,7 @@ if ( _bps.find(bp) != -1) { return JVMTI_ERROR_DUPLICATE; } + assert(!SafepointSynchronize::is_at_safepoint(), "cannot add VM_Operations with oops at safepoint"); VM_ChangeBreakpoints set_breakpoint(VM_ChangeBreakpoints::SET_BREAKPOINT, &bp); VMThread::execute(&set_breakpoint); return JVMTI_ERROR_NONE; @@ -442,6 +443,7 @@ return JVMTI_ERROR_NOT_FOUND; } + assert(!SafepointSynchronize::is_at_safepoint(), "cannot add VM_Operations with oops at safepoint"); VM_ChangeBreakpoints clear_breakpoint(VM_ChangeBreakpoints::CLEAR_BREAKPOINT, &bp); VMThread::execute(&clear_breakpoint); return JVMTI_ERROR_NONE; diff --git a/src/share/vm/runtime/thread.cpp b/src/share/vm/runtime/thread.cpp --- a/src/share/vm/runtime/thread.cpp +++ b/src/share/vm/runtime/thread.cpp @@ -436,6 +436,7 @@ // Enqueue a VM_Operation to do the job for us - sometime later void Thread::send_async_exception(oop java_thread, oop java_throwable) { + assert(!SafepointSynchronize::is_at_safepoint(), "cannot add VM_Operations with oops at safepoint"); VM_ThreadStop* vm_stop = new VM_ThreadStop(java_thread, java_throwable); VMThread::execute(vm_stop); } diff --git a/src/share/vm/runtime/vmThread.cpp b/src/share/vm/runtime/vmThread.cpp --- a/src/share/vm/runtime/vmThread.cpp +++ b/src/share/vm/runtime/vmThread.cpp @@ -674,6 +674,7 @@ void VMThread::oops_do(OopClosure* f, CodeBlobClosure* cf) { + assert(SafepointSynchronize::is_at_safepoint(), "oops_do is only called at a safepoint"); Thread::oops_do(f, cf); _vm_queue->oops_do(f); }
06-09-2017
It may be there is no longer any potential issue. The concern was a VMop with an oops_do could be added concurrently to the queue such that it was not seen by oops_do processing and so its oop was not processed.
06-09-2017
I'm confused about this bug. The only two VM_Operations that have oops are VM_ThreadStop and VM_ChangeBreakpoints. Neither of these add their VM_Operation to the queue during a safepoint. The VMOperationsQueue_lock doesn't check for safepoint so there isn't an intervening safepoint while they are being enqueued. The VMThreads::oops_do() that follows the vm_queue list is only called during a safepoint. So I don't see how oops can leak out.
05-09-2017
The "JPRT crasher" has now been tracked down and is unrelated to this issue.
31-07-2017
We were trying to figure out how to remove that last oops do (after FlatProfiler is removed), but didn't come up with a solution within the hour that we were looking at it. Maybe we can spend more time to do that.
28-07-2017
Wonder why it would start crashing now, when this code's been there for a long time? Feel free to reassign this from me.
28-07-2017
Ioi, I'm chasing the JPRT crasher with JDK-8185273 which is a P1. It's looking like the parallel oops_do() is turning out to be the culprit. I have a test run with 90+ runs with a fix in place and no crashes. Patience... Update: And it crashed at runs 99, 104, 109, 122... sigh... So Erik's idea about adding locking in JDK-8185345 is not a foolproof answer. Makes the crash less frequent, but the race is still there.
27-07-2017
Changed to P3 since this is potentially the reason for recent crashes in JPRT. If possible, we should fix this to at least eliminate a potential root cause.
27-07-2017
I assigned it to myself because there's only one vm operation with an oops_do left, which [~rehn] and I looked into, but I think we still need that oops do for the exception oop. So I want to see if it is still a problem (and to understand the problem which I didn't do at first read).
06-04-2017
EVALUATION As we always add-at-end, and always traverse from start-to-end, with the current usage the worst that will happen is that a newly added operation won't be seen. But that operation can't rely on oops_do being applied to it anyway.
28-07-2006