JDK-8240902 : JDI shared memory connector can use already closed Handles
  • Type: Bug
  • Component: core-svc
  • Sub-Component: debugger
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • Submitted: 2020-03-11
  • Updated: 2020-10-08
  • Resolved: 2020-03-20
Fix Version: JDK 15 (15 b16, Fixed)
Description
Calling shmemBase_closeConnection() closes the Handles opened/created for the mutexes used to serialize reads and writes for each stream of the connection. After this call, the Handle id that used to identify a previously closed mutex could be recycled by the OS and assigned to a new object. If a thread tries to read or write a packet after the connection was closed, the call will end up in sysIPMutexEnter(), which calls WaitForMultipleObjects() on the previously closed mutex and the shutdown event. If the Handle id was indeed recycled, that call will affect the state of the new object.
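For illustration only, a minimal sketch of the kind of wait described above (this is not the actual transport code; the function and parameter names are stand-ins for what sysIPMutexEnter() does with the per-stream mutex and connection->shutdown):

    #include <windows.h>

    /* Sketch: the mutex sits at index 0, so it is examined first. If its
     * handle was already closed and the Handle id recycled, this wait is
     * undefined behaviour and can consume a signal belonging to an
     * unrelated, newly created object. */
    static DWORD waitForMutexOrShutdown(HANDLE mutex, HANDLE shutdownEvent) {
        HANDLE handles[2] = { mutex, shutdownEvent };
        return WaitForMultipleObjects(2, handles, FALSE, INFINITE);
    }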
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/0c03ed579379 User: pchilanomate Date: 2020-03-20 00:38:54 +0000
20-03-2020

Right, I'll add that in the comments. If we want to attempt to free the fields from SharedMemoryConnection even after the connection was published, we could define a new struct X and move all those fields there except the refcount and state, and then have a pointer to a dynamically allocated X from SharedMemoryConnection. Then if the refcount is 0 we could also free X. But it's probably not worth doing that to save 140 - 12 = 128 bytes.
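A minimal sketch of that idea, purely for illustration (the struct name X and the exact field split are hypothetical, not part of the actual patch):

    #include <windows.h>

    /* Hypothetical layout: everything except the refcount and state moves
     * into a separately allocated struct that can be freed once the
     * refcount drops to zero. */
    typedef struct X {
        HANDLE shutdown;        /* signalled when the connection is closed */
        HANDLE otherProcess;    /* signalled if the other process dies */
        /* ... incoming/outgoing streams and remaining per-connection fields ... */
    } X;

    typedef struct SharedMemoryConnection {
        volatile LONG refcount; /* accesses currently in progress */
        volatile LONG state;    /* e.g. STATE_OPEN / STATE_CLOSED */
        X *shared;              /* freed only when refcount reaches 0 */
    } SharedMemoryConnection;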
17-03-2020

Your approach looks good. The fact that we cannot actually delete the connection after it has been successfully created and published is convenient, in that it allows the reference count to be internal to the connection object. In general such reference counting has to exist externally to the object being managed. It is probably worth noting that somewhere.

    /*
     * Ideally we should free the connection structure. However as
     * closing the connection is asynchronous it means that other
     * threads may still be accessing the connection structure. On
     * Win32 this means we leak 132 bytes. This memory will be
     * reclaimed at process exit.
     *
     * freeConnection(connection);
     *
     */

With the additional state it's probably more than 132 bytes now.
16-03-2020

The issues mentioned so far on closing the mutex, hasData and hasSpace handles from the Stream struct also apply to the shutdown and otherProcess handles from the SharedMemoryConnection struct. There are also calls to closeConnection from createConnection/openConnection and to closeStream from createStream/openStream, when failing to create/open a resource. In some of these cases calling closeConnection or closeStream will attempt to release a resource that was never created. Testing out a patch that aims to fix all these issues. It uses a refcount on connection access: http://cr.openjdk.java.net/~pchilanomate/8240902/refcount/webrev/
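A rough sketch of the general refcount-on-access idea (this illustrates the technique, not the contents of the webrev above; the struct layout and the names enterConnection/leaveConnection/releaseConnectionResources are hypothetical):

    #include <windows.h>

    #define STATE_OPEN   1
    #define STATE_CLOSED 2

    /* Minimal stand-in for the real connection struct; the field names
     * used here (refcount, state, released) are illustrative. */
    typedef struct {
        volatile LONG refcount;   /* accessors currently inside the connection */
        volatile LONG state;      /* STATE_OPEN or STATE_CLOSED */
        volatile LONG released;   /* set once the handles have been closed */
        /* ... handles, streams, etc. ... */
    } Connection;

    static void leaveConnection(Connection *c);

    static void releaseConnectionResources(Connection *c) {
        /* In the real code this would close the mutex/event handles and
         * any other per-connection resources. */
        (void)c;
    }

    /* The connection would be created with refcount == 1; closeConnection()
     * marks it STATE_CLOSED and drops that initial reference, so the handles
     * are only closed once the last accessor has left. */
    static BOOL enterConnection(Connection *c) {
        InterlockedIncrement(&c->refcount);
        if (c->state == STATE_CLOSED) {
            leaveConnection(c);   /* back out; the handles may already be gone */
            return FALSE;
        }
        return TRUE;
    }

    static void leaveConnection(Connection *c) {
        if (InterlockedDecrement(&c->refcount) == 0 &&
            InterlockedExchange(&c->released, 1) == 0) {
            releaseConnectionResources(c);
        }
    }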
16-03-2020

Moving from hotspot/svc -> core-svc/debugger since that's where JPDA transport layer bugs live.
12-03-2020

It's a pretty similar scenario to that one but not completely identical; but yes, that's the idea with the change of order. The scenario I found was the following:

- thread A is in receiveBytes() -> waitForData() -> sysEventWait(), waiting on connection->otherProcess (checks if the other process died) and stream->hasData
- thread B closes the connection, which:
  - signals connection->shutdown
  - closes the stream, which in turn:
    - signals stream->hasData
    - closes the handle to the mutex, which deletes it
- thread C creates a new synchronization object of some form (initially signalled) and gets back the recycled handle previously assigned to the mutex
- thread A wakes up inside sysEventWait() because of the signal on stream->hasData. It calls enterMutex(), which examines the "mutex" handle and the connection->shutdown handle. Since the "mutex" handle is checked first, it sees that the synchronization object associated with that handle is in the signaled state and consumes that signal.
- thread C tries to use the new synch object that should be signalled and instead blocks because it is no longer signalled

I think the change of handle order applies only to this scenario but not yours. In your scenario the question is whether we can assume that after signaling connection->shutdown, thread A has already "woken up", so the wait is not in progress anymore and we can close the handle. But with your thread D example that seems not to be the case. It appears that thread D could experience both issues. If the handle was recycled, then when thread D starts executing again it will consume the signal of the new object, as you pointed out. If the handle is not recycled, we fall into undefined behaviour, since thread D will check an already closed handle inside WFMO, which is stated to be undefined. Now, to add more to this mess, if the logic in the previous sentences is correct, I realized we are also falling into this scenario with hasData (hasData is signalled and closed while there is somebody waiting on it).
12-03-2020

But suppose that a thread D calls WFMO, which checks the shutdown event and is then preempted before checking the mutex. We then go through the scenario above, where the mutex gets deleted and the handle reassigned. Thread D then continues to check the "mutex" and again consumes the signal.
12-03-2020

I see what you are saying about the order, but I would still be reluctant to rely on that to make this situation completely safe. If I understand the scenario correctly:

- thread A is in WaitForMultipleObjects on the stream->mutex and connection->shutdown handles
- thread B closes the connection, which:
  - signals connection->shutdown
  - closes the stream, which in turn:
    - closes the handle to the mutex, which deletes it
- thread C creates a new synchronization object of some form (initially signalled) and gets back the recycled handle previously assigned to the mutex
- thread A wakes up inside WFMO because of the signal on connection->shutdown and examines the "mutex" handle; it sees the synchronization object associated with that handle is in the signaled state and consumes that signal
- thread C tries to use the new synch object that should be signalled and instead blocks because it is no longer signalled

If we change the order of the events, thread A will see the shutdown event is signalled and not examine the "mutex", and so not falsely consume a signal on the "mutex".
12-03-2020

The real bug is in closeStream:

    static jint closeStream(Stream *stream, jboolean linger) {
        /*
         * Lock stream during close - ignore shutdown event as we are
         * closing down and shutdown should be signalled.
         */
        CHECK_ERROR(enterMutex(stream, NULL));

        /* mark the stream as closed */
        stream->state = STATE_CLOSED;

        /* wake up waitForData() if it is in sysEventWait() */
        sysEventSignal(stream->hasData);
        sysEventClose(stream->hasData);

        /* wake up waitForSpace() if it is in sysEventWait() */
        sysEventSignal(stream->hasSpace);
        sysEventClose(stream->hasSpace);

        ...

        CHECK_ERROR(leaveMutex(stream));
        sysIPMutexClose(stream->mutex);    <= BUG HERE!
        return SYS_OK;
    }

You cannot close/delete a synchronization object without knowing for certain that all operations on that object are 100% complete. If a thread tries to acquire the stream mutex just after it is acquired in closeStream, then it is still waiting for that mutex when the above code potentially deletes it! Unfortunately there is no easy way to know when it is safe to delete a synchronization object. You need to track usage at a higher level, which in turn requires its own synchronization. Fortunately Windows already provides this in a way: a possible solution in this case may be to create a new handle for the mutex before using it, and close that handle afterwards. That way, if the original handle is closed by closeStream, any active users of the mutex will not be affected, as the mutex won't be deleted until the local handle is closed. However, the handle will have to exist across the paired mutex enter and leave functions, otherwise the mutex might get deleted after being locked.
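A minimal sketch of that duplicate-handle idea, assuming Win32's DuplicateHandle() is used to take a private reference to the mutex for the duration of the enter/leave pair (the helper names and the localMutex parameter are illustrative, not the actual fix):

    #include <windows.h>

    /* Take a private handle to the stream mutex so the underlying mutex
     * object cannot be destroyed while we hold or wait on it, even if
     * closeStream() closes the original handle concurrently. */
    static DWORD enterStreamMutex(HANDLE streamMutex, HANDLE *localMutex) {
        if (!DuplicateHandle(GetCurrentProcess(), streamMutex,
                             GetCurrentProcess(), localMutex,
                             0, FALSE, DUPLICATE_SAME_ACCESS)) {
            return GetLastError();
        }
        return WaitForSingleObject(*localMutex, INFINITE);
    }

    /* The local handle must stay open until after the mutex is released;
     * otherwise the last handle could be closed while the mutex is held. */
    static void leaveStreamMutex(HANDLE localMutex) {
        ReleaseMutex(localMutex);
        CloseHandle(localMutex);
    }

In the real code the wait would also have to include the connection's shutdown event, as sysIPMutexEnter() does; the sketch keeps only the mutex to show the handle lifetime.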
12-03-2020

Thanks [~dholmes]. The reason I thought about changing the order of the handles is this sentence in the Remarks section of https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitformultipleobjects: "When bWaitAll is FALSE, this function checks the handles in the array in order starting with index 0, until one of the objects is signaled." Now, there is also this sentence in the Parameters section: "If one of these handles is closed while the wait is still pending, the function's behavior is undefined." But even if the handle is closed, based on the former statement the check should bail out after checking the shutdown event, since we know it will be signaled.
12-03-2020

This is a great find! It shows just how tricky it is to delete synchronization objects. There is no guarantee that the order of handles affects the operation of WaitForMultipleObjects. Without going into a full examination and potential re-design of this code, we may just have to leak the mutex handles as we already do with other structures:

    /*
     * Ideally we should close the connection->shutdown event and
     * free the connection structure. However as closing the
     * connection is asynchronous it means that other threads may
     * still be accessing the connection structure. On Win32 this
     * means we leak 132 bytes and one event per connection. This
     * memory will be reclaim at process exit.
     *
     * if (connection->shutdown)
     *     sysEventClose(connection->shutdown);
     * freeConnection(connection);
     */
12-03-2020

A simple fix could be to invert the order of the handles when calling WaitForMultipleObjects() in sysIPMutexEnter(). The shutdown event should get priority over the mutex. That way if the connection is closed, the call to WaitForMultipleObjects() will just see that the shutdown event was signaled and will not affect the previously closed Handle in case it was recycled and assigned to a new object.
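As a sketch of the proposed reordering (illustrative only; how sysIPMutexEnter() actually builds its arguments may differ), the shutdown event moves to index 0 so it is examined before the possibly recycled mutex handle:

    #include <windows.h>

    /* Same wait as in the earlier sketch, but with the shutdown event
     * first, so a wait on a closed connection is satisfied by the shutdown
     * event before the mutex handle is ever examined. */
    static DWORD waitForShutdownOrMutex(HANDLE shutdownEvent, HANDLE mutex) {
        HANDLE handles[2] = { shutdownEvent, mutex };
        DWORD rc = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
        if (rc == WAIT_OBJECT_0) {
            /* shutdown signalled: report the connection closed without
             * having touched the (possibly recycled) mutex handle */
        } else if (rc == WAIT_OBJECT_0 + 1) {
            /* mutex acquired as before */
        }
        return rc;
    }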
12-03-2020

The issue can be reproduced by adding a semaphore to the HandshakeState class and running the test test/langtools/jdk/jshell/FailOverExecutionControlTest.java. The test will intermittently time out due to a deadlock in the handshake code. Alternatively, the following patch can be applied, which makes the previous test fail with an assertion after the semaphore _semaphore (of the HandshakeState class) is created: http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev/ This patch adds output to the previous one to visualize the issue with the handles: http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug_withoutput/webrev/
12-03-2020

And presumably hasSpace as well. You cannot "signal" a synchronization object and then delete it.
12-03-2020

This issue was spotted by just adding an extra dummy semaphore to the HandshakeState class. The spurious call to WaitForMultipleObjects() after the connection was closed uses a Handle id that previously identified a mutex (used to serialize reads/writes to one stream of the connection) but is now assigned to a newly created semaphore in HandshakeState. The call consumes the signaled state of the semaphore. There is a later call to ReleaseMutex(), after calling WaitForMultipleObjects() and checking that the stream was closed, which might look like it restores the state of the semaphore. That would be the case if the new object were also a mutex. Since the object is a semaphore, the call fails. From https://docs.microsoft.com/en-us/previous-versions/ms919166(v=msdn.10)?redirectedfrom=MSDN: "The ReleaseMutex function fails if the calling thread does not own the mutex object."
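For illustration, a small self-contained sketch of that failure mode (this is not the transport or HotSpot code; it only shows that a wait consumes a semaphore count and that ReleaseMutex() cannot give it back):

    #include <stdio.h>
    #include <windows.h>

    int main(void) {
        /* A semaphore created in the signaled state, standing in for the
         * new object that reused the recycled Handle id. */
        HANDLE sem = CreateSemaphore(NULL, 1, 1, NULL);

        /* The stray wait consumes the semaphore's only count. */
        WaitForSingleObject(sem, INFINITE);

        /* ReleaseMutex() cannot restore it: the handle is not a mutex
         * owned by this thread, so the call fails. */
        if (!ReleaseMutex(sem)) {
            printf("ReleaseMutex failed, error %lu\n", GetLastError());
        }

        CloseHandle(sem);
        return 0;
    }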
11-03-2020