JDK-8256811 : Delayed/missed jdwp class unloading events
  • Type: Bug
  • Component: core-svc
  • Sub-Component: debugger
  • Affected Version: 16
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-11-21
  • Updated: 2022-12-24
  • Resolved: 2022-07-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 20
17.0.6Fixed 20 b08Fixed
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
JDK-8212879 moved ObjectFree event posting out of the GC pause. 

There is an "optimization" in jdwp that may cause some problems as a result. I think it may cause delays in jdwp ClassUnload events. Note that I've not seen any failures related to this; this is all based on code inspection.

The relevant code is in jdk.jdwp.agent/share/native/libjdwp:
eventHandler.c
classTrack.c

classTrack.c is where the ObjectFree handler resides. That handler records information which is made accessible by calling classTrack_processUnloads.

eventHandler.c has event_callback, which is the caller of classTrack_processUnloads.  That call is conditional on garbageCollected > 0, and when that's true it clears the counter.  event_callback is used as the core callback handler; many of its jvmti callback handlers use it.

garbageCollected is incremented by a jvmti GarbageCollectionFinish callback.

Garbage collectors post GarbageCollectionStart/Finish pairs at the start and end of garbage collection pauses.

If there is some jdwp event between the pause end and the ObjectFree postings, jdwp won't detect the unloading until after the next GC pause end.

I think this can be fixed by removing the conditionalization of the call to classTrack_processUnloads, and just do it unconditionally.  There is a (somewhat confusing) comment saying that the conditionalization is there to avoid taking the handler lock when that isn't needed. But it's not clear why that lock is being taken on that code path at all, as classTrack_processUnloads contains it's own internal locking.

There is also a cache cleanup in the same conditional, as it is doing cleanup of possibly cleared weak references, so no point in doing it again unless there's been some GC activity.

The reason I think this is a problem is that if the new ServiceThread support for posting ObjectFree events is disabled then we get some consistent failures in jdi tests related to class unloading. So there may be a timing issue for those tests; even with the ServiceThread doing the ObjectFree posting, those tests might now have intermittent failures, though I've not seen any in the testing I've done.

With the ServiceThread not posting ObjectFree events, the failing jdi tests are:

vmTestbase/nsk/jdi/ClassUnloadEvent/className/classname001/TestDescription.java
vmTestbase/nsk/jdi/ClassUnloadEvent/classSignature/signature001/TestDescription.java
vmTestbase/nsk/jdi/HiddenClass/events/events001.java

Comments
Fix request [17u] On behalf of Andrew Dinn: The failure to correctly report class unloading events both in the OpenJDK tests and in actual deployments is causing spurious failures for the eclipse team's testing regime. This initial patch and the follow up patches mentioned in the PR were adopted as remedies in upstream releases. Although they are still only an ad hoc fix, do in practice avoid the OpenJDK test failures and should produce the same results for other users. Some of the upstream patches were superseded by later ones in order to arrive at a satisfactory solution. However, rather than create a new patch to arrive at the same end result it seems best to backport them all in sequence. They all apply cleanly apart from the first one. One patch merely enables debug output in a relevant test. That has been retained, 1) to maintain patch compatibility with upstream, 2) to ensure follow-on patches apply cleanly, 3) because it will be useful if the ad hoc nature of the current fixes means further problems are encountered and 4) because its effect is minimal, being local only to the test and only resulting in a small amount more info in the test output file. The overall risk is low because these patches only modify the class unloading code. The changes have been tested by running the relevant class unloading unit tests. This does not guarantee the problem is fixed (because it is intermittent). However, it does indicate that the patches are not flawed.
27-10-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/635 Date: 2022-08-17 13:24:49 +0000
17-08-2022

Changeset: 54854d93 Author: Zhengyu Gu <zgu@openjdk.org> Date: 2022-07-22 12:27:04 +0000 URL: https://git.openjdk.org/jdk/commit/54854d9300479c22c416fd9d2fdb5c29fc1884bc
22-07-2022

Thanks for the confirmation.
23-06-2022

Looks like I misread the spec, but there is still some grouping of CLASS_UNLOAD events done. According to the spec: Several events may occur at a given time in the target VM... These events are delivered together as a composite event... The events that are grouped in a composite event are restricted in the following ways: ... Only with other class unload events for the same class: Class Unload Event I didn't notice the "same class" part. From what I can see, it looks like if there are multiple CLASS_UNLOAD event handlers (multiple requests from JDI to be notified of CLASS_UNLOAD events), then multiple CLASS_UNLOAD events will be sent in the same composite event command. We start by doing the following: (void)bagEnumerateOver(unloadedSignatures, synthesizeUnloadEvent, (void *)env); unloadedSignatures is a bag of the signatures of all classes that were unloaded. synthesizeUnloadEvent will do the following for each signature (and for each CLASS_UNLOAD handler that is installed): eventHelper_recordClassUnload(node->handlerID, durableSignature, eventBag); This adds the event to the bag. Note for any given signature, that same eventBag is reused on each call, so for any given signature the bag ends up with a CLASS_UNLOAD event for each Handler. Then the following is done: reportEvents(env, eventSessionID, (jthread)NULL, 0, (jclass)NULL, (jmethodID)NULL, 0, eventBag); which calls: suspendPolicy = eventHelper_reportEvents(sessionID, completedBag); which calls: (void)bagEnumerateOver(eventBag, enumForCopyingSingles, &tracker); Which will add the each CLASS_UNLOAD event in the bag to the outgoing event packet. So in the end we do have grouping of CLASS_UNLOAD events, although only on a per signature basis, not multiple signatures in the same event command. However, I do think the spec requires this per signature grouping. So having thought this over some more, I don't believe you are doing anything that will change this behavior. Basically each ObjectFree event should trigger one composite event command that contains a CLASS_UNLOAD event for each even request that was made. It would be good to verify this on the JDI side both before and after your changes. The groupings should be done in each EventSet.
22-06-2022

Can you point me where is the "grouping" code resident?
22-06-2022

It's the "processing upon arrival" part that I worry about. It might mean a jdwp composite event packet is sent for each CLASS_UNLOAD event rather than grouping them all into a single composite event. If this isn't what you're doing then I don't see how your suggested change is actually fixing the issue in this CR. It might be a better approach for when/how the debug agent is notified of ObjectFree events, but that's not where the problem with this CR lies. The problem is once the debug agent is notified, how can we guarantee it sends the need CLASS_UNLOAD events in a timely manner (and in one composite event) without relying on some other event coming in to trigger their sending.
22-06-2022

Are you talking about synthesizeUnloadEvent()? I don't touch this part (jdi->jdwp). All I try to do here is to collect all free objects on service thread and post them from service thread in one thread state transition, so that they can be processed upon their arrival. I don't think I change any semantics how synthesizing unload event works.
22-06-2022

I'm not sure what your approach is leading to w.r.t. when/how the CLASS_UNLOAD events are sent, but it sounds like you are going to send a separate CLASS_UNLOAD event packet for each ObjectFree of a Class instance. This will cause a performance regression. Right now a CLASS_UNLOAD events are co-located (generally speaking, one jdwp event packet per GC that unloads any classes). Or at least this is the way it should work. I haven't gone in and verified it yet. I think I on the JDI side you just need to check if the events are all coming in as part of one EventSet, or if there is an EventSet per ClassUnload event.
22-06-2022

Yes, I saw the comment. I experimented and the I saw the issue: jdi callback uses jvmti_RawMonitor that requires thread _in_native state, so in JvmtiExport::post_object_free() method, we need to add JvmtiJavaThreadEventTransition, just as other events do. I am prototyping this approach, seems to work. I will do more test and cleanup, will post it soon. BTW, I am collecting dead object tag at safepoints with VM_JvmtiPostObjectFree, I am not sure it is actually needed, but can address that later. [~coleenp] Do you see any issues? Thanks
22-06-2022

> My understanding is that, ObjectFree event for unloaded classes can *not* be processed on the arrival, because it is posted on VM thread, right? The comment says: // PostObjectFree can't be called by JavaThread, so call it from the VM thread. void JvmtiTagMap::post_dead_objects_on_vm_thread() { I asked Coleen about this, and I believe she said she ran into some issue when posting on the JavaThread (she couldn't recall exactly what the issue was, but something to do with the VM state I believe) so she deferred to the VM thread. So I think your comment is not correct, and is in fact the opposite of what is now in place.
22-06-2022

My understanding is that, ObjectFree event for unloaded classes can *not* be processed on the arrival, because it is posted on VM thread, right? If so, we can solve the problem by collecting unloaded class tags at VM_JvmtiPostObjectFree safepoint and post them after it returns, then we can process ObjectFree event in event_callback handler.
21-06-2022

I'm pretty sure you are correct and JDK-8257705 predates JDK-8212879, or at least the potential for JDK-8257705 does. But JDK-8212879 does make things far worse because ObjectFree event for unloaded classes now comes in after the GarbageCollectionFinish event rather than before, meaning that the user might not see the class unload events until after the next garbage collection. I should also point out that the use ObjectFree to track the unloading of classes wasn't introduced until JDK-8255987, which was pushed just a little after this issue and JDK-8257705 were reported.
16-06-2022

I don't think JDK-8212879 is sole cause of delayed class unloading events, because cbGarbageCollectionFinish() only increases garbageCollected counter, but does not call into event_callback(), it still depends on other events to trigger class unloading processing. The only difference is that, before JDK-8212879, when event_callback() sees garbageCollected > 0, it guarantees to see all classes unloaded by the previous GC cycle, while it may not see all or even none of unloaded classes after JDK-8212879.
16-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/9168 Date: 2022-06-15 12:53:40 +0000
15-06-2022

Our custom reported missing ClassUnload events. After moving class unloading to concurrent phase, class unloading events now can overlap GC finish events, that can result jdi sees partial unloaded classes or even nothing when GC finish event arrives. Late arrival class unloading events will be flushed at next cycle, but for the last GC cycle, the events can be forever lost. I think doing the uncondtional call can resolve this problem, because other events, such as EI_THREAD_END or even EI_VM_DEATH can trigger the call.
14-06-2022

[I missed the question from @cplummer on 202102-01.] "Haven't all the ObjectFree events already happened by then, or do they all happen after?" They all happen after, now (JDK-8212879). They used to all happen before. That's the change driving this issue.
14-06-2022

> The reason I think this is a problem is that if the new ServiceThread support for posting ObjectFree events is disabled then we get some consistent failures in jdi tests related to class unloading. So there may be a timing issue for those tests; even with the ServiceThread doing the ObjectFree posting, those tests might now have intermittent failures, though I've not seen any in the testing I've done. See JDK-8257705. Specially the discussion starting here: https://bugs.openjdk.java.net/browse/JDK-8257705?focusedCommentId=14397483&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14397483 The problem with that test is that sometimes there are no events to deliver after the GC happens, so the test ends up waiting for the ClassUnload events, but those may never come unless something is done to trigger a JVMTI event. I think that can happen even without "some jdwp event between the pause end and the ObjectFree postings", although possibly for this test failure it is indeed because of the untimely event. I'm just saying you don't need the untimely event to make it so you never see ClassUnload events. Also, doing the unconditional call to classTrack_processUnloads() will not solve this problem, because you may not ever get an event to trigger the call.
02-02-2022

ILW=MMM=P3
02-02-2022

[~kbarrett] Can you explain what you mean by the following: > If there is some jdwp event between the pause end and the ObjectFree postings, jdwp won't detect the unloading until after the next GC pause end. "GC pause end" is when the GarbageCollectionFinish event is delivered, right? Haven't all the ObjectFree events already happened by then, or do they all happen after? If it is after, then possibly we can solve this issue by having the ObjectFree handler increment garbageCollected.
01-02-2021

> I think this can be fixed by removing the conditionalization of the call to classTrack_processUnloads, and just do it unconditionally. There is a (somewhat confusing) comment saying that the conditionalization is there to avoid taking the handler lock when that isn't needed. But it's not clear why that lock is being taken on that code path at all, as classTrack_processUnloads contains it's own internal locking. I think the comment is incorrect and I think your suggestion has way too much of a performance overhead. We can't call classTrack_processUnloads() on every event, and that's the probably the real reason for the conditional on garbageCollected.
01-02-2021

JDWP is one layer above JVM/TI in the JPDA stack. JDWP bugs live in core-svc/debugger so I'm moving this bug there.
21-11-2020

I put this in hotspot/jvmti because I couldn't find a better place for it, even though I think such a place probably exists.
21-11-2020