JDK-8292879 : com/sun/jdi/ClassUnloadEventTest.java failed due to classes not unloading
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 20
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • Submitted: 2022-08-24
  • Updated: 2022-11-01
  • Resolved: 2022-10-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 20
17.0.6Fixed 20 b19Fixed
Related Reports
Relates :  
Relates :  
Sub Tasks
JDK-8292880 :  
Description
The test is failing with:

 > java.lang.RuntimeException: Wrong number of class unload events: expected 10 got 0 

Note this is different than JDK-8291456, which fails with:

 > java.lang.RuntimeException: Wrong number of class unload events: expected 10 got 4

com/sun/jdi/ClassUnloadEventTest.java is a newer tests that is testing the generating of JDI ClassUnloadEvents. There were some rare cases of it failing to properly generate the events. JDK-8291456 was filed, and it was determined that the failure was due to the debuggee exiting before jvmti could flush out all the queued up events. This was worked around in the test with a short delay before the debuggee exits. However, the test is still failing, although it is much rarer now. I did try extending the delay before the debuggee exits to 60 seconds, and it still fails, so clearly it's not the same issue as JDK-8291456.

Some unique attributes of this failure when compared to JDK-8291456 include:

1. Always on windows-x64-debug. Not on any other platforms (unlike JDK-8291456).
2. Only with -Xcomp plus some other compiler related options that may or may not also have an impact.
3. The test complains that it got 0 of 10 expected events, whereas for JDK-8291456 is was always 4 of 10
4. Added delay in the debuggee does not fix the problem.

I made some changes to the test to print the output of the debuggee and also turned on -Xlog:class+unload=info, and indeed the test classes are not being unloaded, so this does not appear to be a JVMTI or debug agent issue.

The test uses ClassUnloadCommon.triggerUnloading(), just like many other tests do. I assume this is reliable, but maybe it is not for some reason with this particular test case.

The other possible cause I thought of that is in rare cases the debuggee is compiled in a way that results in a reference to the ClassLoader or a Class being retained.
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/831 Date: 2022-10-25 09:54:27 +0000
25-10-2022

Changeset: 6a266c4d Author: Chris Plummer <cjplummer@openjdk.org> Date: 2022-10-07 19:11:19 +0000 URL: https://git.openjdk.org/jdk/commit/6a266c4de15d04d50928b68f9b2dbb69911cff13
07-10-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10519 Date: 2022-10-01 01:12:52 +0000
01-10-2022

[~coleenp] Thanks for uncovering the Class reference. After putting some more thought into what the test was doing, it's started to makes sense why sometimes classes are not unloading. One detail Coleen left out above, but mentioned to me in private chat, is she found where the reference to the Class was being allocated. Here's the stack trace: V [libjvm.so+0xd5f1ad] JNIHandles::make_global(Handle, AllocFailStrategy::AllocFailEnum)+0x1f9 (jniHandles.cpp:98) V [libjvm.so+0xd1fc95] jni_NewGlobalRef+0xf5 (jni.cpp:681) C [libjdwp.so+0x39c71] saveGlobalRef+0x27e (util.c:65) C [libjdwp.so+0x26457] saveEventInfoRefs+0x115 (eventHelper.c:846) C [libjdwp.so+0x26d02] eventHelper_recordEvent+0x10f (eventHelper.c:1074) C [libjdwp.so+0x2dbe5] handleClassPrepare+0x84 (standardHandlers.c:75) C [libjdwp.so+0x1ebe5] filterAndHandleEvent+0x15d (eventHandler.c:566) C [libjdwp.so+0x1eee4] event_callback+0x261 (eventHandler.c:666) C [libjdwp.so+0x2097e] cbClassPrepare+0x142 (eventHandler.c:920) V [libjvm.so+0xebef9a] JvmtiExport::post_class_prepare(JavaThread*, Klass*)+0x304 (jvmtiExport.cpp:1396) V [libjvm.so+0xc28adf] InstanceKlass::link_class_impl(JavaThread*)+0x69d (instanceKlass.cpp:915) This shows that the reference is a jclass that is created for a ClassPrepareEvent, presumably that is being generated for the last class that the test loads. The debug agent sends the event to the debugger using a different thread than the main debuggee thread. However, it blocks the main debuggee thread until the event is sent. Once sent, it unblocks the debuggee thread and then proceeds to do some cleanup. Part of the cleanup is freeing up a jclass reference to the class by calling DeleteGlobalRef on it. However, before this is done the debuggee main thread resumes and starts a full GC. This causes the debug agent thread to block when it finally gets around to calling DeleteGlobalRef, since this cannot be done while doing a full GC. Coleen showed the thread blocking with the following stack trace: V [libjvm.so+0xd5f63a] JNIHandles::destroy_global(_jobject*)+0x124 (jniHandles.cpp:164) V [libjvm.so+0xd1fd90] jni_DeleteGlobalRef+0xba (jni.cpp:691) C [libjdwp.so+0x39f17] tossGlobalRef+0x206 (util.c:88) C [libjdwp.so+0x26779] tossEventInfoRefs+0x67 (eventHelper.c:906) C [libjdwp.so+0x2541e] handleEventCommandSingle+0x291 (eventHelper.c:460) C [libjdwp.so+0x2588c] handleReportEventCompositeCommand+0x203 (eventHelper.c:553) C [libjdwp.so+0x25b78] handleCommand+0x60 (eventHelper.c:624) C [libjdwp.so+0x25e83] commandLoop+0xe4 (eventHelper.c:723) V [libjvm.so+0xecd8b4] JvmtiAgentThread::call_start_function()+0x68 (jvmtiImpl.cpp:89) V [libjvm.so+0xecd849] JvmtiAgentThread::start_function_wrapper(JavaThread*, JavaThread*)+0x73 (jvmtiImpl.cpp:83) V [libjvm.so+0xc7cacf] JavaThread::thread_main_inner()+0x16f (javaThread.cpp:700) V [libjvm.so+0xc7c95c] JavaThread::run()+0x182 (javaThread.cpp:683) V [libjvm.so+0x134bd71] Thread::call_run()+0x195 (thread.cpp:224) V [libjvm.so+0x112d3cb] thread_native_entry(Thread*)+0x19b (os_linux.cpp:710) Once the full GC completes, the DeleteGlobalRef can complete, but by then it's too late. The test is done with the GC and expects all classes to be unloaded, but they are not because this one jclass kept them all alive. Previously I had tried doing two consecutive full GCs, and this did not help. Although technically the debug agent is unblocked between the two GC calls, and could have completed the DeleteGlobalRef, this requires it to get a time slice in the very small window after the end of the first GC and before the start of the second GC, and I guess that just wasn't happening. What does seem to work is a small sleep before the first (and only) GC. This gives the debug agent plenty of time to finish up with the last ClassPrepareEvent before the debuggee triggers the full GC.
30-09-2022

The debuggee creates a JNI Handle for the Class Objects for the classes that it's supposed to be unloading, but the thread that's supposed to destroy the JNI Handle might be starved out on this machine, so the classes are always strongly referenced, so never gets unloaded. This is a stack trace of the thread that's supposed to destroy the global jni handle, but this is not how it looks in in windbg. Stack: [0x00007f534df8d000,0x00007f534e08e000], sp=0x00007f534e08c9c0, free space=1022k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xd5f63a] JNIHandles::destroy_global(_jobject*)+0x124 (jniHandles.cpp:164) V [libjvm.so+0xd1fd90] jni_DeleteGlobalRef+0xba (jni.cpp:691) C [libjdwp.so+0x39f17] tossGlobalRef+0x206 (util.c:88) C [libjdwp.so+0x26779] tossEventInfoRefs+0x67 (eventHelper.c:906) C [libjdwp.so+0x2541e] handleEventCommandSingle+0x291 (eventHelper.c:460) C [libjdwp.so+0x2588c] handleReportEventCompositeCommand+0x203 (eventHelper.c:553) C [libjdwp.so+0x25b78] handleCommand+0x60 (eventHelper.c:624) C [libjdwp.so+0x25e83] commandLoop+0xe4 (eventHelper.c:723) V [libjvm.so+0xecd8b4] JvmtiAgentThread::call_start_function()+0x68 (jvmtiImpl.cpp:89) V [libjvm.so+0xecd849] JvmtiAgentThread::start_function_wrapper(JavaThread*, JavaThread*)+0x73 (jvmtiImpl.cpp:83) V [libjvm.so+0xc7cacf] JavaThread::thread_main_inner()+0x16f (javaThread.cpp:700) V [libjvm.so+0xc7c95c] JavaThread::run()+0x182 (javaThread.cpp:683) V [libjvm.so+0x134bd71] Thread::call_run()+0x195 (thread.cpp:224) V [libjvm.so+0x112d3cb] thread_native_entry(Thread*)+0x19b (os_linux.cpp:710)
29-09-2022

I put a ShouldNotReachHere() in SerialGC MarkSweep and got a similar crash (with more information in the windows debugger); V [jvm.dll+0x582ee5] report_should_not_reach_here+0x35 (debug.cpp:334) V [jvm.dll+0xb65b5a] MarkSweep::mark_and_push<enum narrowOop>+0x16a (markSweep.inline.hpp:81) V [jvm.dll+0xb67628] InstanceMirrorKlass::oop_oop_iterate<enum narrowOop,MarkAndPushClosure>+0x138 (instanceMirrorKlass.inline.hpp:51) V [jvm.dll+0xb66778] OopOopIterateDispatch<MarkAndPushClosure>::Table::oop_oop_iterate<InstanceMirrorKlass,enum narrowOop>+0x48 (iterator.inline.hpp:111) V [jvm.dll+0xb69166] MarkSweep::follow_object+0x186 (markSweep.cpp:98) V [jvm.dll+0xb64e82] MarkSweep::follow_root<oop>+0x1d2 (markSweep.cpp:145) V [jvm.dll+0x7327f3] GenCollectedHeap::process_roots+0x1d3 (genCollectedHeap.cpp:770) V [jvm.dll+0x7311fb] GenCollectedHeap::full_process_roots+0x6b (genCollectedHeap.cpp:803) Windbg shows inlined functions. The GC is processing OopStorage -> processing a Mirror for klass ClassUnloadEventTest$TestClassLoader which makes sense since this class loader klass is loaded by the AppClassLoader - the odd thing is that the mirror of the class also points to an instance of the class, which shouldn't exist and should cause the class to be unloaded.
28-09-2022

What type of failure is SA giving you? I'm guessing it's having trouble finding referenced libs. I know on linux and macOS we use SA_ALTROOT to fix that, but that's not the case on Windows. Try enabling LIBSAPROC_DEBUG and see what output is produced.
22-09-2022

Thanks Chris - I had --exe also, and I added HeapDumper.dump() and a pass to look at thread metadata to see if the class is in a CompileTask or somewhere. From debugging the mdmp file in windbg, the class that should be unloaded isn't referenced from a compile task, which is unfortunate because that was a good theory.
22-09-2022

Another option is to use HeapDumper.dump() just before the ShouldNotReachHere(), and then use a 3rd party tool to look at the hprof file.
22-09-2022

You also need to specify --exe.
22-09-2022

It's on Windows. I thought I could do jhsdb clhsdb --core hs*mdmp but it doesn't like that.
22-09-2022

I'm not sure of your point about SA. Are you saying you got a core dump to use with SA, but SA heap searching didn't turn up any references?
22-09-2022

I put a ShouldNotReachHere and logging if this class loader is referenced during GC (it shouldn't be alive when GC is called). I don't think SA search raw heap works on windows. Stack: [0x000000934a000000,0x000000934a100000] Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0xc3ad31] os::win32::platform_print_native_stack+0xf1 (os_windows_x86.cpp:236) V [jvm.dll+0xe8db87] VMError::report+0x10e7 (vmError.cpp:838) V [jvm.dll+0xe8f71e] VMError::report_and_die+0x65e (vmError.cpp:1686) V [jvm.dll+0xe8fe94] VMError::report_and_die+0x64 (vmError.cpp:1467) V [jvm.dll+0x583027] report_vm_error+0xb7 (debug.cpp:284) V [jvm.dll+0x582ee5] report_should_not_reach_here+0x35 (debug.cpp:334) V [jvm.dll+0x6b50c6] G1FullGCMarker::mark_and_push<enum narrowOop>+0x106 (g1FullGCMarker.inline.hpp:90) V [jvm.dll+0x6b81cc] InstanceMirrorKlass::oop_oop_iterate<enum narrowOop,G1MarkAndPushClosure>+0x14c (instanceMirrorKlass.inline.hpp:51) V [jvm.dll+0x6b60b8] OopOopIterateDispatch<G1MarkAndPushClosure>::Table::oop_oop_iterate<InstanceMirrorKlass,enum narrowOop>+0x48 (iterator.inline.hpp:111) V [jvm.dll+0x6c83c6] G1FullGCMarker::follow_object+0x216 (g1FullGCMarker.inline.hpp:152) V [jvm.dll+0x6c8d60] G1FullGCMarker::publish_and_drain_oop_tasks+0x450 (g1FullGCMarker.inline.hpp:178) V [jvm.dll+0x6c7c69] G1FullGCMarker::follow_marking_stacks+0x39 (g1FullGCMarker.inline.hpp:198) V [jvm.dll+0x6c77ad] G1FullGCMarker::complete_marking+0x3d (g1FullGCMarker.cpp:59) V [jvm.dll+0x6c6949] G1FullGCMarkTask::work+0xc9 (g1FullGCMarkTask.cpp:62) V [jvm.dll+0xed0ac7] WorkerThread::run+0x97 (workerThread.cpp:164) V [jvm.dll+0xe079a7] Thread::call_run+0x257 (thread.cpp:229) V [jvm.dll+0xc39678] thread_native_entry+0xb8 (os_windows.cpp:547) C [ucrtbase.dll+0x1fb80] C [KERNEL32.DLL+0x84d4] C [ntdll.dll+0x51791]
22-09-2022

This was reported before JDK-8290025 was integrated, so it's not JDK-8290025. Testing with this change now. I can reproduce this with post sweeper removal code with mach5 options above.
07-09-2022

Correct. When it passes it looks like this: stdout: [0.038s][info][gc] Using G1 stdout: Running debuggee stdout: [0.505s][info][class,unload] unloading class SampleClass__9 0x0000000801003c30 stdout: [0.505s][info][class,unload] unloading class SampleClass__8 0x0000000801003a28 stdout: [0.505s][info][class,unload] unloading class SampleClass__7 0x0000000801003820 stdout: [0.505s][info][class,unload] unloading class SampleClass__6 0x0000000801003618 stdout: [0.505s][info][class,unload] unloading class SampleClass__5 0x0000000801003410 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__4 0x0000000801003208 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__3 0x0000000801003000 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__2 0x0000000801002c10 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__1 0x0000000801002a08 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__0 0x0000000801002800 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001c00 0x0000000801001c00 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001800 0x0000000801001800 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001000 0x0000000801001000 stdout: [0.721s][info][gc ] GC(0) Pause Full (WhiteBox Initiated Full GC) 25M->16M(160M) 234.640ms stdout: Exiting debuggee
24-08-2022

It's not unloading the Samples classes though. Something's keeping them alive.
24-08-2022

Here's the debuggee log from a failed run. Looks like the GC is happening. Running debugger EventSet: event set, policy:2, count:1 = {VMStartEvent in thread main} EventSet: event set, policy:0, count:1 = {VMDeathEvent} stdout: [0.021s][info][gc] Using G1 stdout: Running debuggee stdout: [2.435s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001c00 0x0000000801001c00 stdout: [2.435s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001800 0x0000000801001800 stdout: [2.435s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001000 0x0000000801001000 stdout: [2.620s][info][gc ] GC(0) Pause Full (WhiteBox Initiated Full GC) 3M->0M(16M) 193.745ms stdout: Exiting debuggee
24-08-2022

Ah, I was looking at JDK19 source. JDK20 calls wb.fullGC(). It also no longer forces the youngGen collection. public static void triggerUnloading() { WhiteBox wb = WhiteBox.getWhiteBox(); wb.fullGC(); // will do class unloading }
24-08-2022

I changed the code for triggerUnloading to do WhiteBox.fullGC. Is this G1? Ok, yes, you're calling the WhiteBox initiated full GC.
24-08-2022

Clarifying my above comment. The output is actually the following. I'm not even sure what triggers the WhiteBox GC logging at the end (I'll look into it), but there is an earlier GC log also that happens before the debuggee even enters its main method. stdout: [0.038s][info][gc] Using G1 stdout: Running debuggee stdout: [0.505s][info][class,unload] unloading class SampleClass__9 0x0000000801003c30 stdout: [0.505s][info][class,unload] unloading class SampleClass__8 0x0000000801003a28 stdout: [0.505s][info][class,unload] unloading class SampleClass__7 0x0000000801003820 stdout: [0.505s][info][class,unload] unloading class SampleClass__6 0x0000000801003618 stdout: [0.505s][info][class,unload] unloading class SampleClass__5 0x0000000801003410 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__4 0x0000000801003208 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__3 0x0000000801003000 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__2 0x0000000801002c10 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__1 0x0000000801002a08 stdout: [0.505s][info][class,unload] unloading class SampleClass__Alt__0 0x0000000801002800 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001c00 0x0000000801001c00 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001800 0x0000000801001800 stdout: [0.505s][info][class,unload] unloading class java.lang.invoke.LambdaForm$MH/0x0000000801001000 0x0000000801001000 stdout: [0.721s][info][gc ] GC(0) Pause Full (WhiteBox Initiated Full GC) 25M->16M(160M) 234.640ms stdout: Exiting debuggee
24-08-2022

Ok. For a passing run it does: stdout: [0.721s][info][gc ] GC(0) Pause Full (WhiteBox Initiated Full GC) 25M->16M(160M) 234.640ms I'll try to reproduce the failure with this logging included. BTW, ClassUnloadCommon.triggerUnloading() does call System.gc(): private static void allocateMemory(int kilobytes) { ArrayList<byte[]> l = new ArrayList<>(); dummy = l; for (int i = kilobytes; i > 0; i -= 1) { l.add(new byte[1024]); } l = null; dummy = null; } public static void triggerUnloading() { allocateMemory(16 * 1024); // force young collection System.gc(); }
24-08-2022

Can you add -Xlog:gc and see if GC even runs in this test?
24-08-2022

I've added a subtask enable -Xlog:class+unload=info for the debuggee and have the debugger print the debuggee output. I've done this work in my local repo and will get a PR out for it soon.
24-08-2022