JDK-8273107 : RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 18
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-08-30
  • Updated: 2024-05-28
  • Resolved: 2021-12-17
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 18 JDK 19
18 b29Fixed 19Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
These errors have been seen in many RunThese24H runs, but in the past we then saw the access violation in get_thread_status() that was fixed by JDK-8269934, and the NPE was ignored:

[stress.process.err] java.lang.NullPointerException: Cannot invoke "String.startsWith(String)" because the return value of "java.lang.management.ThreadInfo.getLockName()" is null
[stress.process.err] 	at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.dumpTestThreads(JSR166TestCase.java:659)
[stress.process.err] 	at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.tearDownFail(JSR166TestCase.java:331)
[stress.process.err] 	at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.checkForkJoinPoolThreadLeaks(JSR166TestCase.java:387)
[stress.process.err] 	at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.tearDown(JSR166TestCase.java:364)
[stress.process.err] 	at javasoft.sqe.tests.api.junit.TestCase.invokeTestCase(TestCase.java:53)
[stress.process.err] 	at javasoft.sqe.javatest.lib.MultiTest.run(MultiTest.java:193)
[stress.process.err] 	at javasoft.sqe.javatest.lib.MultiTest.run(MultiTest.java:125)
[stress.process.err] 	at javasoft.sqe.tests.api.java.util.concurrent.LinkedBlockingDequeTest.main(LinkedBlockingDequeTest.java:51)
[stress.process.err] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[stress.process.err] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[stress.process.err] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[stress.process.err] 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[stress.process.err] 	at applications.kitchensink.process.stress.modules.JckStressModule$TestRunner$1.run(JckStressModule.java:275)

There are many such NPE's seen in the overall test times out. We also see many:

[stress.process.err] javasoft.sqe.tests.api.java.util.concurrent.CompletableFutureTest@410cadb8: ForkJoin common pool thread stuck
Comments
Note for posterity: This introduces monitor deflation in-safepoint, which can be problematic if it interweaves with concurrent deflation in the wrong way. JDK-8318757 removed this possibility later.
14-03-2024

This fix is integrated in jdk-18+29-1991.
18-12-2021

Changeset: a5f5d60f Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2021-12-17 21:24:51 +0000 URL: https://git.openjdk.java.net/jdk18/commit/a5f5d60f2529bb28eee6d309bfbde904e188e540
17-12-2021

Tested the new patch for memory leaks in the new VM_ThreadDump::doit() logic by hacking one of my existing stress tests: $ git diff test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java diff --git a/test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java b/test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java index 53ac3819d70..1e5fd4f3d4f 100644 --- a/test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java +++ b/test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java @@ -29,7 +29,8 @@ * to getThreadInfo() and ThreadInfo.getLockOwnerName() until the thread * has exited. * @requires vm.jvmti - * @run main/othervm/native -agentlib:GetLockOwnerName GetLockOwnerName + * @run main/othervm/native/timeout=900 -agentlib:GetLockOwnerName + * GetLockOwnerName 600 */ import java.io.PrintStream; @@ -247,6 +248,7 @@ public class GetLockOwnerName { id = contender.getId(); info = mbean.getThreadInfo(id, 0); String name = info.getLockOwnerName(); +ThreadInfo junk[] = mbean.dumpAllThreads(/* lockedMonitors */ true, /* lockedSynchronizers */false); if (name == null) { out.println("Failure at " + count + " loops."); Modified the test to: - run for 10 minutes - call ThreadMXBean.dumpAllThreads() to force execution of VM_ThreadDump::doit() Ran the hacked test against the baseline and then ran the test against the new patch. The memory usage profile for both test runs was the same. I was originally planning to stress test memory usage by using: test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/GetLockOwnerName/GetLockOwnerName.java with a modified "-iterations=50000" parameter, but that test leaks memory without any of my changes. If I remember correctly it leaked about 1.3GB in about 9 minutes of runtime...
14-12-2021

[~coleenp] - That's where I'm headed if the stats that I gather indicate that's where the problem lies... I'm also going to look at adding a hash table since we have to search the list of in-use monitors for each requested thread... I'll likely be asking you questions about the latest hash table thoughts...
04-12-2021

diff --git a/src/hotspot/share/runtime/vmOperations.cpp b/src/hotspot/share/runtime/vmOperations.cpp index a6d42120582..8335918ea7e 100644 --- a/src/hotspot/share/runtime/vmOperations.cpp +++ b/src/hotspot/share/runtime/vmOperations.cpp @@ -279,6 +279,10 @@ void VM_ThreadDump::doit() { concurrent_locks.dump_at_safepoint(); } + if (_with_locked_monitors) { + (void)ObjectSynchronizer::deflate_idle_monitors(); + } + if (_num_threads == 0) { // Snapshot all live threads I wonder if deflating idle monitors in the heap dump safepoint might help.
04-12-2021

Starting with adding some additional tracking of stats for in-use monitors.
03-12-2021

ILW = MMM = P3
23-11-2021

That would explain a lot [~stefank]. I looked at the code and the line numbers didn't match but the code did check for null and not throw NPE.
01-11-2021

The core files that I've looked at have this fix. I think the basic problem is that there are lot of threads, and for each thread, the code is looking through the monitor _in_use_list looking for monitor->owner == thread.
23-09-2021

[~coleenp] - Please check in with [~lmesnik] about his recent work on: JDK-8265489 Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation It could be that while his fix has made things better, there is still an issue where we're spending way too much time in ObjectSynchronizer::monitors_iterate().
22-09-2021

This jdk-18+16-854-tier8 CI job failure is also stuck in monitors_iterate. Thread 1368 (Thread 0x7f9494fff700 (LWP 11865)): #0 0x00007f94ecc9ecff in ObjectSynchronizer::monitors_iterate(MonitorClosure*, JavaThread*) () from jdk-18/lib/server/libjvm.so #1 0x00007f94eccfdd2c in ThreadStackTrace::dump_stack_at_safepoint(int) () from jdk-18/lib/server/libjvm.so #2 0x00007f94ecd6fc25 in VM_ThreadDump::doit() () from jdk-18/lib/server/libjvm.so #3 0x00007f94ecd6fe7a in VM_Operation::evaluate() () from jdk-18/lib/server/libjvm.so #4 0x00007f94ecd71818 in VMThread::evaluate_operation(VM_Operation*) () from jdk-18/lib/server/libjvm.so
22-09-2021

Every other one of these core files won't load in gdb.
22-09-2021

I am still trying to figure out why getLockName() returns NULL in the test, even though the test has cases where it checks for NULL and doesn't call startsWith() on the name. The last instance of this crash is stopped in ObjectSynchronizer::monitors_iterate() There are ~1300 threads and for each thread, it's going through (gdb) print _in_use_list $4 = {_head = 0x7fdea0ac89d0, _count = 853608, _max = 853608} 977 while (iter.has_next()) { 978 ObjectMonitor* mid = iter.next(); 979 if (mid->owner() != thread) { 980 continue; 981 } All the other threads are stopped at a safepoint (during allocation) and there are 6 threads waiting telling the VMThread to do #18 0x00007fdf6ad04db5 in jmm_DumpThreads (env=<optimized out>, thread_ids=<optimized out>, locked_monitors=1 '\001', locked_synchronizers=1 '\001', maxDepth=<optimized out>) at open/src/hotspot/share/services/management.cpp:1273 Not sure why the monitor list is so long. (?) (gdb) print ObjectSynchronizer::_is_async_deflation_requested $9 = false (gdb) print _no_progress_cnt $10 = 0 (gdb) print AsyncDeflationInterval $11 = 250 (gdb) print MonitorUsedDeflationThreshold $12 = 90 (gdb) print _in_use_list_ceiling $13 = 1352704 (gdb)
17-09-2021

It's weird how the line numbers don't line up in the exception message: [stress.process.err] Exception in thread "pool-4019-thread-1" java.lang.NullPointerException: Cannot invoke "String.startsWith(String)" because the return value of "java.lang.management.ThreadInfo.getLockName()" is null [stress.process.err] at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.dumpTestThreads(JSR166TestCase.java:659) [stress.process.err] at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.threadRecordFailure(JSR166TestCase.java:320) [stress.process.err] at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase.threadUnexpectedException(JSR166TestCase.java:510) [stress.process.err] at javasoft.sqe.tests.api.java.util.concurrent.JSR166TestCase$CheckedRunnable.run(JSR166TestCase.java:1068) [stress.process.err] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [stress.process.err] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [stress.process.err] at java.base/java.lang.Thread.run(Thread.java:833) getLockName is called in dumpTestThreads() but has null checks around it: ... if ((lockName = info.getLockName()) != null && lockName.startsWith("java.lang.ref")) return false; } if ((name.equals("Finalizer") || name.equals("Common-Cleaner")) && (lockName = info.getLockName()) != null && lockName.startsWith("java.lang.ref")) return false; if (name.startsWith("ForkJoinPool.commonPool-worker") && (lockName = info.getLockName()) != null && lockName.startsWith("java.util.concurrent.ForkJoinPool")) return false; lockName can be null if the thread that we're getting ThreadInfo isn't waiting on a lock (blocker_object) when the JVM constructs it. So I can't explain this NPE. I don't think it's why the test is hanging though.
17-09-2021

This last instance with ZGC looks a lot like the string dedup table bug that I looked at this morning JDK-8273695 I didn't see getLockName() in the log output.
15-09-2021