JDK-8257521 : runtime/logging/MonitorInflationTest.java crashed in MonitorList::unlink_deflated
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 16
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • CPU: x86_64
  • Submitted: 2020-12-01
  • Updated: 2021-01-20
  • Resolved: 2020-12-23
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 16 JDK 17
16 b30Fixed 17Fixed
Related Reports
Relates :  
Description
The following test failed in the JDK16 CI:

runtime/logging/MonitorInflationTest.java

Here's a snippet from the log file:

----------System.err:(55/3449)*----------
 stdout: [[0.003s][warning][arguments] -XX:+TraceMonitorInflation is deprecated. Will use -Xlog:monitorinflation=trace instead.
[0.333s][trace  ][monitorinflation] inflate(has_locker): object=0x000000047b8a0ea8, mark=0x0000023cff7e3532, type='MonitorInflationTest$Waiter'
I've been waiting.
[0.448s][trace  ][monitorinflation] Checking in_use_list:
[0.448s][trace  ][monitorinflation] count=1, max=1
[0.448s][trace  ][monitorinflation] in_use_count=1 equals ck_in_use_count=1
[0.448s][trace  ][monitorinflation] in_use_max=1 equals ck_in_use_max=1
[0.448s][trace  ][monitorinflation] No errors found in in_use_list checks.
[0.448s][trace  ][monitorinflation] In-use monitor info:
[0.448s][trace  ][monitorinflation] (B -> is_busy, H -> has hash code, L -> lock status)
[0.448s][trace  ][monitorinflation]            monitor  BHL              object         object type
[0.448s][trace  ][monitorinflation] ==================  ===  ==================  ==================
[0.448s][trace  ][monitorinflation] 0x0000023cff7e3530  000  0x000000047b8a0ea8  MonitorInflationTest$Waiter
[0.484s][debug  ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=1, max=1
[0.484s][trace  ][monitorinflation] deflate_monitor: object=0x000000047b8a0ea8, mark=0x0000023cff7e3532, type='MonitorInflationTest$Waiter'
[0.484s][debug  ][monitorinflation] deflated 1 monitors in 0.0000488 secs
[0.484s][debug  ][monitorinflation] end deflating: in_use_list stats: ceiling=11264, count=0, max=1
[0.484s][debug  ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=0, max=1
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffa7d72eb60, pid=259220, tid=366152
#
# JRE version: Java(TM) SE Runtime Environment (16.0+27) (fastdebug build 16-ea+27-1844)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-ea+27-1844, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# V  [jvm.dll+0xc4eb60]  MonitorList::unlink_deflated+0x50
#
# Core dump will be written. Default location: T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_runtime\\scratch\\3\\hs_err_pid259220.mdmp
#
# An error report file with more information is saved as:
# T:\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_runtime\\scratch\\3\\hs_err_pid259220.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
];

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x0000023cff7d5170):  VMThread "VM Thread" [stack: 0x000000a35c400000,0x000000a35c500000] [id=366152]

Stack: [0x000000a35c400000,0x000000a35c500000],  sp=0x000000a35c4ff330,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xc4eb60]  MonitorList::unlink_deflated+0x50  (synchronizer.cpp:116)
V  [jvm.dll+0xc4be81]  ObjectSynchronizer::deflate_idle_monitors+0x301  (synchronizer.cpp:1528)
V  [jvm.dll+0xc4c2ed]  ObjectSynchronizer::do_final_audit_and_print_stats+0xad  (synchronizer.cpp:1673)
V  [jvm.dll+0xd2256f]  VMThread::run+0x26f  (vmThread.cpp:204)
V  [jvm.dll+0xc99824]  Thread::call_run+0x1b4  (thread.cpp:398)
V  [jvm.dll+0xaeb76e]  thread_native_entry+0x10e  (os_windows.cpp:561)
C  [ucrtbase.dll+0x1fb80]
C  [KERNEL32.DLL+0x84d4]
C  [ntdll.dll+0x6e871]


siginfo: EXCEPTION_ACCESS_VIOLATION (0xc0000005), reading address 0x00000000000000c0

Comments
Changeset: 9cd8e38a Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2020-12-23 14:36:47 +0000 URL: https://git.openjdk.java.net/jdk16/commit/9cd8e38a
23-12-2020

Here's the context diff for the likely fix: $ git diff diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp index 1fd6930d96c..98afe838c49 100644 --- a/src/hotspot/share/runtime/synchronizer.cpp +++ b/src/hotspot/share/runtime/synchronizer.cpp @@ -112,7 +112,8 @@ size_t MonitorList::unlink_deflated(Thread* self, LogStream* ls, ObjectMonitor* prev = NULL; ObjectMonitor* head = Atomic::load_acquire(&_head); ObjectMonitor* m = head; - do { + // The in-use list head can be NULL during the final audit. + while (m != NULL) { if (m->is_being_async_deflated()) { // Find next live ObjectMonitor. ObjectMonitor* next = m; @@ -154,7 +155,7 @@ size_t MonitorList::unlink_deflated(Thread* self, LogStream* ls, "unlinked_count", unlinked_count, ls, timer_p); } - } while (m != NULL); + } Atomic::sub(&_count, unlinked_count); return unlinked_count; } And with that code in place: $ java -Xlog:monitorinflation=info Hello 2>&1 | tee Hello.log Hello world! [0.040s][info][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=0, max=0 [0.040s][info][monitorinflation] end deflating: in_use_list stats: ceiling=11264, count=0, max=0 [0.040s][info][monitorinflation] Checking in_use_list: [0.040s][info][monitorinflation] count=0, max=0 [0.040s][info][monitorinflation] in_use_count=0 equals ck_in_use_count=0 [0.040s][info][monitorinflation] in_use_max=0 equals ck_in_use_max=0 [0.040s][info][monitorinflation] No errors found in in_use_list checks.
22-12-2020

Added the following debug code to delay the start of worker threads that call Object.wait(): $ git diff diff --git a/src/java.base/share/classes/java/lang/ref/Finalizer.java b/src/java.base/share/classes/java/lang/ref/Finalizer.java index f1405990461..5c915401a4f 100644 --- a/src/java.base/share/classes/java/lang/ref/Finalizer.java +++ b/src/java.base/share/classes/java/lang/ref/Finalizer.java @@ -153,6 +153,10 @@ final class Finalizer extends FinalReference<Object> { /* Package-private; must // in case of recursive call to run() if (running) return; +try { +Thread.sleep(1000); +} catch (InterruptedException ie) { +} // Finalizer thread starts before System.initializeSystemClass // is called. Wait until JavaLangAccess is available diff --git a/src/java.base/share/classes/java/lang/ref/Reference.java b/src/java.base/share/classes/java/lang/ref/Reference.java index cc889fd110b..180823c4bf1 100644 --- a/src/java.base/share/classes/java/lang/ref/Reference.java +++ b/src/java.base/share/classes/java/lang/ref/Reference.java @@ -236,6 +236,10 @@ public abstract class Reference<T> { private static boolean processPendingActive = false; private static void processPendingReferences() { +try { +Thread.sleep(1000); +} catch (InterruptedException ie) { +} // Only the singleton reference processing thread calls // waitForReferencePendingList() and getAndClearReferencePendingList(). // These are separate operations to avoid a race with other threads diff --git a/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java b/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java index 60924e19774..1ccfe996017 100644 --- a/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java +++ b/src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java @@ -125,6 +125,10 @@ public final class CleanerImpl implements Runnable { */ @Override public void run() { +try { +Thread.sleep(1000); +} catch (InterruptedException ie) { +} Thread t = Thread.currentThread(); InnocuousThread mlThread = (t instanceof InnocuousThread) ? (InnocuousThread) t With this simple test program: $ cat Hello.java public class Hello { public static void main(String args[]) { System.out.println("Hello world!"); } } and running with monitorinflation logging: $ java -Xlog:monitorinflation=info Hello 2>&1 | tee Hello.log Hello world! [0.080s][info][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=0, max=0 # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00000001051afce0, pid=92136, tid=15619 # # JRE version: Java(TM) SE Runtime Environment (17.0) (build 17-internal+0-LTS-2020-12-22-2228488.dcubed...) # Java VM: Java HotSpot(TM) 64-Bit Server VM (17-internal+0-LTS-2020-12-22-2228488.dcubed..., mixed mode, sharing, tiered, compressed oops, g1 gc, bsd-amd64) # Problematic frame: # V [libjvm.dylib+0x9afce0] _ZN11MonitorList15unlink_deflatedEP6ThreadP9LogStreamP12elapsedTimerP13GrowableArrayIP13ObjectMonitorE+0x90 # # Core dump will be written. Default location: /cores/core.92136 # # An error report file with more information is saved as: # /work/shared/bug_hunt/8257521_for_jdk17.git/hs_err_pid92136.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # We crash in the expected spot: --------------- T H R E A D --------------- Current thread (0x00007faa05c473e0): VMThread "VM Thread" [stack: 0x000070001010b000,0x000070001020b000] [id=15619] Stack: [0x000070001010b000,0x000070001020b000], sp=0x000070001020abe0, free space=1022k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x9afce0] _ZN11MonitorList15unlink_deflatedEP6ThreadP9LogStreamP12elapsedTimerP13GrowableArrayIP13ObjectMonitorE+0x90 V [libjvm.dylib+0x9b2c2b] _ZN18ObjectSynchronizer21deflate_idle_monitorsEv+0x40b V [libjvm.dylib+0x9b3055] _ZN18ObjectSynchronizer30do_final_audit_and_print_statsEv+0x25 V [libjvm.dylib+0xa7cffb] _ZN8VMThread3runEv+0x17b V [libjvm.dylib+0x9f98e1] _ZN6Thread8call_runEv+0x71 V [libjvm.dylib+0x889db7] _ZL19thread_native_entryP6Thread+0x157 C [libsystem_pthread.dylib+0x32eb] _pthread_body+0x7e C [libsystem_pthread.dylib+0x6249] _pthread_start+0x42 C [libsystem_pthread.dylib+0x240d] thread_start+0xd
22-12-2020

This bug was introduced by: JDK-8253064 monitor list simplifications and getting rid of TSM which was integrated in jdk-16-b24.
22-12-2020

The logging tells the story: [0.484s][debug ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=1, max=1 [0.484s][trace ][monitorinflation] deflate_monitor: object=0x000000047b8a0ea8, mark=0x0000023cff7e3532, type='MonitorInflationTest$Waiter' [0.484s][debug ][monitorinflation] deflated 1 monitors in 0.0000488 secs [0.484s][debug ][monitorinflation] end deflating: in_use_list stats: ceiling=11264, count=0, max=1 [0.484s][debug ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=0, max=1 The first four logging lines show that the MonitorDeflationThread deflated the last ObjectMonitor on the in-use list. The final logging line shows that the final audit kicks in, but the in-use list has count=0 elements. In the normal case that's not a problem in ObjectSynchronizer::deflate_idle_monitors(): if (deflated_count > 0 || is_final_audit()) { but that is_final_audit() check allows us to enter the block with an empty in-use list and when we call MonitorList::unlink_deflated() we run into: ObjectMonitor* head = Atomic::load_acquire(&_head); ObjectMonitor* m = head; do { if (m->is_being_async_deflated()) { only the in-use list is empty so _head is NULL which means that 'm' is NULL and m->is_being_async_deflated() dereferences a NULL ptr and boom... For this failure to happen, the test has to deflate 'MonitorInflationTest$Waiter' and start the VM exit/termination before any of the VMs background worker threads have reached their own Object.wait() calls. That's a very rare situation indeed.
22-12-2020

> Shouldn't be possible given the VMThread has already initiated the termination safepoint, > unless the MonitorDeflationThread can hit a safepoint check somewhere inside > deflate_idle_monitors that leaves the deflation state "corrupt" ? We most definitely have safepoint/handshake checks in the deflate_idle_monitors() code path. They usually look like this: if (self->is_Java_thread()) { // A JavaThread must check for a safepoint/handshake and honor it. chk_for_block_req(self->as_Java_thread(), "deflation", "deflated_count", deflated_count, ls, timer_p); } I'll start looking to see if one of these points is in a bad place.
22-12-2020

> I'm wondering if the VMThread and MonitorDeflationThread are racing here? Shouldn't be possible given the VMThread has already initiated the termination safepoint, unless the MonitorDeflationThread can hit a safepoint check somewhere inside deflate_idle_monitors that leaves the deflation state "corrupt" ?
02-12-2020

ILW = HLM = P3
01-12-2020

The VMThread is crashing during the final audit: V [jvm.dll+0xc4eb60] MonitorList::unlink_deflated+0x50 (synchronizer.cpp:116) V [jvm.dll+0xc4be81] ObjectSynchronizer::deflate_idle_monitors+0x301 (synchronizer.cpp:1528) V [jvm.dll+0xc4c2ed] ObjectSynchronizer::do_final_audit_and_print_stats+0xad (synchronizer.cpp:1673) V [jvm.dll+0xd2256f] VMThread::run+0x26f (vmThread.cpp:204) The logging from System.err shows: [0.484s][debug ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=1, max=1 [0.484s][trace ][monitorinflation] deflate_monitor: object=0x000000047b8a0ea8, mark=0x0000023cff7e3532, type='MonitorInflationTest$Waiter' [0.484s][debug ][monitorinflation] deflated 1 monitors in 0.0000488 secs [0.484s][debug ][monitorinflation] end deflating: in_use_list stats: ceiling=11264, count=0, max=1 [0.484s][debug ][monitorinflation] begin deflating: in_use_list stats: ceiling=11264, count=0, max=1 I'm wondering if the VMThread and MonitorDeflationThread are racing here?
01-12-2020