JDK-8214315 : G1: fatal error: acquiring lock SATB_Q_FL_lock/1 out of order with lock tty_lock/0

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 12

Priority: P2
Status: Closed
Resolution: Fixed

Submitted: 2018-11-26
Updated: 2021-08-14
Resolved: 2018-12-08

Versions (Unresolved/Resolved/Fixed)

JDK 11: 11.0.6-oracle (Fixed)
JDK 12: 12 b24 (Fixed)

Related Reports

Blocks :	JDK-8224958 - add os::dll_load calls to event log
Relates :	JDK-8272480 - Remove Mutex::access rank
Relates :	JDK-8230943 - False deadlock detection with -XX:+CIPrintCompileQueue after JDK-8163511
Relates :	JDK-8265298 - Hard VM crash when deadlock between "access" and higher ranked lock is detected
Relates :	JDK-8214997 - Crash holding 'access' lock can deadlock in JVMCI compiler thread

Description

I got this while running last night's debug build on my workstation. I built it myself; it is not a promo build.

# Run progress: 0.00% complete, ETA 00:03:40
# Fork: 1 of 1
# Preparing profilers: LinuxPerfAsmProfiler 
# Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console
# Warmup Iteration   1: <forked VM failed with exit code 134>
<stdout last='20 lines'>
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/mutex.cpp:1316
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/eric/tmp/jenkins/workspace/Build-jdk-hs-DEBUG-nightly-with-AOT-Dell/jdk/open/src/hotspot/share/runtime/mutex.cpp:1316), pid=17767, tid=17870
#  fatal error: acquiring lock SATB_Q_FL_lock/1 out of order with lock tty_lock/0 -- possible deadlock
#
# JRE version: Java(TM) SE Runtime Environment (12.0) (slowdebug build 12-internal+0-2018-11-26-0139016.eric...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 12-internal+0-2018-11-26-0139016.eric..., mixed mode, aot, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /ssd/home/eric/views/renaissance-suite/core.17767)
#
# An error report file with more information is saved as:
# /ssd/home/eric/views/renaissance-suite/hs_err_pid17767.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 17870
Dumping core ...
</stdout>
<stderr last='20 lines'>

Comments

An additional review for the jdk11 downport has been done by mdoerr on hotspot-dev.
29-08-2019
Fix Request: I would like to have the patch in jdk11 as well, because it turned out to be a prerequisite of my change JDK-8224958. The patch applies with minor adjustments (I will post it for review on the mailing list).
28-08-2019
URL: http://hg.openjdk.java.net/jdk/jdk/rev/bf2f2560dd53 User: kbarrett Date: 2018-12-08 23:54:49 +0000
08-12-2018
This particular failure appears to involve -XX:+LogCompilation and -XX:+UseJVMCICompiler. In nmethod::log_new_method() we have

    if (LogCompilation && xtty != NULL) {
      ttyLocker ttyl;
      HandleMark hm;
      xtty->begin_elem("nmethod");
      log_identity(xtty);
      ...
    }
  }

In nmethod::log_identity(), in the INCLUDE_JVMCI block, there is a call to jvmci_installed_code_name, which calls JNIHandles::resolve on the nmethod::_jvmci_installed_code jweak, triggering the read barrier while we're inside log_new_method's ttyLocker. From there we get various cascading internal errors. It looks like there are other variations on this theme, such as nmethod::log_state_change.
07-12-2018
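The call chain described in the 07-12-2018 comment can be modelled in a few lines. This is only a sketch under stated assumptions, not HotSpot code: ModelThread, ScopedRankedLock, resolve_weak and log_method are hypothetical stand-ins for the thread state, ttyLocker, JNIHandles::resolve and nmethod::log_new_method, the ranks 0 and 1 are taken from the error message (tty_lock/0, SATB_Q_FL_lock/1), and the rule "a new lock must have a strictly lower rank than every held lock" is a simplification of the real check in mutex.cpp.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical model of one thread's state; none of these names are HotSpot APIs.
struct ModelThread {
    std::vector<int> held_ranks;   // ranks of the locks this thread currently holds
    bool has_satb_buffer = false;  // whether a SATB buffer is already installed
};

// RAII guard in the spirit of ttyLocker: checks the rank on entry, releases on exit.
// Assumed rule (simplified): the new rank must be strictly lower than every held rank.
class ScopedRankedLock {
public:
    ScopedRankedLock(ModelThread& t, int rank) : t_(t) {
        for (int held : t_.held_ranks) {
            if (rank >= held) {
                throw std::logic_error("lock acquired out of order");
            }
        }
        t_.held_ranks.push_back(rank);
    }
    ~ScopedRankedLock() { t_.held_ranks.pop_back(); }

private:
    ModelThread& t_;
};

const int kTtyRank = 0;           // tty_lock/0 in the error message
const int kSatbFreeListRank = 1;  // SATB_Q_FL_lock/1 in the error message

// Stand-in for resolving a jweak: the barrier needs a SATB buffer, and
// allocating one takes the free-list lock.
void resolve_weak(ModelThread& t) {
    if (!t.has_satb_buffer) {
        ScopedRankedLock free_list_lock(t, kSatbFreeListRank);
        t.has_satb_buffer = true;
    }
}

// Stand-in for the logging path: hold the tty lock, then touch the weak handle.
// Returns false where a debug VM would report the fatal lock-order error.
bool log_method(ModelThread& t) {
    ScopedRankedLock ttyl(t, kTtyRank);
    try {
        resolve_weak(t);
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

In this model, log_method fails only for a thread that has no SATB buffer yet. A thread that already has one never takes the free-list lock, which is consistent with the failure showing up only occasionally.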
A different, and probably more robust and general solution, is to move the tty_lock rank up to above 'access' but below 'special'. However, that isn't a complete solution either. An assert while holding an 'access' lock may (with JVMCI) end up in the error handler calling JNIHandles::resolve as described above, which could end up with a recursive lock attempt. Also, there might be places in the error handler that try to lock the tty_lock, which would be a different lock rank inversion.
05-12-2018
Maybe making jvmci_installed_code_name use only AS_NO_KEEPALIVE accesses would be sufficient. That might require several new APIs though; neither JNIHandles nor InstalledCode currently provide that capability.
05-12-2018
It looks like we're touching an oop (specifically, resolving a jweak) while holding the tty_lock. The failure occurred because that resolve found the current thread to be without a SATB buffer and so tried to allocate one. There are quite a few places where the tty_lock is held while doing some computation involving oops. That seems like something that can't work currently. Perhaps the "access" lock ranks need to be still lower, below the rank of the tty_lock (which has rank "event"). Or perhaps move tty_lock to a new rank between access and special, in order to leave other "event" ranked locks alone.
26-11-2018
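The two rank options weighed in the 26-11-2018 comment can be compared with a small rank checker. This is a minimal sketch and not the HotSpot implementation: RankedLock, LockOrderChecker and satb_alloc_allowed_under_tty are hypothetical names, the rank values are illustrative apart from the 0 and 1 shown in the error message, and the ordering rule (strictly decreasing rank) is a simplifying assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified model, not HotSpot code: a lock is just a name plus a numeric rank.
struct RankedLock {
    std::string name;
    int rank;
};

// Tracks the locks held by a single thread and enforces the assumed ordering rule.
class LockOrderChecker {
public:
    // Records the lock and returns true if its rank is strictly lower than every
    // rank already held; returns false otherwise (a debug VM would abort here).
    bool try_acquire(const RankedLock& lock) {
        for (const RankedLock& held : held_) {
            if (lock.rank >= held.rank) {
                return false;
            }
        }
        held_.push_back(lock);
        return true;
    }

    // Forgets every held lock with the given name.
    void release(const std::string& name) {
        held_.erase(std::remove_if(held_.begin(), held_.end(),
                                   [&](const RankedLock& l) { return l.name == name; }),
                    held_.end());
    }

private:
    std::vector<RankedLock> held_;
};

// Replays the sequence from the report: take tty_lock, then SATB_Q_FL_lock.
// The ranks are parameters so that alternative rank assignments can be tried.
bool satb_alloc_allowed_under_tty(int tty_rank, int satb_rank) {
    LockOrderChecker checker;
    if (!checker.try_acquire({"tty_lock", tty_rank})) {
        return false;
    }
    return checker.try_acquire({"SATB_Q_FL_lock", satb_rank});
}
```

With the ranks from the error message, satb_alloc_allowed_under_tty(0, 1) returns false. Giving tty_lock a rank above 'access', for example satb_alloc_allowed_under_tty(2, 1), returns true in this model; the value 2 is only an illustration of "between access and special".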