Bug ID: JDK-8322743 C2: prevent lock region elimination in OSR compilation

JDK-8322743 : C2: prevent lock region elimination in OSR compilation

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 8,11,17,20,21,22,23

Priority: P3
Status: Resolved
Resolution: Fixed
OS: linux_ubuntu
CPU: x86_64

Submitted: 2023-11-14
Updated: 2024-03-11
Resolved: 2024-02-29

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 23
23 b13Fixed

Related Reports

Duplicate :	JDK-8323686 - assert failed: object must be null or locked, and unbiased
Duplicate :	JDK-8323507 - assert(current->held_monitor_count() == 0) failed: Should not be possible
Relates :	JDK-8320515 - assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object
Relates :	JDK-7129618 - assert(obj_node->eqv_uncast(obj),"");
Relates :	JDK-8320310 - CompiledMethod::has_monitors flag can be incorrect
Relates :	JDK-8286957 - Held monitor count
Relates :	JDK-8327743 - JVM crash in hotspot/share/runtime/javaThread.cpp - failed: held monitor count should be equal to jni: 0 != 1

Description

ADDITIONAL SYSTEM INFORMATION :
# JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-adhoc.jdk22u)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.jdk22u, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)

A DESCRIPTION OF THE PROBLEM :
I ran a modified regression test case on JDK 22.0 and found that the JVM crashed with -Xcomp option.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/repository/jdk22u/src/hotspot/share/runtime/javaThread.cpp:880), pid=25410, tid=25515
#  assert(held_monitor_count() == jni_monitor_count()) failed: held monitor count should be equal to jni: 1 != 0
#
# JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-adhoc.jdk22u)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.jdk22u, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x3325baf]  JavaThread::exit(bool, JavaThread::ExitType)+0xccf
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
java  -Xcomp -XX:CompileCommand="compileonly,compiler.c1.Test8267042::write" compiler.c1.Test8267042

ACTUAL -
CompileCommand: compileonly compiler/c1/Test8267042.write bool compileonly = true
Starting test
read() before wait
read() after wait
read() before wait
read() after wait
read() before wait
read() after wait
read() before wait
read() after wait
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/repository/jdk22u/src/hotspot/share/runtime/javaThread.cpp:880), pid=25410, tid=25515
#  assert(held_monitor_count() == jni_monitor_count()) failed: held monitor count should be equal to jni: 1 != 0
#
# JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-adhoc.jdk22u)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.jdk22u, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x3325baf]Test passed
  JavaThread::exit(bool, JavaThread::ExitType)+0xccf
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/repository/toreport/Test8267042_11_13_22_51_01/hs_err_pid25410.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------- BEGIN SOURCE ----------
The source code file and hs_err file are available at the URL:
https://drive.google.com/file/d/1Ie3NupEt669LfE2DjCi2boFCRq3iXM3v/view?usp=drive_link

The modifications I made are in the source code at line 101 through line 113
---------- END SOURCE ----------

FREQUENCY : always

Comments

Changeset: 742c776a Author: Vladimir Kozlov <kvn@openjdk.org> Date: 2024-02-29 20:20:46 +0000 URL: https://git.openjdk.org/jdk/commit/742c776a922bc226a3beaa9e219ff0bd2baf7bc4
29-02-2024
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/17331 Date: 2024-01-09 22:57:27 +0000
09-01-2024
I was not able to reproduce the issue with Test8267042 but I see now that submitter modified the test to use local object to synchronize: synchronized void write(char[] data, int offset, int length) throws IOException { while (--length >= 0) { // modified code synchronized (new Test8267042()) { int var1 = 0; while (var1 < 100000) { if (var1 == 50000) { getZeroOnStack(offset); } var1++; } } // original code // getZeroOnStack(offset); write(data[offset++]); } }
09-01-2024
This problem shown by reproducer test is strange for me. You should not use a local not escaped object for synchronization since other threads can't access it and synchronization does not work. I am not sure how [~thartmann] converted `compiler.c1.Test8267042.java` test into reproducer. Test8267042 has synchronization on methods. If you have normal synchronization on object, that object should be accessed by other threads and as such escaped. Escape Analysis then will keep all locks. Your example with Object created outside synchronization block indeed will have incorrect locking state after deoptimization. But this local object is short lived. About deoptimization. From generated code I see 2 safepoints (deoptimization points): one on back edge of internal loop (what is left from it) and an other from external loop. The problem is only internal deoptimization point where the object is still alive. You have very small window to hit it. But again, since object is local and short lived it should not affect execution (other threads) except some accounting for locks.
09-01-2024
I'm trying to figure what symptoms we may have, as we have a couple of bugs with IMSE which we don't understand. If we may throw IMSE when removing the activation it's good to know.
09-01-2024
> 2: We transition to OSR compiled code, where that lock is 'eliminated' . *This creates the assert as we never unlock this object. As I said, this is the bug and I am working on fix. Please, wait the fix.
08-01-2024
[~kvn] 1: We lock the Object in interpreter. 2: We transition to OSR compiled code, where that lock is 'eliminated' . *This creates the assert as we never unlock this object. 3: We deopt on a bytecode after the synchronized block. 4: We are now in interpreter and the Object must be unlock, but as Tobias says: "The compiled code then never unlocks it."
08-01-2024
[~rehn] deopting outside synchronized scope will not relock object since there will be no reference for monitor in JVM state and in safepoint data where we deopt. We are fine in such case.
05-01-2024
Okay, I didn't hit that assert when running with -XX:LockingMode=0 but maybe the test needs to be tweaked.
05-01-2024
[~thartmann] owner == nullptr is unlocked, so that is what we assert. Hence running your test with -XX:LockingMode=0 could hit this assert. [~kvn] I mean in the case that object should not be relocked as we deopted outside of the synchronized scope. for (int i = 0; i < 2; ++i) { Object o = new Object(); synchronized (o) { // monitorenter // Trigger OSR compilation for (int j = 0; j < 100_000; ++j) { } } // monitorexit // deopt happens after monitorexit, o must be unlocked }
05-01-2024
[~rehn], thanks for the additional details, that makes sense to me. > Looking more closely it seem that we only asserts the owner is NULL. > With the call to "set_owner_from(nullptr, DEFLATER_MARKER);". So should we have an additional assert there to check that monitors are unlocked when being reclaimed? > But if the non escaping Object is allocate before the synchronized scope and we deopt after the synchronized scope? Would it give back a lock object ? I believe that's not different from the non-OSR case when C2 eliminates locks and should be taken care of by restore_eliminated_locks, as Vladimir mentioned. It's probably worth double-checking the different cases though. [~kvn] Thanks for jumping on this! I wanted to bring it up for discussion in next week's compiler staff but I think you summarized the options that we have.
05-01-2024
[~rehn] We relock objects during deoptimization if locks were eliminated in compiled code: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/deoptimization.cpp#L456
05-01-2024
The issue here is that Unlock node references (through Phi (1) node) two objects, one from Interpreter and new one allocated. But Lock node only references new allocated object. So C2 decide to eliminate Lock since locally allocated object does not escape (it is not eliminated because it is referenced by Phi (1) which is input to Unlock node). When we eliminate Lock we eliminate also all Lock and Unlock which point to the same BoxLock node. Which is not correct - we can't eliminate locks/unlocks which may reference other objects. The issue was introduced by JDK-7129618 in JDK8 which relaxed the check in `mark_eliminated_box()`: https://github.com/openjdk/jdk/commit/b9cba282eeeac2d32f0f17bfb64189fb5b46754c#diff-9109dab2601aefca77e5f15150bbef0d83935b71853595ed829a1945f80f6668L1808 So we either need to clone block with Unlock in ciTypeFlow so we would have separate Unlock nodes for such cases referencing different objects. Or don't eliminate locks/unlocks for such cases which needs to be done carefully to avoid regression. May be only for the case when an other object comes from Interpreter in OSR compilation.
04-01-2024
>I don't think that can happen. OSR will not throw away the locked object (see below) but only omit the locking/unlocking and jstack will trigger deoptimization which will re-create the interpreter state. Maybe jstack is bad example, there several methods that walk-stack and gather locks held. I'd would say something like, if "javaVFrame::locked_monitors()" ever can return the lock it have escaped. In this case the "interpretedVFrame::monitors" could have returned the object at one point, thus it have escaped. >I thought that once the locked object (which is dead) is reclaimed by the GC, the monitor is freed as well but I'm not familiar enough with the implementation details here. Just trying to get a better understanding of the implications. Yes, but that algorithm assumes those are unlocked. Looking more closely it seem that we only asserts the owner is NULL. With the call to "set_owner_from(nullptr, DEFLATER_MARKER);". >The object is not eliminated, only the locking/unlocking. So there is no need for re-allocation. The C2 compiled code keeps track of the oop of the locked object and will hand it back to the interpreter on deopt if it's still live. Ok, good. But if the non escaping Object is allocate before the synchronized scope and we deopt after the synchronized scope? Would it give back a lock object ?
04-01-2024
Based on Tobias example, there are 2 objects in graph during OSR compilation of this method. One is coming from Interpreter (which means it escaped) and an other new allocated object when C2 generated graph for external loop (which does not escape). I admit that that EA may miss check somewhere for objects coming from Interpreter for OSR compilation and it should be fixed.
04-01-2024
> A i.e. jstack will publish this object as locked. OSR will throw away the locked object. A second jstack, even if happens on the exactly same bytecode without any execution, but after OSR, this object will no longer be visible in jstack. I don't think that can happen. OSR will not throw away the locked object (see below) but only omit the locking/unlocking and jstack will trigger deoptimization which will re-create the interpreter state. > As we have recorded the locking in a monitor, this monitor will never be unlocked thus never freed, so I think that would be leaking object monitors (as no one would clear _owner). OM's are query-able, so this would be seen in a debugger. I thought that once the locked object (which is dead) is reclaimed by the GC, the monitor is freed as well but I'm not familiar enough with the implementation details here. Just trying to get a better understanding of the implications. > When we recreate interpreted frames we allocate objects for those eliminated, Deoptimization::realloc_objects(). > Since this during OSR maybe there are some steps I'm unaware of or so. > As far as I understand it, we would create X then synchronize-enter X, throw away X, create Z synchronize-enter Z. The object is not eliminated, only the locking/unlocking. So there is no need for re-allocation. The C2 compiled code keeps track of the oop of the locked object and will hand it back to the interpreter on deopt if it's still live.
04-01-2024
>> Otherwise debuggers and similar would not be so happy. As the interpreted stack represent the Java frame state, recording the lock there should be viewed as publishing it IMHO. >But after the OSR transition, we don't have an interpreted stack frame anymore and debugging would then trigger deoptimization (and therefore re-creation of the correct interpreter stack frame), right? A i.e. jstack will publish this object as locked. OSR will throw away the locked object. A second jstack, even if happens on the exactly same bytecode without any execution, but after OSR, this object will no longer be visible in jstack. >> One other issue seems to be when running with heavy weight monitors, that synchronized would inflate, now it's very global visible !? >Why would using heavy weight monitors make a difference here? As we have recorded the locking in a monitor, this monitor will never be unlocked thus never freed, so I think that would be leaking object monitors (as no one would clear _owner). OM's are query-able, so this would be seen in a debugger. >> Secondly, what happens if we deoptimize ? Would we create the object again and lock it again ? That seems bad >If we deopt, we would hand over the already locked object to the interpreter (no need to re-allocate) and the interpreter would then unlock the object once we leave the scope of the synchronized block. When we recreate interpreted frames we allocate objects for those eliminated, Deoptimization::realloc_objects(). Since this during OSR maybe there are some steps I'm unaware of or so. As far as I understand it, we would create X then synchronize-enter X, throw away X, create Z synchronize-enter Z.
04-01-2024
I agree with Tom. Anything done in interpreter should be treated as such. Otherwise debuggers and similar would not be so happy. As the interpreted stack represent the Java frame state, recording the lock there should be viewed as publishing it IMHO. One other issue seems to be when running with heavy weight monitors, that synchronized would inflate, now it's very global visible !? Secondly, what happens if we deoptimize ? Would we create the object again and lock it again ? That seems bad. Hence I agree with Tom.
04-01-2024
I really didn't look at this in detail yet but my understanding is that since EA proved that the object is non-escaping, we ignore the object that we get in through the OSR state since it must be non-escaping as well. Yes, it seems like a special case for performance reasons.
04-01-2024
I agree that this should be fixed, I was just defending the optimization from C2's perspective :) > Anything done in interpreter should be treated as such. Sure but the difference here is that C2 "sees" and compiles the entire scope in which the objects are live and determines that they can't escape. The (incorrect) assumption is then that the object we get from the OSR state can't be globally visible. > Otherwise debuggers and similar would not be so happy. As the interpreted stack represent the Java frame state, recording the lock there should be viewed as publishing it IMHO. But after the OSR transition, we don't have an interpreted stack frame anymore and debugging would then trigger deoptimization (and therefore re-creation of the correct interpreter stack frame), right? > One other issue seems to be when running with heavy weight monitors, that synchronized would inflate, now it's very global visible !? Why would using heavy weight monitors make a difference here? > Secondly, what happens if we deoptimize ? Would we create the object again and lock it again ? That seems bad If we deopt, we would hand over the already locked object to the interpreter (no need to re-allocate) and the interpreter would then unlock the object once we leave the scope of the synchronized block.
04-01-2024
Isn't that a bug with EA and OSR though? Any lock that comes from the OSR state seems like it has already escaped. Certainly any object from the OSR state is potentially globally visible though maybe the algorithm wants to catch this case for performance reasons?
03-01-2024
Great find!
03-01-2024
I had a quick look at this and the problem seems to be that when OSR compiling the following loop, C2 completely removes the lock/unlock because 'new Object()' is non-escaping: for (int i = 0; i < 2; ++i) { synchronized (new Object()) { // Trigger OSR compilation for (int j = 0; j < 100_000; ++j) { } } } But since it's an OSR compilation we transition from interpreted to compiled on the inner loop back-branch and therefore the Object we get in from the interpreter is already locked. The compiled code then never unlocks it. We fetch the locked object from the interpreter state here: https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/parse1.cpp#L232 And Escape Analysis ignores it here: https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/escape.cpp#L1306 And then marks the lock for removal: https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/escape.cpp#L2558 This is an old issue existing since "initial load". Similar to Graal, as [~never] explained in JDK-8320142, C2's Escape Analysis relies on the assumption that leaving a non-escaping object for reclamation in locked state is fine. This is no longer true after JDK-8286957.
03-01-2024
I attached a simple Test.java that triggers the following two asserts (with latest JDK 23): java -Xcomp -XX:CompileCommand=compileonly,Test:: -XX:CompileCommand=quiet -XX:+PrintCompilation -XX:-TieredCompilation Test.java # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/runtime/javaThread.cpp:879), pid=3521250, tid=3521267 # assert(held_monitor_count() == jni_monitor_count()) failed: held monitor count should be equal to jni: 1 != 0 # # JRE version: Java(TM) SE Runtime Environment (23.0+4) (fastdebug build 23-ea+4-173) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-ea+4-173, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xec0cc0] JavaThread::exit(bool, JavaThread::ExitType)+0x870 Stack: [0x00007f86468ff000,0x00007f8646a00000], sp=0x00007f86469fecd0, free space=1023k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xec0cc0] JavaThread::exit(bool, JavaThread::ExitType)+0x870 (javaThread.cpp:879) V [libjvm.so+0xec1035] JavaThread::post_run()+0x15 V [libjvm.so+0x179ea85] Thread::call_run()+0xd5 V [libjvm.so+0x14a92e7] thread_native_entry(Thread)+0x127 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/runtime/synchronizer.cpp:1799), pid=3521217, tid=3521218 # assert(current->held_monitor_count() == 0) failed: Should not be possible # # JRE version: Java(TM) SE Runtime Environment (23.0+4) (fastdebug build 23-ea+4-173) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-ea+4-173, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1737aa9] ObjectSynchronizer::release_monitors_owned_by_thread(JavaThread)+0x199 Stack: [0x00007f4e7b439000,0x00007f4e7b53a000], sp=0x00007f4e7b538c50, free space=1023k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1737aa9] ObjectSynchronizer::release_monitors_owned_by_thread(JavaThread*)+0x199 (synchronizer.cpp:1799) V [libjvm.so+0xec0c4c] JavaThread::exit(bool, JavaThread::ExitType)+0x7fc V [libjvm.so+0xf9416a] jni_DetachCurrentThread+0xda C [libjli.so+0x4a1f] JavaMain+0xdef C [libjli.so+0x7ca9] ThreadJavaMain+0x9 Disabling C2's Escape Analysis via -XX:-DoEscapeAnalysis or lock elimination via -XX:-EliminateLocks serves as a workaround.
03-01-2024
I can reproduce this since the assert was introduced by JDK-8286957 in JDK 20 b06 and up until latest JDK 23. ILW = Assert due to unexpected monitor count, reproducible with modified regression test and -Xcomp + compileonly, no known workaround but disable compilation of affected method = HLM = P3
03-01-2024
This may already be fixed - see JDK-8320142 and JDK-8320515.
01-01-2024