JDK-8316746 : Top of lock-stack does not match the unlocked object
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 21,22,repo-lilliput-17
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • CPU: ppc
  • Submitted: 2023-09-22
  • Updated: 2024-01-03
  • Resolved: 2023-11-09
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 21 JDK 22
11.0.22Fixed 17.0.10Fixed 21.0.2Fixed 22 b24Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
One additional object sometimes appears on the lock stack while running test "vmTestbase/nsk/jdi/StepEvent/_itself_/stepEvent004/stepEvent004.java":

LockStack[1]: nsk.share.jdi.EventHandler 
{0x00000000bcc102b8} - klass: 'nsk/share/jdi/EventHandler'
 - ---- fields (total size 5 words):
 - private volatile 'wasInterrupted' 'Z' @12  false (0x00)
 - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16  a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcb697f8} (0xbcb697f8)
 - private 'log' 'Lnsk/share/Log;' @20  a 'nsk/share/Log'{0x00000000bcc7e538} (0xbcc7e538)
 - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24  a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccc2d78} (0xbccc2d78)
 - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28  a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccc43c8} (0xbccc43c8)
 - private 'listenThread' 'Ljava/lang/Thread;' @32  a 'java/lang/Thread'{0x00000000bcc101a0} (0xbcc101a0)
LockStack[0]: java.util.Collections$SynchronizedRandomAccessList 
{0x00000000bca24b60} - klass: 'java/util/Collections$SynchronizedRandomAccessList'
 - ---- fields (total size 3 words):
 - final 'c' 'Ljava/util/Collection;' @12  a 'java/util/Vector'{0x00000000bca24b78} (0xbca24b78)
 - final 'mutex' 'Ljava/lang/Object;' @16  a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bca24b60} (0xbca24b60)
 - final 'list' 'Ljava/util/List;' @20  a 'java/util/Vector'{0x00000000bca24b78} (0xbca24b78)

The obj at LockStack[0] is the one which is expected by the unlocking code in the C2 method:
J 1490% c2 nsk.share.jdi.EventHandler.run()V (475 bytes) @ 0x00007fff7c67139c [0x00007fff7c670f80+0x000000000000041c]
j  java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base@22-internal
j  java.lang.Thread.run()V+19 java.base@22-internal
v  ~StubRoutines::call_stub 0x00007fff7bdc084c

Observed on linuxppc64le so far (intermittently). Other platforms may be affected, but were not observed, yet. Seems like there are at least similar or related problems (linked to JDK-8315880).

The platform code for all platforms only supports removing the top of the lock stack. In contrast, ObjectSynchronizer::exit calls LockStack::remove which is able to remove any object.

The issue does no longer show up when using the slow path instead of stop("Top of lock-stack does not match the unlocked object").
Comments
[jdk11u-fix-request] Approval Request from Martin C2 OSR compiled methods are wrong on PPC64 in very rare cases without this fix (wrong unlock order). This gets currently only detected by the new lightweight locking, but the bug exists independently. The fix has been tested in jdk head, jdk 21u and the backport in jdk 17u-dev and jdk 11u-dev in our nightly tests. The version for 11 is almost identical to the version for 17 and has been reviewed.
23-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/2288 Date: 2023-11-21 17:28:48 +0000
21-11-2023

[jdk17u-fix-request] Approval Request from Martin C2 OSR compiled methods are wrong on PPC64 in very rare cases without this fix (wrong unlock order). This gets currently only detected by the new lightweight locking, but the bug exists independently. The fix has been tested in jdk head, jdk 21u and this backport in jdk 17u-dev in our nightly tests.
21-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1977 Date: 2023-11-20 16:51:57 +0000
20-11-2023

[jdk21u-fix-request] Approval Request from Martin C2 OSR compiled methods are wrong on PPC64 in very rare cases without this fix (wrong unlock order). The fix applies cleanly and has been tested in jdk head and this backport in jdk 21u in our nightly tests. It only changes PPC64 code, removes wrong comments from some other platforms and adds a test which passes on all platforms.
15-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u/pull/361 Date: 2023-11-14 14:46:40 +0000
14-11-2023

Changeset: 7d8adfa8 Author: Martin Doerr <mdoerr@openjdk.org> Date: 2023-11-09 10:14:03 +0000 URL: https://git.openjdk.org/jdk/commit/7d8adfa855e51a90c2f125fc20a06f9a488e6248
09-11-2023

[~mdoerr] - The description note for this bug still says this: > Observed on linuxppc64le so far (intermittently). Other platforms may be affected, but were not > observed, yet. Seems like there are at least similar or related problems (linked to JDK-8315880). You may want to add an "Update: XXX" note to the description to make it clear that this has turned out to be a PPC64 only bug... Also, thanks for clearing out the errant comments on all the other platforms.
08-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16406 Date: 2023-10-27 19:26:54 +0000
27-10-2023

I've updated the reproducer. Fails only on PPC64. I'll look into the interpreter.
26-10-2023

This PR shows a possible solution, but is not yet complete. I the long term, C2 should be able to run without usage of BoxLock nodes (except for LM_LEGACY).
25-10-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16345 Date: 2023-10-24 12:19:04 +0000
24-10-2023

I have a synthetic reproducer: - Apply JDK-8316746-reproducer.patch (attached) - make run-test TEST="vmTestbase/nsk/jdi/StepEvent" JTREG="VM_OPTIONS=-Xbatch" Note: Only shows that the lock order is not as C2 needs it. I still don't know how the monitor order in interpreter frames can be guaranteed. I understand that the bytecode is checked by ciMethod::has_balanced_monitors(), but the actual values in the interpreter frame may be modified when we reach the OSR entry.
24-10-2023

In the PR [~mdoerr] showed how C2 is reversing the order. I checked C1 and it looks correct, and comments in the code confirm the correct order: 3217 // Similarly with locks. The first lock slot in the osr buffer is the nth lock 3218 // from the interpreter frame, the nth lock slot in the osr buffer is 0th lock 3219 // in the interpreter frame (the method lock if a sync method)
30-09-2023

I don't see anything wrong with Parse::load_interpreter_state or how it could do anything different. It trusts the order it was given from SharedRuntime::OSR_migration_begin, which also seems to be correctly preserving the order from the interpreter frame.
29-09-2023

AFAICS C2 doesn't reorder, but transfer them incorrectly from the interpreter state (Parse::load_interpreter_state) when doing OSR compilation. At least the test passes and I couldn't see any other problem when changing the order there (see comment in PR). I was also wondering if jdi/StepEvent somehow messes with the locks, but haven't found anything, yet.
29-09-2023

C2 should never reorder unlocks! That would be very broken.
29-09-2023

It makes me wonder if there is an ordering dependency missing somewhere in the resulting ideal graph, or why else would C2 emit the unlocks in wrong order?
28-09-2023

Very interesting! Yes, fixing that problem at its root would be best! Thanks for the investigation!
28-09-2023

After more investigation, I figured out that the JIT compilers reject methods with unbalanced locks even for OSR (with a conservative approach: if they can't prove that they are balanced, ciMethod ->has_balanced_monitors()): COMPILE SKIPPED: cannot parse method: not compilable (unbalanced monitors) So, seems like you're right. If so, we should also be able to fix the problem by restoring the correct order in Parse::load_interpreter_state when transferring the locks from the interpreter.
28-09-2023

I'm pretty sure that the JIT compilers should not emit unlocks in the wrong order. If that really happens, then it has nothing to do with lightweight locking and would be a pre-existing problem. Also, I would definitely expect that it would show up on other arches and with the old stack-locking too. That said, I guess there is a chance that old stack-locking would work by chance, simply because it does not check the order (this could be verified by adding such a check, I suppose). If that really is the case, I suppose we could remove the check as well. It would be ugly, though.
28-09-2023

C2 OSR: The locks are transferred from the interpreter state in Parse::load_interpreter_state. They have no guaranteed order. I believe it is generally possible to break the invariant with jasm, too: - lock a - lock b - loop which triggers OSR - unlock a - unlock b I think extending https://github.com/rkennke/jdk/blob/3f0acba4305cd1e8741586bbb0131cea26e1d10d/test/hotspot/jtreg/runtime/locking/TestUnstructuredLocking.jasm like this would be an interesting experiment.
28-09-2023

I've updated my PR to allow reordering for OSR compilations only. (Currently only for PPC64, but I'll work on x86_64 and aarch64, too.) Please take a look.
28-09-2023

Ok, so remove all the checks, maybe add a comment in their places, and that should fix the problem?
28-09-2023

How should a JIT compiler know when doing OSR? It doesn't even parse the parts which contain the monitorenter instructions if they happen before the OSR entry point. And it can't know the path which has been taken to reach the OSR entry point. Other locking modes don't care about the order and that's why it simply works. So, I guess it's a pre-existing problem which could get ignored without the new lightweight locking.
28-09-2023

I’ve attached an hs_err file with the disassembled OSR c2 nmethod nsk.share.jdi.EventHandler.run()V. I can see two unlock sequences: 0x00007fff706bda8c: The asm assertion which fires belongs to this one. It tries to unlock object from R18 which is Collections$SynchronizedRandomAccessList. I think it should unlock the EventHandler (see my previous comment). 0x00007fff706bdb88: Next unlock follows immediately. I think this one should actually unlock the Collections$SynchronizedRandomAccessList object. Do you know how "strictly nested balanced locking" is ensured for OSR methods? Note that, in general, the interpreter can lock objects which get unlocked in the compiled part of the method. About CR0 usage: We have changed the locking/unlocking to use only CR0 when implementing the new lightweight locking. The lock/unlock functions still support using other Condition Registers for other locking modes, but that is no longer needed. We want to clean this up when removing support for the old fast locking. It’s a bit confusing, now. I agree. Note that instructions with "_" suffix ("." suffix in disassembled code) set CR0. I'll try to reproduce it and dump IR graphs when I find more time. That might shed more light into the problem.
27-09-2023

Can you check the C2 generated code and see the wrong ordering there, too? If this were a C2 bug, it would certainly show up in other arches too. As long as I don't see evidence of that, I'm assuming it's a PPC specific problem. I've never seen C2 reorder locks that way. If I were you, I'd check if every way out of the C2 generated lock and unlock nodes has its condition flags transferred correctly. If the locking or unlocking encounters a failure, but publishes EQ/zero condition flag on the way out, the surrounding code would not call into the runtime to fix it, and leave the object on the stack. I've had this symptom a couple of times when I got the conditions wrong. I find the PPC asm code confusing because sometimes it seems to use CCR0 directly, other times it doesn't (e.g. does the CAS also use/update CCR0?). Also, the MacroAssembler::lightweight_(un)lock() routines seem to implicitely use CCR0, but it's not passed into them as argument, etc. It's all not very clear to me.
26-09-2023

Thanks for your explanation, Roman! About sharedRuntime_ppc.cpp: PPC64 (and s390) use compiler_fast_lock_object (and _unlock_) in shared runtime. So, they reuse the code from C2 which contains the lightweight locking. That looks all correct to me, now. I've taken a closer look at the C2 compiled OSR nmethod: nsk.share.jdi.EventHandler.run() It tries to unlock in the wrong order. This is what should happen (in this order): EventHandler.java:193 Should unlock EventHandler EventHandler.java:194 Should unlock Collections$SynchronizedRandomAccessList I guess that the "strictly nested balanced locking" is only checked during parsing. However, it seems to be possible that they get reordered somehow. We hit the assert because the nmethod tries to unlock the second one first. Could this be a C2 bug?
26-09-2023

ILW = HML = P2
26-09-2023

BTW, in the PPC implementation: https://github.com/openjdk/jdk/pull/14069 I don't see a change to locking in sharedRuntime_ppc.cpp, which matches other arches: https://github.com/openjdk/jdk/pull/10907/files#diff-524c9e019cb83916aa3db772fb33acbbe3e7465867a8d2f7e6376be3c8260edd Is it not necessary to deal with sync'ed native methods in PPC?
26-09-2023

C2 (and C1) enforces strictly nested balanced locking. Bytecode that does otherwise will be rejected and not compiled at all (doesn't happen with javac compiled code, afaik). The spec allows for more relaxed locking bytecode, which is why we check for TOS being different than the unlocked object in the interpreter, and dive into runtime to handle it there, and this is why we have the LockStack::remove() there, instead of simple pop(). As I said, javac doesn't produce such bytecode, but I have constructed a jtreg testcase which tests that situation, but I just realized that this test hasn't made it into the final PR: https://github.com/rkennke/jdk/blob/3f0acba4305cd1e8741586bbb0131cea26e1d10d/test/hotspot/jtreg/runtime/locking/TestUnstructuredLocking.jasm It might be worth to add it now.
26-09-2023

Does C2 enforce strictly nested balanced locking i.e. only methods that unlock in the reverse order of locking are compiled? In general the locks/unlocks in a method must be balanced, but they can occur out-of-order (lock A, lock B, unlock A, unlock B) - hence the general unlock case does a remove() not a pop().
26-09-2023

The issue can be reproduced with the code which handles ANONYMOUS_OWNER in the C2 monitor unlocking code. That is not the problem. I'm convinced that the slow path handles that correctly. The issue only occurs in jdi tests. Seems like they are able to break the invariant that always the top of stack gets unlocked. Do you know why the slow path uses LockStack::remove (instead of pop) which is able to unlock any object from the stack? Seems like similar problems were resolved there, but not in the fast unlock code.
25-09-2023

The PPC64 implementation jumps to the slow path if the owner is ANONYMOUS_OWNER because ANONYMOUS_OWNER never matches the current thread. Does the slow path not handle that? Is it required to do it in assembler code? I already had code to handle it, but during review it was suggested to remove it: https://github.com/openjdk/jdk/pull/14069/commits/8b4c8370c9b953ccee1036c97afa80a1621ae199 I'll add it back and try to reproduce the issue.
25-09-2023

The code-path that handles anonymous owners looks different in PPC vs other aarches. I am not too familiar with PPC asm, but it doesn't look obviously correct to me. Compare: https://github.com/openjdk/jdk/blob/0f77d250b67ae0678756f986607eb239641dfb9e/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L199 vs. https://github.com/openjdk/jdk/blob/0f77d250b67ae0678756f986607eb239641dfb9e/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L2347 In particular, I don't see the PPC equivalent of the actual ANONYMOUS_OWNER test which should look like tst(tmp2, (uint64_t)ObjectMonitor::ANONYMOUS_OWNER); IOW, I would expect some sort of test or and instructions with ANONYMOUS_OWNER in it somewhere.
25-09-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/15903 Date: 2023-09-25 13:57:26 +0000
25-09-2023