JDK-8276851 : [AArch64] Improve fast_unlock C2 intrinsic
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 18,19
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • CPU: aarch64
  • Submitted: 2021-11-09
  • Updated: 2024-10-02
  • Resolved: 2024-10-02
Description
The JVM's locking protocol includes a successor protocol, which requires the unlocker to check whether a successor has been racingly set right after storing NULL to the ObjectMonitor owner field. Failing to do so can result in nobody waking up the successor until the next lock operation on the same ObjectMonitor. The result is not quite an indefinite hang: the runtime code ensures that at least one "responsible" thread blocks for a limited amount of time with exponential backoff. So the end result of not honoring the successor protocol can be a long stall, but not an indefinite one.

We do not perform this check at all in the AArch64 fast_unlock C2 intrinsic, so we are susceptible to this kind of stall on that platform. Implementing the check will require an ldar on succ.
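
The check in question is a classic Dekker duality: the owner field must be stored before the successor field is (re)loaded, with a full fence in between. A minimal sketch using std::atomic, with illustrative field names rather than HotSpot's actual ObjectMonitor layout:

  #include <atomic>

  std::atomic<void*> owner{nullptr};  // stand-in for ObjectMonitor::_owner
  std::atomic<void*> succ{nullptr};   // stand-in for ObjectMonitor::_succ

  // Returns true if the unlocker must fall into the slow path so that
  // somebody wakes the waiters; false if a successor is visible and will
  // take over on its own.
  bool unlock_needs_slow_path() {
    // ST owner = NULL; MEMBAR; LD succ -- the Dekker pivot point.
    owner.store(nullptr, std::memory_order_seq_cst);
    // A seq_cst load compiles to ldar on AArch64, which is why the
    // description says the check requires ldar on succ.
    return succ.load(std::memory_order_seq_cst) == nullptr;
  }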
Comments
This was fixed by JDK-8320318, so it's no longer an issue.
02-10-2024

Back to you Fredrik.
31-07-2024

This is related to a lot of runtime code that we're working on. Can I reassign this to the runtime subcomponent?
06-03-2024

I added a 'sync' keyword used by the runtime team so I can find this with the rest of these issues.
17-11-2023

So the X64 code that we're talking about is here: src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:

  #else // _LP64
  // It's inflated
  Label CheckSucc, LNotRecursive, LSuccess, LGoSlowPath;

  cmpptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)), 0);
  jccb(Assembler::equal, LNotRecursive);

  // Recursive inflated unlock
  decq(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)));
  jmpb(LSuccess);

  bind(LNotRecursive);
  movptr(boxReg, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(cxq)));
  orptr(boxReg, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(EntryList)));
  jccb  (Assembler::notZero, CheckSucc);
  // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
  movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
  jmpb  (DONE_LABEL);

  // Try to avoid passing control into the slow_path ...
  bind  (CheckSucc);

  // The following optional optimization can be elided if necessary
  // Effectively: if (succ == null) goto slow path
  // The code reduces the window for a race, however,
  // and thus benefits performance.
  cmpptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(succ)), NULL_WORD);
  jccb  (Assembler::zero, LGoSlowPath);

  xorptr(boxReg, boxReg);
  // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
  movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);

  // Memory barrier/fence
  // Dekker pivot point -- fulcrum : ST Owner; MEMBAR; LD Succ
  // Instead of MFENCE we use a dummy locked add of 0 to the top-of-stack.
  // This is faster on Nehalem and AMD Shanghai/Barcelona.
  // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences
  // We might also restructure (ST Owner=0;barrier;LD _Succ) to
  // (mov box,0; xchgq box, &m->Owner; LD _succ) .
  lock(); addl(Address(rsp, 0), 0);

  cmpptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(succ)), NULL_WORD);
  jccb  (Assembler::notZero, LSuccess);

  // Rare inopportune interleaving - race.
  // The successor vanished in the small window above.
  // The lock is contended -- (cxq|EntryList) != null -- and there's no apparent successor.
  // We need to ensure progress and succession.
  // Try to reacquire the lock.
  // If that fails then the new owner is responsible for succession and this
  // thread needs to take no further action and can exit via the fast path (success).
  // If the re-acquire succeeds then pass control into the slow path.
  // As implemented, this latter mode is horrible because we generated more
  // coherence traffic on the lock *and* artificially extended the critical section
  // length while by virtue of passing control into the slow path.
  // box is really RAX -- the following CMPXCHG depends on that binding
  // cmpxchg R,[M] is equivalent to rax = CAS(M,rax,R)
  lock();
  cmpxchgptr(r15_thread, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
  // There's no successor so we tried to regrab the lock.
  // If that didn't work, then another thread grabbed the
  // lock so we're done (and exit was a success).
  jccb  (Assembler::notEqual, LSuccess);
  // Intentional fall-through into slow path

  bind  (LGoSlowPath);
  orl   (boxReg, 1);   // set ICC.ZF=0 to indicate failure
  jmpb  (DONE_LABEL);

  bind  (LSuccess);
  testl (boxReg, 0);   // set ICC.ZF=1 to indicate success
  jmpb  (DONE_LABEL);
  #endif

The end of fast_unlock() in c2_MacroAssembler_aarch64.cpp does not have all this X64 logic that's checking the successor field. (Both have logic to check the queues.)
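
For readers not fluent in x86 assembler, here is a rough C++ rendering of that control flow, with illustrative types and names (not HotSpot code; recursion handling elided):

  #include <atomic>

  struct Monitor {
    std::atomic<void*> owner;  // _owner: current owner thread, or nullptr
    std::atomic<void*> succ;   // _succ: heir presumptive, set by contenders
    std::atomic<void*> queue;  // stand-in for cxq | EntryList
  };

  // Returns true when the fast path fully handled the unlock, false when
  // the VM slow path must run to wake a waiter.
  bool fast_unlock(Monitor* m, void* self) {
    if (m->queue.load(std::memory_order_relaxed) == nullptr) {
      m->owner.store(nullptr, std::memory_order_release);
      return true;                    // no waiters: release and be done
    }
    if (m->succ.load(std::memory_order_relaxed) == nullptr) {
      return false;                   // waiters but no heir: slow path
    }
    // ST owner = NULL; MEMBAR; LD succ -- the Dekker pivot point.
    m->owner.store(nullptr, std::memory_order_seq_cst);
    if (m->succ.load(std::memory_order_seq_cst) != nullptr) {
      return true;                    // heir still there: it will wake up
    }
    // The heir vanished in the small window: try to regrab the lock.
    void* expected = nullptr;
    if (!m->owner.compare_exchange_strong(expected, self)) {
      return true;                    // new owner is responsible for succession
    }
    return false;                     // we re-own the lock: slow path wakes a waiter
  }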
One thing that stuck out when I just did a read-through were these two sections:

  // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
  movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);

and:

  // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
  movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);

I suspect the casts were removed as part of the NULL -> NULL_WORD work, so now I'm wondering if we still destroy r10 (and thus obj)...
17-10-2023

I agree with the assessment. Should be an RFE to better support the successor protocol to reduce bounded but possibly long spurious stalls. Or something like that.
09-03-2022

It looks to me like x86 (the #ifndef _LP64 section in c2_MacroAssembler_x86.cpp) is doing the same as AArch64. [~eosterlund], based on Patric's evaluation, do you still consider this a bug, or should it be an RFE with a less dangerous-sounding synopsis?
09-03-2022

After some thought and discussion with Erik (Erik also consulting D. Dice), it is apparent that there might be additional considerations to be made. While we could change the AArch64 code to double-check the successor (_succ), it does not seem to be necessary (*). There are two (optimising) mechanisms at play here: the successor protocol and the responsible (_Responsible) protocol. The responsible protocol will ensure progress even in cases where the successor protocol fails to handle races. Both of these protocols attempt to improve on situations where spin-locks have failed, trying to "optimise" a situation where congestion is already a fact. Fast unlocking _could_ be as simple as checking the lock queue and, if no queue is present, releasing the lock (fast path) in JIT code, otherwise handling the queue and wakeup (slow path) through VM code.

Q1. Would the simplified approach (above) be too slow a path in the congested scenario? Do we know?
Q2. Should we try to seal the races in the successor protocol, thus making the responsible protocol superfluous?
Q3. Should we rely on the responsible protocol alone, adjusting the back-off to a regular exponential approach, thus allowing us to remove the successor protocol altogether?

(*) The same seems to be the case for PPC and S390.
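
For concreteness, a minimal sketch of the simplified approach in Q1, with illustrative names (not HotSpot code). Storing NULL to owner before loading the queue, across a full fence, is what keeps the sketch race-safe; the enqueue path must correspondingly re-check the owner after enqueueing, as the real protocol already does:

  #include <atomic>

  struct Monitor {
    std::atomic<void*> owner;
    std::atomic<void*> queue;  // stand-in for cxq | EntryList
  };

  // true  -> released on the fast path (no waiters)
  // false -> caller must enter the VM slow path for queue handling/wakeup
  bool simple_fast_unlock(Monitor* m, void* self) {
    // ST owner = NULL; MEMBAR; LD queue -- same Dekker ordering as before,
    // but on the queue instead of a separate successor field.
    m->owner.store(nullptr, std::memory_order_seq_cst);
    if (m->queue.load(std::memory_order_seq_cst) == nullptr) {
      return true;   // nobody waiting: done
    }
    // Waiters exist: regrab if possible and let the slow path wake one.
    void* expected = nullptr;
    if (!m->owner.compare_exchange_strong(expected, self)) {
      return true;   // another thread owns the lock now; it handles succession
    }
    return false;    // we re-own the lock: slow path handles queue and wakeup
  }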
11-01-2022

No reproducer and no "sane" way to test the change; risk too high for JDK 18 (RDP1). Moving to JDK 19.
16-12-2021

Now, where's that reproducer... ;-)
15-11-2021

Sorry Patric! ;-)
12-11-2021

ILW = Possible stall of application, never observed yet, possibly disable fast_unlock intrinsic = HLM = P3
09-11-2021