JDK-8276852 : x86_64 fast_unlock C2 intrinsic handles racy successor incorrectly
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 18
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • CPU: x86_64
  • Submitted: 2021-11-09
  • Updated: 2021-11-09
  • Resolved: 2021-11-09
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 18
18Resolved
Related Reports
Relates :  
Description
In the x86_64 C2 fast_unlock intrinsic, we correctly check if _succ was set right after storing NULL to the ObjectMonitor owner field. If so, we have to try taking the lock. If we succeed, we should enter the slow path and deal with poking the successor, and if we fail to re-take the lock, we should *not* enter the slow path, because another thread beat us to it and will poke the successor. This is where the issue is.

This is the code that tries taking the lock again:
lock();
cmpxchgptr(r15_thread, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
jccb  (Assembler::notEqual, LSuccess);

And this is the success label (which is designed to let us pass the fast path):
bind  (LSuccess);
testl (boxReg, 0);                      // set ICC.ZF=1 to indicate success

For this to actually set ZF, as the comment implies, the contents of boxReg must be zero. But in this particlar case when re-taking the lock in a failing way, it won't be. Because boxReg is always rax, and rax is used by the CAS to store the previous value of the CAS operation. And when the CAS failed, it's always because there is a non-zero value in memory. So when we get to the LSuccess label, we will *not* set ZF. Therefore, we will instead incorrectly take the slow path, even though we should have taken the fast path (which is indeed the contract for the LSuccess label).

When we incorrectly enter the slow path, it is expected that we own the lock. And if we do not, there is an assert(false), and a bunch of error logging saying that we must have hit unbalanced JNI locking. While in fact, this only happened because of incorrect assembly code.
Comments
Patricio correctly pointed out that testl boxReg, 0 is invariant of the contents of boxReg. So the rest of the reasoning here is invalid. I will close the bug.
09-11-2021

Following the code paths of what would happen when this occurs in a product build, it seems like this one will only result in some weird nonsense error logging that we wouldn't normally print, in terms of actual difference in behaviour. But I don't think it will cause stalls or crashes in product builds. I think.
09-11-2021

ILW = Same as JDK-8276851 but on x86_64 and with different manifestation, never observed yet = P3
09-11-2021