In the x86_64 C2 fast_unlock intrinsic, we correctly check if _succ was set right after storing NULL to the ObjectMonitor owner field. If so, we have to try taking the lock. If we succeed, we should enter the slow path and deal with poking the successor, and if we fail to re-take the lock, we should *not* enter the slow path, because another thread beat us to it and will poke the successor. This is where the issue is.
This is the code that tries taking the lock again:
lock();
cmpxchgptr(r15_thread, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
jccb (Assembler::notEqual, LSuccess);
And this is the success label (which is designed to let us pass the fast path):
bind (LSuccess);
testl (boxReg, 0); // set ICC.ZF=1 to indicate success
For this to actually set ZF, as the comment implies, the contents of boxReg must be zero. But in this particlar case when re-taking the lock in a failing way, it won't be. Because boxReg is always rax, and rax is used by the CAS to store the previous value of the CAS operation. And when the CAS failed, it's always because there is a non-zero value in memory. So when we get to the LSuccess label, we will *not* set ZF. Therefore, we will instead incorrectly take the slow path, even though we should have taken the fast path (which is indeed the contract for the LSuccess label).
When we incorrectly enter the slow path, it is expected that we own the lock. And if we do not, there is an assert(false), and a bunch of error logging saying that we must have hit unbalanced JNI locking. While in fact, this only happened because of incorrect assembly code.