While I was cleaning up the patch for 8216350 I noticed an issue in the implementation of recursive locking in aarch64_enc_fast_lock:
First we load the markOop of the object we want to lock and OR it with markOopDesc::unlocked_value (1). Then we do a CAS to exchange the address of the box on our thread's stack with the object's header word iff it's equal to the (markOop | 1) we just computed. If this fails, then we should check for a recursive lock by comparing
(~(page size - 1) | 3) & (markOop - SP) == 0
Where "markOop" is the current object header word loaded by the failed CAS. This checks that the lock bits are zero (locked) and the stack address of the displaced header is within one page of the current SP. But on AArch64 we actually do this:
(~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0
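Where "old markOop | 1" is the compare-to value used for the CAS. This is always false: SP is aligned, so its low bits are zero and the subtraction leaves bit #0 of the result set, which the "| 3" part of the mask then picks up. This only affects C2; the c1_MacroAssembler version has the correct test.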
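
To make the difference concrete, here is a small standalone sketch of the two tests in plain C++. It is not the HotSpot code: the function names, the 4 KiB page size and the example SP value are all made up for illustration, and the only thing taken from the description above is the shape of the two expressions.

```cpp
#include <cstdint>

// Assumption for illustration: 4 KiB pages. The low two bits of the mark
// word are the lock bits; 00 means stack-locked, 01 means unlocked.
constexpr std::uint64_t kPageSize = 4096;
constexpr std::uint64_t kMask = ~(kPageSize - 1) | 3;

// Hypothetical, aligned stack pointer value used only in the examples below.
constexpr std::uint64_t kExampleSp = 0x0000ffffe000f000ULL;

// Intended test: uses the mark word observed by the failed CAS.
constexpr bool is_recursive_intended(std::uint64_t observed_mark,
                                     std::uint64_t sp) {
    return (kMask & (observed_mark - sp)) == 0;
}

// Test as described for the C2 AArch64 path: uses the CAS compare value,
// i.e. the originally loaded mark word with bit #0 forced on.
constexpr bool is_recursive_as_implemented(std::uint64_t old_mark,
                                           std::uint64_t sp) {
    return (kMask & ((old_mark | 1) - sp)) == 0;
}

// A mark word pointing at a box 0x40 bytes above SP is a recursive lock.
static_assert(is_recursive_intended(kExampleSp + 0x40, kExampleSp),
              "box within one page of SP should be detected");

// The same input never passes the other form, because bit #0 survives
// the subtraction of an aligned SP.
static_assert(!is_recursive_as_implemented(kExampleSp + 0x40, kExampleSp),
              "compare value always has bit #0 set");
```

With any aligned SP the second function returns false for every input, so in this model the recursive case is never recognised by the inline test.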