The recent aarch64 implementation uses "/*acquire*/ true, /*release*/ true" semantics in both MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock.
For locking purposes, acquire semantics on lock and release semantics on unlock are sufficient. In particular, the lock operation has nothing to release, unlike legacy locking mode, where the displaced header has to be released.
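
The lock-acquire / release-unlock pairing can be illustrated with a minimal C++ sketch based on std::atomic. This is only an analogy and not HotSpot code: SketchLock and run_counter_demo are hypothetical names, the lock word is a plain int rather than a mark word, and the unlock is a simple store where the real lightweight_unlock uses a CAS.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a lock word; this is not HotSpot's mark word layout.
class SketchLock {
  std::atomic<int> _state{0};  // 0 = unlocked, 1 = locked

public:
  void lock() {
    int expected = 0;
    // Acquire only: accesses inside the critical section cannot be
    // reordered before the successful CAS. Nothing is published here,
    // so no release ordering is requested.
    while (!_state.compare_exchange_weak(expected, 1,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
      expected = 0;  // a failed CAS overwrites 'expected' with the observed value
      std::this_thread::yield();
    }
  }

  void unlock() {
    // Release only: accesses inside the critical section cannot be
    // reordered after this store, so the next owner observes them.
    _state.store(0, std::memory_order_release);
  }
};

// Increments a plain (non-atomic) counter under the lock from several threads
// and returns the final value.
long run_counter_demo(int num_threads, int iterations) {
  assert(num_threads > 0 && iterations >= 0);
  SketchLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&lock, &counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        lock.lock();
        ++counter;  // protected only by the acquire/release pairing
        lock.unlock();
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return counter;
}
```

Under the C++ memory model, the release store in unlock() synchronizes with the acquire CAS of the next lock(), which is what makes the plain counter safe. Whether the same reasoning carries over to the aarch64 MacroAssembler code is exactly what this issue proposes to check.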
Regarding the memory model, https://github.com/openjdk/jcstress/blob/ce8a2c7747e0232fac66783c89ea4a58a3819e0f/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/jmm/advanced/AdvancedJMM_01_SynchronizedBarriers.java#L56 and other similar tests indicate that the JMM doesn't have additional requirements.
We should check whether weaker semantics are correct and whether they improve performance.