Reference.reachabilityFence should be used whenever a DirectByteBuffer (DBB) may be deallocated by a Cleaner. More generally, if an off-heap resource is being used after pulling its identity out of a handle object (like a DBB), and that object has a cleaner which deallocates the resource, then any code which works on that resource, even for a short moment, needs to ensure that the handle object stays reachable until after the work is done.
(Otherwise there may be a race condition where the resource-using thread blocks just before performing an operation, but after finishing with the handle object. At that point, if the GC runs, and the handle object goes dead, a cleaner or finalizer may successfully deallocate the native resource, before the original thread gets a chance to resume and finish its use of that resource. Result: A rare dangling pointer bug.)
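A minimal sketch of the usage pattern described above, under stated assumptions: NativeHandle and Resource are hypothetical names, and a plain int[] stands in for the off-heap memory so the example runs without Unsafe. Only java.lang.ref.Cleaner and Reference.reachabilityFence (both available since Java 9) are real APIs here. Each accessor pulls the resource out of the handle, works on it, and fences the handle in a finally block so the cleaner cannot run before the access completes.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical handle class; an int[] stands in for off-heap memory.
final class NativeHandle {
    private static final Cleaner CLEANER = Cleaner.create();

    // State shared with the cleanup action. It must not refer back to the
    // handle, or the handle would never become unreachable.
    private static final class Resource implements Runnable {
        final int[] memory;
        final AtomicBoolean freed = new AtomicBoolean(false);

        Resource(int size) { memory = new int[size]; }

        // Stands in for deallocating the native resource.
        @Override public void run() { freed.set(true); }
    }

    private final Resource resource;

    NativeHandle(int size) {
        resource = new Resource(size);
        CLEANER.register(this, resource);
    }

    void putInt(int index, int value) {
        Resource r = resource;          // identity pulled out of the handle
        try {
            r.memory[index] = value;    // work on the resource, not the handle
        } finally {
            // Without this, 'this' could be treated as dead right after the
            // field load above, allowing the cleaner to run mid-access.
            Reference.reachabilityFence(this);
        }
    }

    int getInt(int index) {
        Resource r = resource;
        try {
            if (r.freed.get()) throw new IllegalStateException("use after free");
            return r.memory[index];
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```

The fence sits at the very end of each accessor, which is why its compiled cost matters: it is paid on every access, however small.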
For DBB and similar data structures (e.g., in Project Panama) such fences will need to be placed at the end of many low-level accessors, typically just after a call to Unsafe.getInt or the like. Since those accessors can optimize down to single instructions, it is important that a reachability fence optimize (on average) to *zero* instructions. This can be done if the JIT takes appropriate note of the fences and creates IR nodes which seem to use the object but generate no code. Or, the out-of-line call to the runtime support routine can simply be "zapped" by a late edit in the backend, after scheduling is done and oop maps are computed.
It may also be useful to "common up" similar fences. Particularly, if F1 and F2 fence the same reference, and F2 post-dominates F1, then F1 can be elided.
It may also be useful to allow reachability fences (RFs) to drop down out of a hot loop, simply to decrease the volume of IR in the loop, and remove a barrier to reordering within the loop. This should make it easier, for example, to vectorize a bulk operation on a DirectByteBuffer.
(Note that, like other fences, an RF cannot be reordered with other operations. So it probably needs to be a "pinned" node, with control input.)
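The commoning and loop-sinking ideas above can be illustrated at the source level. This is only a sketch of the intended equivalence, not of the JIT transformation itself; BulkSum and Handle are hypothetical names, and an int[] again stands in for off-heap memory. The first method fences after every access, as inlined accessors would. The second shows the shape the optimizer would ideally reach: a single fence that post-dominates the whole loop and keeps the handle reachable for its entire duration.

```java
import java.lang.ref.Reference;

final class BulkSum {
    // Hypothetical stand-in for a handle object such as a DBB.
    static final class Handle {
        final int[] memory;
        Handle(int[] memory) { this.memory = memory; }
    }

    // Shape produced by inlining a fenced accessor: one fence per iteration.
    static long sumFencePerAccess(Handle h) {
        int[] mem = h.memory;
        long sum = 0;
        for (int i = 0; i < mem.length; i++) {
            sum += mem[i];
            Reference.reachabilityFence(h);
        }
        return sum;
    }

    // Desired shape: the fence has dropped below the loop. Because it runs
    // after every access, the handle stays reachable throughout the loop.
    static long sumFenceAfterLoop(Handle h) {
        int[] mem = h.memory;
        long sum = 0;
        try {
            for (int i = 0; i < mem.length; i++) {
                sum += mem[i];
            }
        } finally {
            Reference.reachabilityFence(h);
        }
        return sum;
    }
}
```

Both methods compute the same result and give the same reachability guarantee; the second simply leaves the loop body free of fences.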
In any case, the optimal code for a non-elided RF is simply an entry in an oop map, saying that the register or stack slot containing the protected reference is still live, as of the execution point of the RF. Zero executable code is needed or desired. The worst-case cost of this intrinsic should be a spill of the still-alive protected reference to stack.
See https://bugs.openjdk.java.net/browse/JDK-8149610