Bug ID: JDK-8290892 C2: Intrinsify Reference.reachabilityFence

JDK 26
26Unresolved

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/25315 Date: 2025-05-20 00:49:49 +0000
20-05-2025
I attached a test that demonstrates the issue with C2: java --add-opens java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:CompileCommand=compileonly,TestReachabilityFenceWithBuffer::test -Xbatch TestReachabilityFenceWithBuffer.java CompileCommand: compileonly TestReachabilityFenceWithBuffer.test bool compileonly = true Created new buffer of size = 1000 with id = 0 Buffer set to null. Waiting for garbage collection. Freeing buffer with id = 0 Exception in thread "main" java.lang.RuntimeException: Unexpected value = 0. Buffer was garbage collected before reachabilityFence was reached! at TestReachabilityFenceWithBuffer.test(TestReachabilityFenceWithBuffer.java:94) at TestReachabilityFenceWithBuffer.main(TestReachabilityFenceWithBuffer.java:126) The test works fine with the interpreter (-Xint).
27-08-2024
This issue came up here: https://github.com/openjdk/jdk/pull/20158#issuecomment-2228916752 The fact that we can't rely on "reachable" objects to be kept in oop maps when loops are unrolled force FFM API to perform some conservative deoptimization, which hurts performance of threads that are accessing memory segment backed by an unrelated arena.
17-07-2024
[~dlong] yes, ReachabilityFence node does do similar tricks Blackhole is equipped with, but also enables additional optimizations on top. There's some conceptual difference between 2 use cases, but it doesn't mean their implementation can't be commoned in the future.
28-06-2023
Is your solution at all similar to the _blackhole instrinsic and BlackholeNode?
14-06-2023
Preliminary intrinsic-based solution: https://github.com/iwanowww/jdk/tree/reachability The idea is to keep a temporary node at the place where RF call is placed which keeps the relevant OOP alive, so it is recorded in OOP maps of intermediate safepoints. Current implementation keeps RF node across all phases. Alternatively, RF may be eliminated before lowering ideal grahp once the OOP is linked to all intermediate safepoints (e.g., performed as part of IGV and loop opts). TODO: * Introduce stress testing mode: automatically inject redundant RF nodes in C2 IR. In addition to testing robustness, additional performance testing should reveal broken optimizations/missed optimization opportunities. (The goal of RF intrinsic is to be as low overhead as possible.) * Conservatively push RF as far as possible down thus broadening OOP live range and minimizing the chances the object dies before all relevant operations are performed. (It can be implemented as a follow-up enhancement. The current fix is robust enough and covers the scenario described in the bug.)
10-06-2023
I must confess I am struggling to understand things here. The basis of the fix for JDK-8199462 was that C2's liveness analysis is based on bytecode analysis and so it was sufficient for the RF call to be present in the bytecode. The fact that the RF method actually did nothing is/was immaterial as would be its elision in the generated JIT code. It is the presence of the RF in the bytecode that counts, not its execution. RF is a marker that says "keep this object alive". Now I can imagine that other C2 optimisations may potentially affect the original liveness analysis and decide the kept-alive reference does not actually need to be kept-alive, so if that is the case then we need a way to tell C2 we really do mean for it to be kept-alive. I look forward to hearing about the proposed fix. I just hope I can understand it.
26-05-2023
I think JDK-8199462 is a bad fix that caused this bug in the RF method (Reference::reachabilityFence). There was a very specific reason to mark the RF method DontInline, and inverting that decision to ForceInline destroyed that reason. That would be OK if there is some other code generation action taken in the JIT to compensate, but there is no such action (that I can see). Instead, it seems we are just crossing our fingers that a JIT optimization will not spoil our day. The deal with reachability and JITs is that an object is reachable from JIT code only if a GC map for a stack frame tells the GC where to find that object’s reference, in the stack frame. GC maps are generally tied to out-of-line calls, which is why DontInline is a perfect fit to the problem: The JIT probably knows that it has to preserve the outgoing reference argument to the out-of-line call, because the JIT probably assumes that the out-of-line call might do something with the incoming reference parameter, such as store it to a global variable. (I said “probably”, because C1 and C2 both assume minimum knowledge about out-of-line calls today, but they might not always.) It is important to remember that GC maps are tied to all kinds of safepoints. (Calls are a kind of safepoint, BTW.) The JIT is free to remove entries from GC maps if it can prove that those entries correspond to values which will never be used, as of some safepoint. (See uses of GraphKit::kill_dead_locals in C2 for an example.) So, if the RF method is inlined, the JIT might notice there are no live uses of the argument, and so it might be unhooked from any nearby safepoint. Optimizing JITs are good at noticing such things. Even if the RF method is called out of line, a heroic JIT might make the argument go dead and drop it from the GC map. For example, we could use something like the escape analysis method summarizer logic to note that, since RF has an empty body, the argument is not used, and thus need not be stacked. Since it’s a static method, a customized calling sequence could just omit the argument, which then would not be in the oop map. Such a hypothetical optimization would have to allow for modes (such as debugger requests) where execution might transition into the interpreter, in which case the missing argument would have to be reconstituted (as null, if there were no live value). I’m going at length on this not because it’s a real feature, but to give folks a sense of what sorts of JVM features might possibly invalidate the reasoning behind the DontInline tactic. Now RF is ForceInline, so it’s inlined all the time; thus it has become a NOP in optimized code. That’s great for performance but it kicks correctness to the curb since the JIT is very likely to call kill_dead_locals and clean out relevant GC maps. Let’s fix that ASAP before we get crashers in the field due to well-optimized JIT code falling afoul of this missing functionality. Crashes are more and more likely as users start to embrace native programming APIs like Panama. Here’s a better fix, more robust than just the DontInline tactic alone, and able to get most or all of the desired performance, without destroying correctness. It is simple also. First, restore the DontInline marker. This gets correctness back, for all JITs. In particular, that is all that C1 needs to recover correctness, since C1 is the “less heroic” of the JITs. Second, make the RF method be an intrinsic, a hardwired method like Object::hashCode and Object::getName. Realistically, if we need specialized behavior like fencing from a method, either it is an intrinsic (or JNI) or it calls an intrinsic (or JNI). The easy implementation is just make RF an intrinsic. The nice thing about it being an intrinsic is, combined with the DontInline marker, JITs know what they are dealing with, and we can get correct results even as JIT optimizations change over time. To recover performance, in C2 and maybe C1 and Graal, treat that intrinsic specially. Replace the call instruction by a plain safepoint, and make sure the safepoint is hooked to the argument. Already, JITs routinely insert similar plain safepoints in loops; this is another use case for such non-call safepoints. Take a moment to recheck the JIT’s logic for safepoint elision and safepoint merging, looking for cases where the hooked reference would be dropped. Make sure it is still hooked in the transformed IR (which has fewer safepoints). Perhaps add a boolean to safepoints to say “this safepoint has a preserved reference, so handle with care”, or some similar expedient to ensure the hooked reference is merged away. The presence of the safepoint in the IR may possibly deter a few optimizations, but the effect will be much, much smaller than a real out-of-line call, which spills all the registers. The risk level for this bug is medium to low, since we do not think it has bitten us very often. But, conservatively, it should be higher than medium or low, because its failure mode will include crashes which will be very hard to diagnose. The serverity level should also be raised to reflect, not only that rare crashes might happen, but also those crashes are not likely to be diagnosed correctly. The crashes will appear to be the fault of whatever upgraded JIT optimizations change machine code shape, enough to open up the timing window for the bug to manifest. The better the optimization quality and/or the more prompt the execution of cleaners or other GC features, the more likely the timing window will open. ADDED COMMENT: kill_dead_locals is an example of how C2 makes references go dead, but by itself it will never kill the RF argument, since it operates at a bytecode level, and will always preserve the argument for the invokestatic bytecode that invokes RF. Other mechanisms can potentially make references go dead, notably safepoint elision or safepoint merging. If the safepoint for the RF call (the invokestatic) is removed for any reason, then previous safepoints will retain a reference (and a GC map entry) for the argument (because kill_dead_locals didn’t kill it), but after the last previous safepoint there will be no record of the reference, and so there can be a span of safepoint-free code (including memory reads and writes) which do not record a live reference to the argument. ADDED COMMENT #2: The owner of this bug (Vladimir I) has discussed his draft fix with me. It meets and beats my expectations for what a good fix will look like.
26-05-2023
Yes, that is exactly what I mean. Having said that, we now use reachabilityFence in very performance critical places so I don’t think we should stop inlining. But we might want to have a new IR node that more explicitly solves the problem, or something like that.
23-07-2022
ILW = object possibly finalized too early; under certain code patterns and compiler optimizations; move reachabilityFence to end of method? = HLL = P4
22-07-2022
So you're saying the logic used to justify JDK-8199462 is flawed, and the reachabilityFence(bar) does not keep bar alive during the doStuff() call?
22-07-2022

Relates :	JDK-8199462 - Use Reference.reachabilityFence in direct ByteBuffer methods
Relates :	JDK-8335480 - Only deoptimize threads if needed when closing shared arena
Relates :	JDK-8169605 - reachabilityFence needs to be optimized away by the JIT