JDK-8233873 : final field values should be trusted as constant
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 14
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-11-08
  • Updated: 2023-11-06
Description
# Problem

The JVM JITs routinely optimize references to final fields as constant
values, when a JIT can deduce a constant containing object.  This is a
fundamental capability for producing good code.

Currently, though, only a small number of "white listed" fields are
treated in this way, since vigorously optimizing _all_ final fields is
thought to have unknown risky consequences.  The white listing logic
is defined using the function `trust_final_non_static_fields` and
similar logic as part of changes like JDK-6912065 and JDK-8140483.
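
To make the optimization concrete, here is a minimal sketch of the kind of code that benefits. The classes `Limits` and `Checker` are hypothetical and exist only for illustration. Whether a particular JIT folds the field read depends on the trust rules described above, so the comments describe what a JIT could do, not what HotSpot does today.

```java
/** Hypothetical holder with a final instance field (not a compile-time constant). */
final class Limits {
    final int max;                    // assigned exactly once, in the constructor
    Limits(int max) { this.max = max; }
}

final class Checker {
    // A static final reference gives the JIT a constant "root" object.
    static final Limits LIMITS = new Limits(64);

    static boolean withinLimit(int n) {
        // If LIMITS.max were trusted as constant, a JIT could compile this
        // comparison against the literal 64 and drop the field load entirely.
        return n < LIMITS.max;
    }

    public static void main(String[] args) {
        System.out.println(withinLimit(10));
        System.out.println(withinLimit(64));
    }
}
```

The result is the same whether or not the read is folded; the difference is only in the generated code, which is why an unexpected later write to `max` is the hazard this RFE is concerned with.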

# Proposal

The JVM should support an option `FoldConstantFields` which
bypasses the above "white list" and uses a "black list" instead as
needed.  Initially this option should be turned off by default.
Turning it on should, initially, also turn on a new option
`VerifyConstantFields` which detects updates to final fields and
diagnoses them with some selectable mix of warnings or errors.

(See below for discussion of how updates to final fields can occur.
The short summary is "reflection, JNI, or Unsafe". Each of these
requires a different remediation.)

This feature will not solve the problem of full optimization of
constant fields all at once, but will set the stage for finding and
fixing problems caused by such optimizations.

The support for `FoldConstantFields` should include (either initially
or as follow-on work) the following functions:

 - Dependency recording in the JIT, whenever a final field value is
   used.  At first this should be recorded per field declaration, not
   per individual field instance, on the assumption that invalidation
   will be very rare.  This assumption may need to be revised.

 - Updates to final fields via reflection must be trapped and must
   trigger deoptimization of dependent JIT-compiled code.

 - Updates to final fields via JNI must be trapped similarly.

 - Updates to final fields via other users of `Unsafe` must be trapped
   similarly.  This addresses uses of `Unsafe` _that the JDK knows
   about and controls_.

 - Encourage other users of `Unsafe` to perform similar notifications,
   and document how to do so.  Perhaps there are additional `Unsafe`
   API points to notify the JIT.

 - Placing the checking logic inside `Unsafe` is the wrong answer in
   most cases, since it would penalize well-behaved users of `Unsafe`.
   Perhaps a separate flag `VerifyUnsafeUpdates` would be applicable,
   for stress tests where performance can be sacrificed.

 - Define an API for use by privileged frameworks (including those in
   the JDK) for creating objects in a "larval" state, apart from
   normal constructor invocation.  (Possibly `Unsafe.allocateInstance`
   is such an API point; see also JNI AllocObject.)  These are
   released from the constraints on final field writing, including JIT
   invalidation.  If a JIT encounters an object in the larval state,
   the JIT will simply refrain from constant-folding its fields.

 - Define an API for promoting larval objects to a normal "adult"
   state, at which point the normal JIT optimizations would apply.  If
   this isn't done, performance will be lost only regarding the larval
   objects created by old frameworks, so perhaps this isn't needed.

 - It seems likely that the larval and adult states would need to be
   reflected in a bit pattern in the object header.  As an
   optimization, normally constructed objects would probably not need
   to have this state change in their header bits, unless perhaps they
   "escape" during their constructor call.
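
The first two items in the list above (per-declaration dependency recording, and invalidation when a write is trapped) can be sketched as a toy model. This is not HotSpot code: the real dependency machinery lives in the VM and is written in C++, and every name here (`FinalFieldDependencies`, `CompiledCode`, `recordFold`, `onFinalFieldWrite`) is an assumption made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-field-declaration dependencies; all names are hypothetical. */
final class FinalFieldDependencies {
    /** Stand-in for one JIT-compiled method. */
    static final class CompiledCode {
        final String name;
        boolean valid = true;
        CompiledCode(String name) { this.name = name; }
    }

    // Keyed by field declaration (e.g. "Point.x"), not by individual object,
    // matching the assumption that invalidation is very rare.
    private final Map<String, List<CompiledCode>> dependents = new HashMap<>();

    /** Imagined JIT hook: called when a value read from the field is folded. */
    void recordFold(String fieldDeclaration, CompiledCode code) {
        dependents.computeIfAbsent(fieldDeclaration, k -> new ArrayList<>()).add(code);
    }

    /** Imagined trap for a reflection, JNI, or Unsafe write; returns the number of methods invalidated. */
    int onFinalFieldWrite(String fieldDeclaration) {
        List<CompiledCode> affected = dependents.remove(fieldDeclaration);
        if (affected == null) {
            return 0;                 // nothing was compiled against this field
        }
        for (CompiledCode code : affected) {
            code.valid = false;       // stands in for deoptimization
        }
        return affected.size();
    }

    public static void main(String[] args) {
        FinalFieldDependencies deps = new FinalFieldDependencies();
        CompiledCode a = new CompiledCode("A.run");
        CompiledCode b = new CompiledCode("B.run");
        deps.recordFold("Point.x", a);
        deps.recordFold("Other.y", b);
        System.out.println(deps.onFinalFieldWrite("Point.x") + " method(s) invalidated");
        System.out.println("A.run valid: " + a.valid + ", B.run valid: " + b.valid);
    }
}
```

Keying on the declaration keeps the bookkeeping small, at the cost that one stray write discards every compiled method that folded that field for any instance. That is the trade-off the first bullet flags as possibly needing revision.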

# Discussion

A final field can in some cases be assigned a new value.  If a JIT has
already observed the previous value of that final field, and
incorporated it into object code as a constant, then (after the
assignment of a new value to that field), the optimized object code
will execute wrongly.  We call such wrongly executing code "invalid",
and the JVM takes great care to avoid executing invalid code in
similar cases involving speculative optimizations, such as
devirtualized method calls or uncommon traps.

The basic reason for this is that the Java Memory Model requires that
all fields (including changed final fields) must be read accurately.
An accurate read yields a value that is appropriate to the current
thread, as defined by a web of "happens-before" relations.  (It is not
entirely wrong to think of these relations as a linear set, although
concurrency and races are also part of the JMM.)

But final fields _must_ be changed when an object is initialized, and
_may rarely_ change in other circumstances.  There are a number of
ways to change the current value of a final field:

0. In a constructor, a final field may be changed from its current
value (typically initial default value) to a new (possibly
non-default) value.  The JVM (per specification) allows this to occur
_multiple times_ although most sources of bytecode are thought to
avoid such behavior.

1. When a field is reflected, and `setAccessible(true)` is called, the
value may be set.  This "hook" is intended for use by deserializers
and other low-level facilities.  It is thought to be used as a
simulation of case #0 above, when an object's constructor cannot be
conveniently invoked. In a real sense, holding this option open for
serialization frameworks harms the optimization of the entire
ecosystem.

2. JNI functions such as SetBooleanField can be used to smash new
values into fields even if they are final.

3. Good old `Unsafe.putInt` can also be used to smash new values
into fields (or parts of fields or groups of fields) even if they are
final.
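
Case 1 is easy to reproduce with the standard reflection API. The following is a minimal sketch; the classes `Point` and `ReflectiveFinalWrite` are hypothetical. It relies on the documented behavior of `Field.set` for non-static final fields of ordinary (non-record, non-hidden) classes after `setAccessible(true)` succeeds. Newer JDK releases may warn about or restrict this, so treat it as an illustration of the hazard rather than a guaranteed behavior on every release.

```java
import java.lang.reflect.Field;

/** Hypothetical class with a final instance field set in the constructor. */
final class Point {
    final int x;
    Point(int x) { this.x = x; }
}

final class ReflectiveFinalWrite {
    /** Overwrites Point.x after construction and returns the value read back reflectively. */
    static int overwriteX(Point p, int newValue) {
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);    // enables writes to this final instance field
            f.setInt(p, newValue);    // the "simulated constructor" write of case 1
            return f.getInt(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1);
        System.out.println("read back: " + overwriteX(p, 2));
    }
}
```

If a JIT had already compiled some method with `p.x` folded to its old value, that method would keep using the stale value after this write unless the write is trapped and the method deoptimized.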

Although a debugger can forcibly change the value of a field from
outside the JVM, via APIs in the `jdk.jdi` module, it appears to be
impossible to use those APIs to change final fields.

It is unknown what libraries or bytecode spinners "in the wild" are
using any of the four options above in ways that would invalidate
JIT-compiled code.  Setting the JITs free to optimize fully requires a
plan for mitigating the impact of final field changes both in known
code (in the JDK) and in unknown "wild" code.

## Side note on races

Although race conditions (on non-volatile fields) allow the JVM some
latitude to return "stale" values for field references, such latitude
would usually be quite narrow, since an execution of the invalid
optimized method is likely to occur downstream of the invalidating
field update (as determined by the happens-before relation of the
JMM).  The JMM itself would have to be updated to either relax
happens-before relations pertaining to final field updates, or else
allow special race conditions that allow the JIT to use stale values
of final fields (in effect, looking backward in time, past events
visible through the relevant happens-before events).  There are no
active proposals to update the JMM in this way, and it seems easier to
take the JMM as a given, or (at most) make very small changes to it to
further specialize the treatment of final fields.

Comments
Yeah my focus is definitely constant folding in particular. Allowing non-static finals to be used as constants in the JIT. From a constant root, and allowing constant chains to be followed around in the heap. Hoisting of any non-volatile fields outside of loops, whether declared final or not, is already fully valid and actively done. They only have to be re-evaluated if you have a call to a method that was not inlined. And that might be a sign this path isn't super hot and hence might not be worth optimizing much further. Otherwise we would probably have inlined it. As for spec changes, I'm not sure I follow. The only possible backward compatibility problem I can see is with explicit unsafe code doing unsafe things (using Unsafe to rewrite finals). But Unsafe is for internal use only, and does not have any real specification. We would merely have to write a comment saying that if you rewrite finals with Unsafe, then you need to call this new function to make it known to the VM what you have done, so it can take appropriate action. And make sure the JDK (e.g. reflection) follows that practice. I don't think we need a better contract, as Unsafe is really not supposed to be used from the outside, and if you do you have always been on your own, to do the right thing, playing along in the game of thinking you know what you are doing.
24-07-2021

Erik wrote: > I presume aRef should be final in the example… No. It is intentionally not final in that example. The intent of the code above is to provide a “perfectly valid” (tho ugly) example where “final long id” is not truly final, even tho it is never set outside the constructor, because instances of SimpleClass are exposed mid-construction (via the non-final aRef) to other code which may be run by other threads. That other code (doStuffToARef) uses a valid (but ugly) means of dealing with the raciness and expects correct behavior that would be based on the eventual, fully constructed state of the SimpleClass instance, including having its id field set to its eventual and final value. But if a JIT optimized doStuffToARef() with an assumption that the id field is truly final, it would result in invalid execution (the while loop would never exit). I think we might be focusing on different scopes for the optimizations that result from truly final indications. Your focus might be on being able to constant fold state that is statically reachable from static finals. And for any truly final reference involved in reaching that state, this can certainly be done. But it can (validly) be done only as long as the optimization doing it tracks the truly final assumption (in a way guaranteed to cover at least the instances involved) and e.g. deopts accordingly if that assumption is invalidated at a later time. Such an optimization would require interception of anything that may validly change the state of instance final fields, and some enforcement action (deopts, runtime check) based on that interception. [where “validly change” includes all the things John described in the initial posting] But with that interception and enforcement in place, there is a (much?) wider set of optimizations that become available on such speculatively-tracked truly-final fields (references or otherwise) in the much wider set of instances that may have nothing to do with static finals. E.g. the prototypical optimization I mentally examine things against is not the folding of statically known constant values, but the avoidance of re-reading the state of such final fields once it is established in a method. Such that e.g. reads of final fields can be hoisted out of loops, and e.g. range check elimination is enabled when looping on the contents of per-instance final “buffer” arrays (in the presence of potential ordering requirements within the loop). Constant folding is just a special case of this (where the state is established at JIT compile time). If enforcement applies to all instances of a class, there would be no additional enforcement needed for the set of instances statically reachable from static finals. A benefit of enforcement that is special to those limited sets of instances would only exist when breaking the truly final assumptions does happen to some instances of a class, but _NOT_ to the instances that are statically reachable. But establishing that fact safely seems “hard” without some per-instance state (in e.g. the markWord) that would facilitate either runtime checks of truly final validity of an instance or group of instances, or runtime detection (at rule-breaking interception points) that the specific instance involved has assumptions tied to it. IMO the performant way to deal with Unsafe and reflection is not to intercept the writes, but to instead intercept the (much less frequent) establishment of the ability to perform them (intercept at Unsafe.objectFieldOffset and at Field.setAccessible()). It would be “hard” to evaluate or set per-instance indicators at those interception points, which means that such interception will tend to apply to classes as a whole, and not to individual instances.
*I think* that any path that would [eventually] prevent the need for such interception (at least for any classes visible to non-JDK-internal code) would [eventually] require both a spec change (for reflection) and a change to Unsafe semantics. Am I missing something in that regard?
24-07-2021

I presume aRef should be final in the example. The compiler would follow aRef, and from there its id field. It checks the validity of each access. aRef is validated by checking the class is initialized, and the object is validated by issuing a thread-local handshake that scans all stacks to prove that no thread is inside of the constructor of that object. That proves that the object has finished construction, and we are past the window of raciness. From there, only exotic accesses with deopt hammers can mutate the state.
24-07-2021

@Erik You are right, my statement above should probably say "...when there is suspicion that the 'this' reference may have been published before the final field was assigned its *final* value...". I might be missing something in your suggested thread-local-handshake mechanism as it relates to trusting the final nature of instance-final fields. How would that mechanism detect the following situation and avoid treating id as truly final in doStuffToARef? (treating it as truly final will lead to an infinite loop that would [presumably] not occur otherwise).

    import java.util.concurrent.atomic.AtomicLong;

    class SimpleClass {
        static AtomicLong latestId = new AtomicLong(0);
        static AtomicLong spinCount = new AtomicLong(0);
        final long id;
        static SimpleClass aRef;

        // Publishes 'this' before the constructor has assigned id.
        void a() { aRef = this; }

        SimpleClass() {
            a();
            id = latestId.incrementAndGet();
            spinCount.incrementAndGet();
        }

        static void doStuffToARef() {
            while (aRef.id == 0) {
                // wait for things to settle since this is racy
                spinCount.incrementAndGet();
            }
            // Do stuff that depends on the value of id.
        }
    }
24-07-2021

It could even escape through a completely different agent thread, picking up the reference from a JVMTI heap walk. So we would need even more seat belts.
24-07-2021

Also, a static analysis would have to protect against the value escaping through more exotic ways, such as a virtual call that does not involve the this pointer at all, calls a future class that is not loaded yet, which uses one of the various stack walking APIs to acquire a reference to the this pointer and publishes it. It can become increasingly difficult to convince yourself that all present and future paths are covered. For normal escape analysis used for object scalarization, this is fine because the stack walker would simply materialize the object. But here, they need to know that acquiring any reference from the locals might be a violation against some subtle static analysis constraint that we thought we proved. So it seems like there must be more to it, given this category of solution.
24-07-2021

John, Erik, there are two good non-rhetorical questions above: 1. To the question of objects being published before the constructor is complete: This is straightforward to address by disabling Truly Final treatment for a final field when there is suspicion that the 'this' reference may have been published before the final field was assigned a value. This can be covered by static analysis, and the analysis can be arbitrarily conservative in situations where it does not want to track "too deep" into e.g. methods called from the constructor [when in doubt about whether or not 'this' has been published before the assignment, the field would not be treated as Truly Final]. The same logic, BTW, can be applied for Effectively Final analysis. 2. The question about mocking and serialization libraries: This is where reflection and Unsafe.objectFieldOffset can (and should) be made to differ in behavior. For any final fields whose offsets are requested (from non-JDK code) via Unsafe.objectFieldOffset, we would turn off Truly Final treatment. But normal reflective access (without the override of setAccessible) doesn't have that effect, and we avoid triggering by changing the reflection implementation to use a separate JVM-internal variant of Unsafe.objectFieldOffset (which is safe to not trigger on because it is only used in read-only paths in reflection). We separately trigger on Field.setAccessible(true). For libraries generally that use reflection, unless they actually intend to modify the final field, this loses nothing. And for those who do intend to modify the final field, we truly want to avoid thinking of the field as truly final...
24-07-2021

Regarding point 1; a static analysis that proves the this pointer isn’t published before a value is written to the final field, does not sound like it is enough. The JVMS allows multiple transient assignments during the execution of the constructor. So when you find a store to the final field, that might not be the last store that decides the value of the field when the construction is complete. I guess you end up with kind of an escape analysis, and end up relying on its ability to prove the lack of escapes of the this pointer in the constructor and its possible call tree. Now that sounds like it would be a valid solution. But I wonder how many opportunities would be lost because a) we couldn’t prove there is no escaping path, b) there is a theoretical escaping path that does not happen at runtime, or c) the object really does escape, but never in a racy way such that the optimization is invalid, or d) the this pointer is passed to a potentially future (but not presently) megamorphic dispatch, that presently won’t escape, but will potentially after the next class is loaded. Would we disallow the optimization whenever dynamic binding is involved in the constructor? My proposed solution doesn’t need a large bag of compiler machinery to conservatively, and likely inaccurately, guess if there could be an issue in theory. It simply checks instead with a good old lookup, if this race is actually happening right now or not. Granted, the check isn’t super cheap, but I think batching can solve that.
24-07-2021

Gil, you and I may differ on the importance of controlling the behavior of finals in the presence of races, and also on the role of third-party libraries. If an object is published from its constructor, and the JIT sees an uninitialized field, I think the folding of that field value as a constant would be an error. What would the Azul VM do in that case? (The `@Stable` annotation provides one possible partial answer.) Third-party libraries that implement mocking and serialization routinely peek and poke objects. Having an optimization depend on the *absence* of such libraries would seem to cause performance potholes. What would the Azul VM do if it were asked to mix in a third-party serialization package? Perhaps both of these concerns are more relevant to a reference implementation of Java, rather than a performance engine which users can either take or leave. (The above are non-rhetorical questions. I don't remember hearing answers to them at Azul's JVMLS presentation, but then again I didn't hear every detail. Would you please remind me where you have published the technical details?) Also, lest anyone misunderstand, I am not proposing the above JVM flag as a permanent feature nor a JVM specification change. It is a tool to help us get our arms around the problem of constant folding through objects.
23-07-2021

I thought we are discussing correctness concerns. In particular, if an object escapes before it is fully constructed, and is loaded from a thread running a static initializer, installing the object into a final field, then we can’t trust transitive finals until that other thread has finished constructing it. This is an issue without any use of Unsafe or JNI. I propose to scan all threads to prove the final graph isn’t still being constructed by inspecting the stack traces. This shouldn’t need any spec changes and sounds pretty straightforward.
23-07-2021

I’d like to reiterate my note above about the untapped existing ability (within the current spec, and with no effective performance loss, including e.g. no change to Unsafe write performance) to optimize instance final fields. IMO the available optimization should be applied before seeking to add spec changes that would allow it to expand slightly in scope, and the value of such scope expansion should be weighed against the impacts of the spec changes. The available optimization’s scope limitations, within the current spec, can be stated as: - If an instance final field in a class has its offset exposed to user code (not to JVM internals) via e.g. Unsafe.objectFieldOffset, then that field would be treated as non-final, and optimizations based on field finality would not be available for that field. - If a single instance of a class that has an instance final field has that final field modified after initialization, via e.g. reflection or JNI, all instances of that class will lose the ability to optimize for the field being final. It is true that one can expand past this scope with spec changes as proposed. But in practice, we’ve found neither of the above scope limitations to be significant, and plenty of existing Java code successfully benefits from the Truly Final optimizations available within this scope. While I can’t really tell what the added benefit of the scope improvement would be above the already available optimization, it’s that delta (rather than the delta from existing behavior that does not yet optimize for instance final fields) that should be evaluated and considered as the benefit here. Furthermore, we’ve found that beyond Truly Final, the same optimizations can be applied to what we term Effectively Final (not-declared-as-final fields that, based e.g. on simple static analysis, can be speculated to behave like final fields, with the same enforcement and tracking mechanisms providing integrity for that speculation and related optimization) with additional value, where the proposed spec changes would not provide added optimizations to fields that are not declared as final.
23-07-2021

Rather than tracking the larval state explicitly by setting some data in the markWord, an alternative solution is to use thread-local handshakes. Once <clinit> has finished running on an initializing thread, we know that objects constructed by the same thread are not larval any longer. However, we might have loaded references to objects concurrently constructed on a different thread, and installed them into the final graph of said class. Explicit tracking of larval states allows detection of this. However, notably we can also detect the same thing with thread-local handshakes, without the need for explicitly tracking larval states. The JIT records what objects it assumes are not larval during its compilation, and fires off a thread-local handshake during the dependency verification phase, inspecting the stack traces of the other threads, to see that none of the recorded objects are in their corresponding constructors. If not, they are provably not larval objects any longer, and the final values recorded can be verified. If still the same, then you are good to go. If this turns out to be expensive to do for each compilation unit, there are plenty of opportunities to optimize this. For example a single scan per class to prove its final graph is purged from larval objects would be enough to mark the class as non-larval, so subsequent compilations would not have to check again. Verifying the non-larvalness of classes can also be batched such that many classes are verified with the result of a single thread-local handshake rendezvous. Not needing to inject more mutable state into the markWord might make this easier to implement, not incur extra cost for each allocation, and would apply generally to all objects.
23-07-2021

I guess my point here is that the "have not been satisfied with any of them" conclusion is premature and may be leading to incorrect conclusions. We've shown that the optimization is straightforward (in the sense that it cleanly fits with current dependency tracking and de-optimization concepts), applicable en masse, and comes with no loss of fidelity or performance. It is easy to demonstrate and micro-benchmark. Adding APIs and changing the specs to expose something that is straightforward for the JVM to optimize under current rules seems like the wrong way to go. To avoid that, we'd be happy to provide any pointers needed for moving past previously abandoned attempts.
11-12-2019

Thanks Gil. I also enjoyed this write-up: https://medium.com/azulsystems/truly-final-optimization-in-zing-vm-283d28418e55 The truth is that over the years we've tried several prototypes in the vein of Azul's static marking with dependency-based backoff. We have not been satisfied with any of them. Here's another recent attempt: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018342.html This RFE is proposing a more resilient solution, in which the whole suite of final-field optimizations can be permanently retained, even if a few instances of some class get mishandled via a deserialization, JNI, etc.
11-12-2019

See "Implementing Truly Final in Zing" (aka "Truly Final Fields with Anna Thomas") https://www.youtube.com/watch?v=2HfnaXND7-M for a full discussion. We have some experience actually implementing this, and would be happy to share what we've learned. I don't really see a need for the APIs and the larval/non-larval stuff here, as Unsafe interception can be easily done with no impact on modification speed.
19-11-2019

FTR Memory Access API (baking in Project Panama) stresses C2 final field “optimizations” (or absence of them?) a lot. So far, the focus (in our JITs) was on observing actual values (constants), but for Memory Access API it’s important to common final field accesses (across memory barriers and calls) to optimize away safety checks (liveness, bounds, and alignment checks).
09-11-2019

Considering final instance field locations are scattered across Java heap (final instance fields are bundled in an object instance along with other fields), Intel MPX doesn't look like a suitable option. Moreover, it doesn't look like hardware support is needed here. The problem can be efficiently solved purely on software level.
09-11-2019

On Intel processors, the JVM might benefit from MPX (memory protection extensions) to detect or prohibit final field modification once trusted as constant (in conjunction with the new option 'VerifyConstantFields' and a more relaxed black list perspective)?
09-11-2019

> The JVM should support an option `FoldConstantFields` which treats bypasses the above "white list" and uses a "black list" instead as needed. My reading of this is the current proposal is targeted for well-known classes (java.base). Is the proposal about replacing `trust_final_non_static_fields` (white list) with a more relaxed alternative (black list) for classes in java.base?
09-11-2019

Previous experiments on general problem of final field optimization: JDK-8132243, JDK-8058164. Relevant discussion: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018342.html
09-11-2019