Bug ID: JDK-8132243 Optimize Final Field Loads In Generated Code

Type: JEP
Component: hotspot
Sub-Component: compiler

Priority: P3
Status: Draft
Resolution: Unresolved

Submitted: 2015-07-23
Updated: 2024-06-21

Related Reports

Relates :	JDK-8058164 - final fields in objects need to support inlining optimizations
Relates :	JDK-8233873 - final field values should be trusted as constant
Relates :	JDK-8224996 - Investigate memory access C2 optimizations
Relates :	JDK-8334754 - C2: Optimize accesses to provably final instance fields
Relates :	JDK-8235844 - Non-constant memory segments are never treated as loop invariants

Description

Summary
-------

Enable optimizations in JIT-compilers to constant fold final field loads in
generated code.

Goals
-----

The goal is to come up with a set of optimizations in JIT-compilers to
constant fold final field loads in generated code. During the course of this work,
opportunities for tightening the rules for final field updates at run-time will also
be explored.

Motivation
----------

The JVMS and the JMM provide some strong guarantees about final field initialization
and visibility.

It's appealing from a performance perspective to exploit them and avoid loading
field values which don't change, thus producing more efficient code.

Moreover, optimizations on instance final fields are crucial for performance in
some scenarios. For example, JSR 292 (`java.lang.invoke`) heavily relies on
the ability to constant fold loads from final instance fields to get decent
`invokedynamic` performance (there are special cases in the JVM code for now).

Although the `HotSpot` JVM already optimizes loads from static final fields, it is
still very conservative when seeing instance final fields. The reason is that
there are scenarios (e.g. deserialization) when the object constructor is skipped
and final field values are written after the object is instantiated.

Immutable objects are promoted as safe in concurrent scenarios and are becoming very popular, so many applications should benefit from such optimizations.

Description
-----------

The JVMS is already quite restrictive. At the byte-code level, instance final field
writes are allowed only in constructors (`<init>`) and static final field
writes only in static initializers (`<clinit>`).

However, there's a limited set of additional scenarios when final field
updates are possible. There are 4 ways to circumvent the limitations and
change a final field value at run-time:

- Reflection API (through `Field.setAccessible()`)

- `java.lang.invoke` (through `Lookup.unreflectSetter()` on an accessible `Field`, since `Lookup.findSetter()` does not produce a setter for a final field)

- `JNI` (`SetXXXField()`)

- `sun.misc.Unsafe` (`putXXX()`/`putXXXUnaligned()`)

The `Unsafe` API is deliberately left out of scope. It is designed as a simple,
well-factored set of building blocks to implement low-level JVM operations and
(independently) provide access to some run-time features of the hardware
platform. It is a user's responsibility to ensure that performed operations are
safe.

Regarding all other cases, JIT-compilers should take all of them into account
when optimizing final field loads and either track updates or be conservative
and avoid optimizations.
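As an illustration of the Reflection route listed above, the following minimal sketch overwrites a final instance field on an already constructed object. The class names `FinalFieldReflectionDemo` and `Box` are hypothetical and exist only for this example. It assumes a JDK release in which `Field.setAccessible(true)` still permits writes to final instance fields of ordinary classes (not records or hidden classes).

```java
import java.lang.reflect.Field;

// Hypothetical demo; the class and field names are illustrative only.
class FinalFieldReflectionDemo {
    static final class Box {
        final int value;   // assigned in the constructor, so not a compile-time constant

        Box(int value) {
            this.value = value;
        }
    }

    // Stores newValue into the final field of an existing Box and
    // returns what a subsequent ordinary field read observes.
    static int overwrite(Box box, int newValue) {
        try {
            Field f = Box.class.getDeclaredField("value");
            f.setAccessible(true);     // suppresses access checks, which also permits the final write
            f.setInt(box, newValue);   // store to a final field outside <init>
            return box.value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Box box = new Box(1);
        System.out.println("value after reflective store: " + overwrite(box, 2));
    }
}
```

A JIT compiler that had folded `box.value` to its original value would not see this store, which is why such writes must be forbidden, ignored, or tracked.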

There are 3 approaches being considered:

1. tighten run-time rules for final field updates: forbid all stores to final
fields once the object is fully constructed;

2. silently `nullify` (ignore and discard) illegal stores to final fields;

3. track all final field updates in the JVM and adapt accordingly.

The first approach, tightened rules for illegal final field updates, requires the JVM
to throw an exception when a store to a final field is performed on a properly constructed
object (fail-fast approach). It aligns run-time behavior with the JVMS.

The normal `new`/`<init>` byte-code sequence guarantees that the object is
properly constructed once the constructor has completed.

This approach frees the JVM from having to track all final field updates
and to throw away generated code whenever a field that was optimized earlier
changes its value.

However, there are valid use-cases when JVMS restrictions should be relaxed
(e.g., deserialization). The common scenario is separate object construction
and publication. In such cases, the `new`/`<init>` sequence doesn't work anymore and
non-standard ways to instantiate objects are used. There are 3 ways to create
an instance without running a constructor on it:

1. `Unsafe.allocateInstance(Class<?>)`

2. `ReflectionFactory.newConstructorForSerialization(Class<?>, Constructor<?>)` (used by deserialization)

3. `AllocObject(JNIEnv*, jclass)` in JNI

These functions should produce "slushy" objects - objects which can freely
change after they are instantiated. The JVM should allow final field updates
for such objects and be conservative when optimizing for them.
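The deserialization case can be observed with standard `java.io` APIs alone. The sketch below is illustrative, with hypothetical names `ConstructorSkippingDemo` and `Token`. It round-trips a `Serializable` object and counts constructor invocations, showing that the copy receives its final field value without the class's constructor running again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo; the class and field names are illustrative only.
class ConstructorSkippingDemo {
    static final class Token implements Serializable {
        private static final long serialVersionUID = 1L;

        static int constructorCalls = 0;   // static, so not part of the serialized form
        final int id;

        Token(int id) {
            this.id = id;
            constructorCalls++;
        }
    }

    // Serializes the token to a byte array and reads a fresh copy back.
    static Token roundTrip(Token original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Token) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Token original = new Token(7);
        int callsBefore = Token.constructorCalls;
        Token copy = roundTrip(original);
        System.out.println("copy.id = " + copy.id);
        System.out.println("constructor ran during deserialization: "
                + (Token.constructorCalls != callsBefore));
    }
}
```

The copy's `id` is written after the instance has been allocated, which is exactly the window in which the object would be "slushy" under this proposal.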

The "slushiness" property can be recorded as a flag in the object header.

Since it is the user's responsibility to either invoke a constructor or manually
initialize the object, an additional operation ("publish"/"freeze") is needed to
signal that construction is over and clear the "slushy" flag. It lets the JVM
know that the object construction is finished (no more final field updates are
planned), so the JVM can harden checks and optimize operations on final fields
from then on.

JIT-compilers consult that flag to gate constant folding of final field loads. Reflection, JNI, and
MethodHandles check the flag when attempting to write to a final field and
throw an error if it is not set, i.e. if the object is no longer "slushy".
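The life cycle described above can be modeled in plain Java. This is only a sketch under stated assumptions. In the proposal the flag would live in the object header and be enforced by the JVM, whereas here an ordinary boolean stands in for it. All names (`SlushyModel`, `Cell`, `storeFinal`, `freeze`, `mayConstantFold`) are hypothetical.

```java
// Hypothetical model of the "slushy" flag; not an actual JVM or library API.
class SlushyModel {
    static final class Cell {
        private boolean slushy;   // stands in for the header bit
        private int value;        // stands in for a final field

        private Cell(boolean slushy, int value) {
            this.slushy = slushy;
            this.value = value;
        }

        // Normal new/<init> path: the object is handed out already non-slushy.
        static Cell construct(int value) {
            return new Cell(false, value);
        }

        // Stand-in for allocation without running a constructor: starts slushy.
        static Cell allocateSlushy() {
            return new Cell(true, 0);
        }

        // Stand-in for a Reflection / JNI / MethodHandle store to the final field.
        void storeFinal(int newValue) {
            if (!slushy) {
                throw new IllegalStateException("final field is frozen");
            }
            value = newValue;
        }

        // The "publish"/"freeze" operation: clears the flag; there is no way to set it again.
        void freeze() {
            slushy = false;
        }

        // The question a JIT compiler would ask before folding a load of the field.
        boolean mayConstantFold() {
            return !slushy;
        }

        int load() {
            return value;
        }
    }

    public static void main(String[] args) {
        Cell cell = Cell.allocateSlushy();
        cell.storeFinal(5);                          // allowed while slushy
        cell.freeze();
        System.out.println(cell.mayConstantFold());  // folding is permitted from here on
        try {
            cell.storeFinal(6);                      // rejected: fail-fast
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making `freeze` one-way is the property that lets a compiler trust a folded value: once the flag has been observed clear, no later store can succeed.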

The second approach, silently `nullifying` stores to final fields in properly constructed
objects, is legal in some cases according to the JMM. Nullification is indistinguishable
from the store occurring but never being observed by a future read. This is
possible if either the store is delayed indefinitely, or if all threads (and
compiled methods) have previously performed a caching read of the original
final value. Additional investigation should be conducted to ensure that the JMM
allows some sort of OOTA caching read of the original final value, since the
threads aren't obliged to physically do such a caching read first.

Finally, for the third approach, if there are no adjustments to run-time behavior, the JVM
has to track all final field updates and adapt accordingly by invalidating all affected
generated code.

JNI, `java.lang.invoke` and the Reflection API should be instrumented with
additional checks to notify the JVM when an application attempts to write to a
final field. The JVM should track all the dependencies in generated code on
final field values.
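The bookkeeping this implies can be sketched as a small model. This is not HotSpot's dependency machinery. The names `FinalFieldDependencies`, `CompiledCode`, `recordDependency`, and `notifyFinalFieldWrite` are hypothetical, and fields are identified by a plain string key (per-class tracking) purely for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of final-field dependency tracking; not HotSpot code.
class FinalFieldDependencies {
    // Stand-in for a piece of generated code that embedded a final field value.
    static final class CompiledCode {
        final String name;
        boolean valid = true;

        CompiledCode(String name) {
            this.name = name;
        }
    }

    // Per-class tracking: key is e.g. "Point.x". Per-object tracking would
    // additionally key on the identity of the receiver object.
    private final Map<String, List<CompiledCode>> dependents = new HashMap<>();

    // Called when the compiler folds a load of the given field into code.
    void recordDependency(String field, CompiledCode code) {
        dependents.computeIfAbsent(field, k -> new ArrayList<>()).add(code);
    }

    // Called from the instrumented Reflection / JNI / java.lang.invoke write paths.
    // Invalidates every dependent piece of code and returns how many were affected.
    int notifyFinalFieldWrite(String field) {
        List<CompiledCode> affected = dependents.remove(field);
        if (affected == null) {
            return 0;
        }
        for (CompiledCode code : affected) {
            code.valid = false;
        }
        return affected.size();
    }

    public static void main(String[] args) {
        FinalFieldDependencies deps = new FinalFieldDependencies();
        CompiledCode a = new CompiledCode("A::m");
        deps.recordDependency("Point.x", a);
        System.out.println("invalidated: " + deps.notifyFinalFieldWrite("Point.x"));
        System.out.println("A::m still valid: " + a.valid);
    }
}
```

Even in this toy form, a single write invalidates every recorded dependent, and the map grows with each folded load, so both recording space and lookup cost scale with the number of dependencies.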

Risks and Assumptions
---------------------

There are compatibility risks due to hardened checks in the Reflection API.
If an application uses the Reflection API to write final fields, it will get
run-time errors when attempting to perform such operations.

External users of `sun.misc.Unsafe` are affected if they change final fields in a
properly constructed object. Such updates aren't guaranteed to be visible, just as is already the case today when static final fields are updated.

There's a risk that a user forgets to perform the "publish" operation and the
object stays in "slushy" state forever.

That can be mitigated by providing JVM and library diagnostic functionality to
detect runaway slushy objects:

- the JVM can be equipped with elaborate checks to hunt them down;

- an alternative `sun.misc.Unsafe` implementation which detects final field updates and
performs a "slushy" bit check can be implemented.

For a JVM-only optimization, experiments showed a considerable increase in recorded
dependencies for generated code during run-time. It stresses dependency
tracking machinery in the JVM, both in recording (more space needed) and
checking (more work to enumerate affected generated code).

The impact should be measured and additional optimizations considered (e.g.
more efficient lookup of per-object dependencies, per-class vs per-object
dependency tracking) to reduce both the number of dependencies and dependency
tracking overhead.

Dependencies
------------

None.

Comments

Doug Lea commented on interactions between final field optimisations and JMM:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035829.html

> ## Side note on races
>
> Although race conditions (on non-volatile fields) allow the JVM some
> latitude to return "stale" values for field references, such latitude
> would usually be quite narrow, since an execution of the invalid
> optimized method is likely to occur downstream of the invalidating
> field update (as determined by the happens-before relation of the
> JMM).

Ever since initial revisions of JLS1 version, the intent of JMM specs (including current) is to allow compilers to believe that the value they see in initial reads of a final field is the only value they will ever see. So no revision is necessary on these grounds (although one of these days there will be one that accommodates VarHandle modes etc, formalizing http://gee.cs.oswego.edu/dl/html/j9mm.html). Some of the spec messiness exists just to explain why compilers are allowed not to believe this as well, because of reflection etc.

In other words, don't let JMM concerns stop you from this worthwhile effort.
13-11-2019
Having an explicit "slushy bit" in the user model makes it easier for programmers to trust that final fields will be properly inlined. The dependency-tracking trick, while possible in theory, can fail badly in practice, because one bad putfield can deoptimize a large amount of code. I can paint my 12-foot cathedral ceiling standing on a 6-foot ladder, until I make a wrong step and fall off; then I wish I had brought in a scaffold. Using invisible dependencies asks our customers to use the ladder instead of the framework.

The slushy bit is an example of "type-state", where an object's (dynamic) type can vary over time. In this case it can only vary a little bit. (All bits are little, actually.) One good implementation of this could be a bit pattern in the object header. The header bits which manage synchronization could be given a special pattern which means "I am slushy, and by the way you can't synchronize on me yet".

The first type-state of an object's life cycle must be slushy, if the object has any final fields that need to be initialized. Call the next state "normal". The object is switched from slushy to normal when the constructor exits. This is the same point at which the JMM mandates that finals be "frozen". For objects created by a constructor, there is no need to explicitly change the header bit pattern; it can be initialized to the normal state. As an edge case, if the constructor synchronizes on 'this', or allows 'this' to escape somehow, perhaps we want to dictate that the object escapes with its slushy bit turned on. Alternatively, for best backward compatibility, the "slushy" bit is only turned on by those special methods (Unsafe.allocateInstance) which create objects for deserialization.

There are interesting type-states that come after normal. A "frozen" state would mean that all fields (not just final fields) are immutable and cannot be changed. Allowing this state (for some class C) would mean that writes to mutable fields would have to be checked by the interpreter and JIT (at least for class C). This could have runtime costs, but would allow users to safely create immutable data structures outside of constructor code. For example, the builder pattern (for an immutable type) requires a buffer object to accumulate object state, with the terminal "build" operation creating the final immutable copy all in one go. A more efficient and concise version of this pattern would accumulate the effects of all builder commands into a larval, mutable object held by the builder, which would then be frozen just before the builder releases the object to the caller of the "build" method. (Value types will provide additional ways to formulate this pattern, if the final result is a pure value.)

Frozen objects also provide safe sharing without synchronization. A frozen object is a natural component of a message passed between concurrent actors. (Mutable objects, such as arrays or lists or maps, are currently used with a "mental handshake" by the coder, who promises not to write code with race conditions, but where the race conditions are not provably excludable. Frozen objects provably exclude race conditions, so can be optimized better.)

A fourth type-state (or maybe just a variation of "frozen") is the "value-based" type state, where a heap-allocated object declares that identity-sensitive operations are no longer relevant. In such a case, we can think about allowing the JVM to reject and diagnose such operations. This condition is described here: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html Besides field mutations (excluded by the frozen state), identity-sensitive operations are synchronization ("monitorenter"), wait/notify, identityHashCode, and the popular ref==ref operator ("acmp"). Synchronization can be disallowed with a RuntimeException, while equals and hashCode are more difficult to cope with; they can be interpreted structurally (as aliases for the Object methods), or perhaps disallowed, with users being required to amend their code in some cases.

An object which completely encapsulates its identity may be called "identity-cryptic" or "identity-hostile". Such an object permits additional optimizations, notably structure de-duplication (a Holy Grail in the string compaction quest), and structure replication (a possible aid to scaling in very large memories). It is likely that boxed value types will support such optimizations, which is good. But the optimizations are equally useful with object types which (unlike value type boxes) have clear "larval" or "mutable adult" phases, notably Java arrays and lists.

Completely frozen, non-synchronizable objects are also candidates for shared heaps. If an AOT compiler can predict the results of calling various "<clinit>" methods, and pre-construct a heap image for use by the JVM, all the nodes in this heap image should (probably) be made at least frozen and preferably identity-cryptic, so that there will be no occasion for the application JVM to dirty the shared virtual memory pages on which the nodes live. There are of course other problems to solve (such as pointer relocation) before this is a reality, but nailing down mutations and identity effects is an important step along the way.
25-10-2016
For background on the life cycle phases of "slushy" and "frozen", see: https://blogs.oracle.com/jrose/entry/larval_objects_in_the_vm The blog entry uses the terms "larval" and "adult" to describe the same phases.
18-12-2015
Tracking anomalous "slushy" vs. normal object states (outside of constructors) is probably necessary, for objects not created by normal constructor execution.

As we refine the JMM for finals, we should also consider adding a set-once restriction to final fields. That is, even if a final field is not yet frozen (because the constructor has not exited, or the object is in the slushy state), setting it should freeze it. This is a very reasonable restriction, ensuring that observers of final fields will either see the initial default value, or else a unique final, frozen value. (As a degenerate case, the final value might be identical to the default.) If we do this, then it will be very easy to detect when a final field has been frozen, most of the time: It will be frozen if it contains a non-default value. This gives us @Stable semantics for final fields, in cases where (for whatever reason) the freezing "putfield" can be delayed (e.g., to record lazy evaluations).

The main objections to using this trick: (1) The semantics are irregular when the field is frozen to its default value, and (2) there is no recommendable way for normal applications to use the trick.

Objection (1) is hard to address directly without adding extra costs to final fields. As a matter of practicality, the benefit from distinguishing frozen non-default values (as frozen) seems large enough to outweigh the irregularity of not distinguishing frozen default values, since code can usually be adjusted to ensure that final fields (which the programmer wants to optimize) will never freeze to their defaults. I.e., a powerful optimization with an irregularity can be better than no optimization at all. I suggest that we accept the optimization with its practical irregularity.

Objection (2) can be addressed by enhancing the Java field model to include "lazy finals", as a language feature. See: http://cr.openjdk.java.net/~jrose/draft/lazy-final.html

The asymmetry with default values can be addressed with value types, by using a wrapper (like Optional) to hold a possibly-zero or possibly-null value. The wrapper's default would be "no value yet", and it would be initialized to "wrapping value (X)" where X could be a default value. The wrapper would require at least one more bit than the base type of X, or else some other convention (such as a bit pattern known to the JVM but illegal for representing any "normal" value).
28-10-2015