JDK-8261007 : Frozen Arrays (Preview)
  • Type: JEP
  • Component: not defined
  • Priority: P3
  • Status: Draft
  • Resolution: Unresolved
  • Submitted: 2021-02-03
  • Updated: 2021-02-08
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
# Summary

Introduce a new variation within the built-in Java array types, which
is unmodifiable (shallowly immutable).

Frozen arrays can be safely shared without coordination or risk of
unexpected modification.  Freezing is a more efficient alternative to
defensive copying, in that the copy can frequently be optimized away
by the runtime.

This is a [preview language and VM feature](http://openjdk.java.net/jeps/12).



# Goals

This JEP proposes minor changes to the Java Programming Language and
Java Virtual Machine, which currently provide that all Java arrays are
mutable.  Users will be able to *freeze* any input array, yielding a
so-called "frozen array" whose type, length, and contents are
identical with the input array, but whose elements can never be
reassigned.  Frozen arrays will support certain optimizations and
safety techniques (such as constant folding and defensive copying)
which ordinary mutable arrays cannot support.


# Non-Goals

None of the following plausible extensions to support immutable arrays
are in scope for this JEP, though they could possibly be the subject
of future work:

 - Language support for directly declaring variables which are frozen
   arrays, for array creation expressions that directly create frozen
   arrays, or for variable arity methods which receive their arguments
   as frozen arrays.  (But see notes under "Future Work".)

 - An identity-free variation of frozen arrays whose instances are not
   only arrays but in fact primitive objects.

 - While allowing identity to frozen arrays, we could borrow the idea
   of a non-synchronizable references from primitive objects.  (This
   would seem to be a well-intentioned irregularity of the sort that
   just buys trouble.)

 - Adding more (non-static) utility methods Java arrays.

 - The ability to forcibly "recycle" a user-created mutable array as a
   frozen array by telling it to "freeze itself".  Although the JVM
   can sometimes do this under the covers, calling "freeze" on a
   mutable array will always return a different array.

 - Transactional and/or confinement and/or slicing array operations,
   which allow one or more element positions to change between mutable
   and frozen status, or which provide restricted views on larger
   and/or mutable and/or larval and/or globally shared arrays.

 - Retrofitting existing APIs (such as Core Reflection) to make use
   of frozen arrays where they presently use mutable arrays.
   (But see notes under "Future Work".)

 - Direct support for deeply nested immutable array-based data structures.

 - Direct support in the class file format for creating and initializing
   frozen arrays.  (E.g., mechanisms which condy or indy could use to
   pull "read only data" directly from the class file, as with `.rodata`
   segments in other systems.)

 - Adding analogous support to `List` types (beyond `List.of`, etc.),
   such as support for list-based initialized declarations, constant
   or template expressions, or varargs.

 - New types or implementations of arrays, or virtualized
   representations of Java arrays, sometimes loosely designated as
   "Arrays 2.0".

 - Freezing operations for Java types other than arrays.

 - Changes to arrays to make them appear more "value-based" or
   identity hostile, such as list-like `equals`, `hashCode`, and
   `toString` methods.

Other followup efforts may enhance existing APIs to take advantage of
frozen arrays, or introduce new language features and APIs built on
top of frozen arrays.


# Motivation

Java programmers enjoy the use of compact and intuitive notations for
working with Java arrays, as defined by the Java Language
Specification.  (We may call these "built-in Java arrays" when other
kinds of arrays might be understood, such as off-heap arrays, or
object-based array types such has some libraries supply.)  Built-in
arrays can be read and written using the familiar bracket notation
`a[i]`.  Because of their built-in status, Java arrays often offer
better performance, in footprint and speed, than other kinds of linear
collections (such as `ArrayList` or `ByteBuffer`).

Because Java arrays, as originally specified, are always modifiable,
they cannot always be used safely.  For example:

 - A method that returns multiple results in an array cannot return
   them in an array which has been cached, since one client may
   "clobber" the array making it unfit for use by future clients.

 - A method that _receives_ multiple arguments using an array (say,
   via the "varargs" feature), cannot cache that array, because the
   caller may recycle the argument array for a future call with
   different arguments.

 - A shared array variable (such as a global `static` constant or an
   array stored in a shared object) cannot in general be publicly
   exposed for sharing, because a malfunctioning client might clobber
   an element of the array, creating a race condition, and making
   other threads see an unpredictable mix of the old and new element
   values.

 - The workaround for the three previous problems is returning or
   making a so-called "defensive copy" as needed.  But defensive
   copies are easy to omit accidentally, and their inherent cost
   tempts programmers to "optimize them away", sometimes erroneously.

 - APIs which are structured as multiple layers or encapsulations will
   require in general a defensive copy every time an array crosses
   into or out of an abstraction boundary.  This burdens abstract,
   well-factored code with the unfair cost of deeply nested
   copy-chains of array data.

 - More subtly, a method receiving an array of arguments (whether
   "varargs" or not) cannot trust that those arguments to remain
   stable even _during its own execution_, since there may be side
   effects impinging on that array _concurrently from other threads_.
   This has been known to lead to so-called "TOCTTOU" bugs, where the
   method trusts arguments that have changed after they were checked.

 - Side effects to arrays are a potential source of side channel
   creation, perhaps usable to export information by malware.

All of these problems stem from the fact that the mutability of array
elements is a permanent feature in all arrays.  It can be addressed by
providing a way to create arrays whose elements are _not_ modifiable.

In effect programmers are faced with an unpleasant and un-Java-like
choice between reliability and performance, where the default action
(omitting the defensive copy) leads to unreliability which is often
difficult to diagnose.

The problems are exacerbated by the built-in status of Java arrays,
and also their age.  (Unlike `List`, built-in arrays have never _not_
been a part of Java.)  For those reasons, existing APIs, especially
older ones (such as Core Reflection), use array types even when their
use entails one or more of the above hazards.  The problem cannot be
shifted unless either the affected APIs are rewritten or until array
behavior changes.  This JEP does the latter.


# Description

The features described below are preview features, enabled with the
`--enable-preview` compile-time and runtime flags.

### **What's a frozen array?**

Given any array type `T[]` (including primitive-array types like
`int[]`), instances of that type may be in the _frozen_ state; such
instances are _frozen arrays_.  By contrast, before this JEP, all
arrays are not frozen; they may be referred to as "modifiable" (or
"mutable").  Any non-null reference of array type points to either a
modifiable or a frozen array.

Any attempt to write an element of a frozen array elicits an exception
of class `ArrayStoreException` (or a subclass).

Since the type system (of both language and VM) gives no way to
distinguish between variables which always refer to mutable arrays,
always refer to frozen arrays, or may refer to a mix of arrays, there
will in general be a dynamic check (for "frozen-ness") as part of any
store to an array (as `a[i]=x`, or bytecodes like `iastore` or
`aastore`).  This check is similar to the existing checks for null
references (potentially raising `NullPointerException`) or reference
array subtypes (potentially raising `ArrayStoreException`).

Like primitive wrapper types and strings, frozen arrays have identity
and may be synchronized upon.  As with the wrappers and strings,
programmers are encouraged to avert their gaze from these features.


### **Where do frozen arrays come from?**

Because a frozen array is a new configuration of data on the Java heap,
at least one special low-level JVM method is required in order to create
frozen arrays.

A separate JEP, [JDK-8261099][JDK-8261099], proposes a low-level
unsafe JVM method for locking down a previously mutable array "in
place", after which it is frozen.  Such a method must be used very
carefully, but it is possibly the original source of all frozen arrays
described in this JEP.  In the safe user model proposed by this JEP,
all factories of frozen arrays appear to create fresh arrays which
have been frozen from the beginning ("frozen from birth").

[JDK-8261099]: <https://bugs.openjdk.java.net/browse/JDK-8261099>

Since frozen arrays are "frozen from birth", it follows that their
factory methods must be passed _all elements_ to be stored in the new
frozen array.

Thus, one or more low-level static factory methods will be created
which allocate arrays in the frozen states.  The input to such a
method will be an array (either frozen or mutable) which contains the
elements to be stored in the new frozen array, and possibly starting
and ending indexes within the input array (cf. `Arrays.copyOfRange`).

As a straw-man example of the simplest possible API, a method
`System.arrayfreeze` could be created, by analogy with
`System.arraycopy`.  Instead of destination arguments, it would return
a frozen array which contains exactly the indicated source arguments:

```
T[] src = ...;
T[] dest = System.arrayfreeze(src, 0, src.length);
```

This method would internally use an unsafe primitive for freezing
an array in place, not available to general users.  The method
would look something like this:

```
Object arrayfreeze(Object src, int beg, int end) {
  Object dest = Arrays.newInstance(src.getClass().getComponentType(), end-beg);
  arraycopy(src, beg, dest, 0, dest.length);
  UNSAFE.freeze(dest);  // privileged and unsafe
  return dest;
}
```

**Please Note:** In this user model, freezing a mutable array will always return a
_different_ array.  A single array instance cannot transition between
the mutable and frozen conditions.  The exception to this rule is code
which uses a privileged low-level "freeze-in-place" method described in
[JDK-8261099][JDK-8261099].

A mutable array can always be created from a possibly-frozen array
by means of the `clone` operation.  That is, `clone` never returns
a frozen array, when applied to an array operand.

All these details may change.

We do not intend to add any new bytecode instructions.

### **How do I use a frozen array?**

Regardless of the low-level details of frozen array creation, a limited
number of utility methods will be created to encourage users to adopt
them for defensive copies.

 - For any array, the syntax `a.freeze()` will be accepted, with a
   meaning closely analogous to `a.clone()`.  (The only difference
   is in the mutability status of the respective return values.)

 - In the class `java.util.Arrays`, the methods `copyOf` and `copyOfRange`
   will be accompanied by analogous methods `freeze` and `freezeRange`.

 - The low-level static boolean method `System.isFrozenArray` (and/or
   `java.util.Arrays.isFrozen`) will test whether an array is frozen
   or modifiable.

 - A method handle factory `MethodHandles.frozenArrayConstructor`
   could be created to access factory methods, by analogy with
   `arrayConstructor`.  The returned method handle would not take
   an integer length, but rather a whole array of the contents to
   freeze.  If turned into a varargs method, it could provide
   access to hidden fast paths for particular lengths, and thus
   be useful with a translation strategy built on `invokedynamic`.

 - The method handle combinator variations of `asCollector` and
   `asVarargsCollector` might be adapted to perform freezing as well.

### Performance model of freezing and cloning

The specification of every frozen array factory will include the
provision that the JVM is always free to _recycle_ any previous frozen
array to produce the result.  That is, in response to a request for a
frozen array containing values `x[i]`, if the JVM can identify an
already existing frozen array `a` whose elements are the same (in the
same order) as the `x[i]`, then the request may be fulfilled with a
reference to `a`, rather than to a newly-created frozen array.

The recycling optimization is likely to occur (and may even be
guaranteed) in the following circumstances:

 - The length of the requested array is zero.  (A cached zero-length
   frozen array of the right element type can be used, perhaps found
   via a `ClassValue` table.)

 - The input array `a` to the factory method is already frozen, and
   the requested length is the same as the input array.  In this case,
   `a` can be returned directly (an O(1) operation).  This may be
   described as the idempotency property of freezing.

 - Thus, a copy chain of the form `a.freeze().freeze()` may be
   evaluated as if it had been `a.freeze()`.

 - In a copy chain of the form `a.clone().freeze()`, if the VM can
   prove that the intermediate operand (to `freeze`) is not used
   except in the freezing request, the copy chain may be evaluated as
   if it had been `a.freeze()`.

 - A locally confined (non-escaping) array of the form `a.clone()`,
   if never modified, can be treated as if created by `a.freeze()`
   (and, if `a` is already frozen, just `a`).  Thus, end-users of
   defensively-copied arrays might see their code improve in
   performance if such transforms are done "under the covers".

 - A locally confined (non-escaping) array of the form `new a[N]`, if
   modified only before a subsequent `freeze` operation, can be
   treated as if created by an `arrayfreeze` operation used on a
   scratch buffer; the scratch buffer could be reused later by the
   same thread.  Alternatively, if the JVM supplies an N-ary factory
   method of the right type, that could be called directly.
   Thus, end-users of newly created frozen arrays their code perform
   as if only the final frozen array is created, not the scratch
   array.  This sort of optimization will be easier to do if we
   add direct notation for creating initialized frozen arrays,
   and these are translated using `invokedynamic`, which has
   a good chance of finding the best factory method in a given
   runtime.

The JIT may be able to perform additional optimizations of this sort
after inlining.  In both the interpreter and JIT, a long chain of
defensive copies can be collapsed to less expensive operation.  The
user model can be summarized as:

 - store and pass your arrays in frozen form, when possible

 - let someone else do the cloning, if it must happen

Trusted code which is performance sensitive may use a restricted
unsafe primitive method (of [JDK-8261099][JDK-8261099]) to create new
arrays and freeze them in place without the extra copying step.  Using
the unsafe primitive should not be done unless there is a special
reason the JIT cannot optimize the code using the more generic
techniques outlined in this JEP.  Users of the unsafe primitive
outside of `java.base` are likely to have their code break when the
internal implementation of frozen arrays changes.

### Language changes and translation strategy

The exceptions throwable from an assignment expression, when the
target is an array element, must be amended to include the possibility
of an `ArrayStoreException` when the destination array is frozen.

(Note that this is a fundamentally new dynamic check and exception for
primitive arrays and `Object` arrays, which previously could not raise
an `ArrayStoreException` when written to.)

The syntax `a.freeze()` requires special permission from the Java
Language Specification, analogous to `clone`.

It seems most undesirable to hardwire `freeze` into the Java VM in the
same way `clone` was hardwired, using ad hoc special pleading to give
`clone` a special access mode and verifier type.  Therefore, we will
consider injecting the `freeze` method using a more scalable technique
that does not affect the VM specification.  For example, `a.freeze()`
may be defined as "sugar" for a static method call, such as
`java.lang.ArrayMethods.freeze(a)`.  This technique unlocks follow-on
opportunities to connect to additional methods of `java.util.Arrays`,
without changing either the languauge or the VM.

### Serialization

Serialization will be enhanced to transmit the frozen status of
arrays.  It seems sad to do this, but it can be done fairly simply on
top of the existing code, and doing so would seem to be a way to
forestall serialization-related security bugs that could arise if data
structurues which are relying on frozen arrays suddenly became
internally mutable in the elements of those arrays, due to a
deserialization trick.  The rejoinder to this point is that
properly written `readObject` methods already do a `clone` and
would be changed to call `freeze` instead.  The rejoinder to
this rejoinder is that it will be less surprising to end users
if passing an array through a serialization/deserialization
cycle preserves every bit of that array's structure, not
just the type, length, and elements, but also the frozen bit.

### JNI and Unsafe

The JNI operations the allow mutation of arrays should at least
warn if they are applied to frozen arrays.  (Perhaps it could
be a warning plus a no-op.)  Code which uses Unsafe to poke
new values into arrays will need to check the `isFrozen` bit,
in general.


### No type system effects

No type system, either that of the language, the method and field
descriptors, the bytecode verifier, the dynamically checkable types
(via `instanceof`), or the reflective `Class` system, will make a
distinction between frozen and mutable arrays.  Only the dynamic
`isFrozen` check will be available to apply to instances.

In particular, the system will not enforce deep immutability of
nested arrays.  (It may _create_ and _observe_ such structures,
and arrange its optimizations accordingly, as long as this is
done invisibly to users.)

### Java memory model

The JMM may be updated to state that the elements of a frozen array
have the same effects on safe publication as `final` fields.  In
effect a frozen array acts like an all-`final` object, whose fields
are referenced by index (`aaload`, `iaload`, etc.) rather than by name
(`getfield`).

The JMM has a "freeze" operation for objects with final fields, which
occurs at the end of the constructor invocation; this "freeze"
operation will also apply to the creation of a frozen array.

(The privileged low-level "freeze-in-place" method is also a "freeze"
operation with respect to the JMM; this internal method does not need
to be mentioned in the JMM, unless we figure out a way to make it
safely exposed in the public user model.)


### Example

```
class Node {
  private final String label;
  private final Node[] children;  //always frozen
  public Node[] children() { return children; }
  public String label() { return label; }
  public Node(String l, Node... cn) {
    cn = cn.freeze(); // O(1) if already frozen
    label = l;
    children = cn;
  }
}
class NodeUser {
  Node defPair(Node a, Node b) {
    Node[] ab = { a, b };
    ab = ab.freeze();  // JIT might optimize this locally
    return new Node("pair", ab);
  }
  void use(Node node) {
    Node[] frozenNodes = node.children();
    Node[] mutableNodes frozenNodes.clone();
    Arrays.sort(mutableNodes);
    doStuff(mutableNodes);
    doStuff(frozenNodes);  // still safe to use these
  }
}
```


# Alternatives

Unmodifiable lists created by `List.of` are a recent workaround for a
parallel set of problems with mutable lists.  Programmers are
encouraged to use such lists when possible in preference to built-in
Java arrays.  However, this is not always possible, since an existing
API may require the use of built-in arrays, or programmers may require
the speed and footprint characteristics of built-in arrays, or they
may simply stubbornly prefer the built-in Java expression notation for
working with built-in arrays.

We could wait for a larger virtualization effort ("Arrays 2.0") at
which point adding frozen arrays would be as simple as adding
`List.of` was.  (Which wasn't actually as simple as one might have
thought.)  But that would surely put off for many years the specific
benefits of frozen arrays _in today's language and APIs_.

We could wait for primitive classes and then cross-apply the concepts
to built an enhanced "frozen identity-free" array, which acts like a
primitive object.  However, that is not a universally applicable patch
to the problems which identity-laden frozen arrays address in the
present JEP.  In particular, the performance model of "frozen
identity-free" arrays would presumably suffer from the surprising
requirement that the `==` operator (and the `acmp` instructions) must
perform O(N) work instead of O(1) work in order to prove that two
frozen arrays are distinct.

We could try to build in some sort of "lock" operation which converts
part or all of an array _in place_ to a frozen status, either
permanently or until a corresponding "unlock" operation, or build some
sort of "immutable view" of a mutable backing array, like the
Collection APIs.  Such mechanisms are more complex and costly than the
present proposal, and are thus harder to use correctly, as well as
being less appealing as an alternative to defensive copying.  Allowing
an array to change its mutability state _creates a new kind of side
channel_ with new concurrency failure modes, and so seems to be a poor
solution to many of the problems mentioned above.


# Risks and Assumptions

The extra checks require some careful optimizer work, analogous to
null pointer removal.  These might cause performance problems, though
not relative to plain `clone`, of course.

If the user model is not simple enough, or there are unpleasant
surprises, or the performance is not good enough, or if retrofitted
APIs are hard to roll out, adoption may be weak.

Retrofitting array-returning APIs to return frozen arrays will be a
delicate operation, because of use cases in the wild (perhaps 0.0001%
of them) where somebody has decided it would be clever to mutate a
returned array.

This JEP assumes we can afford to add (as a preview) a handful of
factories and other API points for arrays, notably the `a.freeze()`
notation, as discussed above.  Alternatively, we could stage all of
these API points as static methods on `jdk.internal.FrozenArrays`,
for use internally within `java.base`, and then roll them out as
user-visible API points in a separate step.



# Dependencies and Future Work

There are no dependencies on existing work.

Some engineering of array store checks for Project Valhalla may
cross-apply to the mutability store checks required by this JEP.

If we decide to forbid synchronization on frozen arrays, some
work for primitive classes (Valhalla again) can cross-apply.

It seems very desirable to add a way for varargs methods to request
frozen input arrays.  This would have knock-on benefits for both
footprint and security.  Inside the method, there would be no need for
annotations of the form `@SafeVarargs`.  Outside the method, clients
could preferentially pass frozen arrays instead of mutable ones.

If we do frozen varargs, we should consider using a related syntax (a
contextual keyword?) to declare initialized arrays and for array
creation expressions:

```
int sumInts(__Frozen int... ints) {
  // bytecode does ints = ints.freeze()
  int sum = 0;
  for (int i : ints)  sum += i;
  return sum;
}
void foo() {
  sum();  // passes frozen zero-length array?
  sum(1,2,3);  // passes frozen length-3 array?
}
void bar() {
  __Frozen int[] stuff = { 1,2,3 };
  assert stuff.isFrozen();
  sum(stuff);
  sum(new __Frozen int[]{ 4,5,6 });
}
```

We will want to retrofit many APIs in `java.base`.  Examples of API
points which could profit from frozen arrays:

 - reflective array-returning methods in `Class`, `jlr.Executable`, etc.
 - `Enum.values`
 - `Throwable.getStackTrace`
 - `MethodType.parameterArray`, `DynamicCallSiteDesc.bootstrapArgs`, etc.
 - methods that return byte or char arrays as a "result" or "payload"
 - `Collection.toArray` method and various related
 - methods which accept array arguments, such as `Runtime.exec`

In many of these cases, it is likely that backward compatibility with
a few existing clients may force us to continue to return mutable
arrays.  This situation can sometimes be remedied by deprecating
the existing API point, in favor of a new one which returns the
"real" frozen array.  The existing API point can be documented as
equivalent to running `clone` on the result of the new API point,
For example:

```
public interface Collection<T> {
  /** Returns a frozen array containing all of the elements in this collection,
   * as if by {@code this.toArray().freeze()}.
   */
  default Object[] freezeArray() {
    return toArray().freeze();
  }
}
```

The naming of such new API points will be a delicate matter of
balancing conflicting concerns, to say nothing of bikeshed-painting.

See non-goals above for more possible follow-on features.

Comments
We���d like to add a `freeze` method alongside the `clone` method on array types. Like `clone`, `freeze` is public and covariant to the receiver type. This creates a problem for us: Although `clone` is just the same as the `Object.clone` method, it is ���rebranded��� with a new access mode and return type. This requires touches, just for the name `clone`, in both the JLS and JVMS. But I don���t want to just punch out the same kind of ad hoc special pleading for `freeze`. So I think we need a way to ���impute��� methods to arrays. JDK-8191530 is a version of extension methods, but a restricted version which is controlled by the class itself. They can���t be added unilaterally by a programmer who wants a fatter string or array API. This suggests an approach for injecting `freeze` and `isFrozen` into array types `T[]`. The approach would have other use cases also, such as adding more utility methods to arrays, and adding non-overridable methods to interfaces (sometimes useful for security or integrity). If the mechanism were applied to interfaces only, we could inject a third interface to all arrays (after `Cloneable` and `Serializable`) to carry `freeze` and the other extension methods. Perhaps we could even rationalize `clone` with a "ret-con".
06-02-2021