Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
# Summary Introduce a new variation within the built-in Java array types, which is unmodifiable (shallowly immutable). Frozen arrays can be safely shared without coordination or risk of unexpected modification. Freezing is a more efficient alternative to defensive copying, in that the copy can frequently be optimized away by the runtime. This is a [preview language and VM feature](http://openjdk.java.net/jeps/12). # Goals This JEP proposes minor changes to the Java Programming Language and Java Virtual Machine, which currently provide that all Java arrays are mutable. Users will be able to *freeze* any input array, yielding a so-called "frozen array" whose type, length, and contents are identical with the input array, but whose elements can never be reassigned. Frozen arrays will support certain optimizations and safety techniques (such as constant folding and defensive copying) which ordinary mutable arrays cannot support. # Non-Goals None of the following plausible extensions to support immutable arrays are in scope for this JEP, though they could possibly be the subject of future work: - Language support for directly declaring variables which are frozen arrays, for array creation expressions that directly create frozen arrays, or for variable arity methods which receive their arguments as frozen arrays. (But see notes under "Future Work".) - An identity-free variation of frozen arrays whose instances are not only arrays but in fact primitive objects. - While allowing identity to frozen arrays, we could borrow the idea of a non-synchronizable references from primitive objects. (This would seem to be a well-intentioned irregularity of the sort that just buys trouble.) - Adding more (non-static) utility methods Java arrays. - The ability to forcibly "recycle" a user-created mutable array as a frozen array by telling it to "freeze itself". Although the JVM can sometimes do this under the covers, calling "freeze" on a mutable array will always return a different array. - Transactional and/or confinement and/or slicing array operations, which allow one or more element positions to change between mutable and frozen status, or which provide restricted views on larger and/or mutable and/or larval and/or globally shared arrays. - Retrofitting existing APIs (such as Core Reflection) to make use of frozen arrays where they presently use mutable arrays. (But see notes under "Future Work".) - Direct support for deeply nested immutable array-based data structures. - Direct support in the class file format for creating and initializing frozen arrays. (E.g., mechanisms which condy or indy could use to pull "read only data" directly from the class file, as with `.rodata` segments in other systems.) - Adding analogous support to `List` types (beyond `List.of`, etc.), such as support for list-based initialized declarations, constant or template expressions, or varargs. - New types or implementations of arrays, or virtualized representations of Java arrays, sometimes loosely designated as "Arrays 2.0". - Freezing operations for Java types other than arrays. - Changes to arrays to make them appear more "value-based" or identity hostile, such as list-like `equals`, `hashCode`, and `toString` methods. Other followup efforts may enhance existing APIs to take advantage of frozen arrays, or introduce new language features and APIs built on top of frozen arrays. # Motivation Java programmers enjoy the use of compact and intuitive notations for working with Java arrays, as defined by the Java Language Specification. (We may call these "built-in Java arrays" when other kinds of arrays might be understood, such as off-heap arrays, or object-based array types such has some libraries supply.) Built-in arrays can be read and written using the familiar bracket notation `a[i]`. Because of their built-in status, Java arrays often offer better performance, in footprint and speed, than other kinds of linear collections (such as `ArrayList` or `ByteBuffer`). Because Java arrays, as originally specified, are always modifiable, they cannot always be used safely. For example: - A method that returns multiple results in an array cannot return them in an array which has been cached, since one client may "clobber" the array making it unfit for use by future clients. - A method that _receives_ multiple arguments using an array (say, via the "varargs" feature), cannot cache that array, because the caller may recycle the argument array for a future call with different arguments. - A shared array variable (such as a global `static` constant or an array stored in a shared object) cannot in general be publicly exposed for sharing, because a malfunctioning client might clobber an element of the array, creating a race condition, and making other threads see an unpredictable mix of the old and new element values. - The workaround for the three previous problems is returning or making a so-called "defensive copy" as needed. But defensive copies are easy to omit accidentally, and their inherent cost tempts programmers to "optimize them away", sometimes erroneously. - APIs which are structured as multiple layers or encapsulations will require in general a defensive copy every time an array crosses into or out of an abstraction boundary. This burdens abstract, well-factored code with the unfair cost of deeply nested copy-chains of array data. - More subtly, a method receiving an array of arguments (whether "varargs" or not) cannot trust that those arguments to remain stable even _during its own execution_, since there may be side effects impinging on that array _concurrently from other threads_. This has been known to lead to so-called "TOCTTOU" bugs, where the method trusts arguments that have changed after they were checked. - Side effects to arrays are a potential source of side channel creation, perhaps usable to export information by malware. All of these problems stem from the fact that the mutability of array elements is a permanent feature in all arrays. It can be addressed by providing a way to create arrays whose elements are _not_ modifiable. In effect programmers are faced with an unpleasant and un-Java-like choice between reliability and performance, where the default action (omitting the defensive copy) leads to unreliability which is often difficult to diagnose. The problems are exacerbated by the built-in status of Java arrays, and also their age. (Unlike `List`, built-in arrays have never _not_ been a part of Java.) For those reasons, existing APIs, especially older ones (such as Core Reflection), use array types even when their use entails one or more of the above hazards. The problem cannot be shifted unless either the affected APIs are rewritten or until array behavior changes. This JEP does the latter. # Description The features described below are preview features, enabled with the `--enable-preview` compile-time and runtime flags. ### **What's a frozen array?** Given any array type `T[]` (including primitive-array types like `int[]`), instances of that type may be in the _frozen_ state; such instances are _frozen arrays_. By contrast, before this JEP, all arrays are not frozen; they may be referred to as "modifiable" (or "mutable"). Any non-null reference of array type points to either a modifiable or a frozen array. Any attempt to write an element of a frozen array elicits an exception of class `ArrayStoreException` (or a subclass). Since the type system (of both language and VM) gives no way to distinguish between variables which always refer to mutable arrays, always refer to frozen arrays, or may refer to a mix of arrays, there will in general be a dynamic check (for "frozen-ness") as part of any store to an array (as `a[i]=x`, or bytecodes like `iastore` or `aastore`). This check is similar to the existing checks for null references (potentially raising `NullPointerException`) or reference array subtypes (potentially raising `ArrayStoreException`). Like primitive wrapper types and strings, frozen arrays have identity and may be synchronized upon. As with the wrappers and strings, programmers are encouraged to avert their gaze from these features. ### **Where do frozen arrays come from?** Because a frozen array is a new configuration of data on the Java heap, at least one special low-level JVM method is required in order to create frozen arrays. A separate JEP, [JDK-8261099][JDK-8261099], proposes a low-level unsafe JVM method for locking down a previously mutable array "in place", after which it is frozen. Such a method must be used very carefully, but it is possibly the original source of all frozen arrays described in this JEP. In the safe user model proposed by this JEP, all factories of frozen arrays appear to create fresh arrays which have been frozen from the beginning ("frozen from birth"). [JDK-8261099]: <https://bugs.openjdk.java.net/browse/JDK-8261099> Since frozen arrays are "frozen from birth", it follows that their factory methods must be passed _all elements_ to be stored in the new frozen array. Thus, one or more low-level static factory methods will be created which allocate arrays in the frozen states. The input to such a method will be an array (either frozen or mutable) which contains the elements to be stored in the new frozen array, and possibly starting and ending indexes within the input array (cf. `Arrays.copyOfRange`). As a straw-man example of the simplest possible API, a method `System.arrayfreeze` could be created, by analogy with `System.arraycopy`. Instead of destination arguments, it would return a frozen array which contains exactly the indicated source arguments: ``` T[] src = ...; T[] dest = System.arrayfreeze(src, 0, src.length); ``` This method would internally use an unsafe primitive for freezing an array in place, not available to general users. The method would look something like this: ``` Object arrayfreeze(Object src, int beg, int end) { Object dest = Arrays.newInstance(src.getClass().getComponentType(), end-beg); arraycopy(src, beg, dest, 0, dest.length); UNSAFE.freeze(dest); // privileged and unsafe return dest; } ``` **Please Note:** In this user model, freezing a mutable array will always return a _different_ array. A single array instance cannot transition between the mutable and frozen conditions. The exception to this rule is code which uses a privileged low-level "freeze-in-place" method described in [JDK-8261099][JDK-8261099]. A mutable array can always be created from a possibly-frozen array by means of the `clone` operation. That is, `clone` never returns a frozen array, when applied to an array operand. All these details may change. We do not intend to add any new bytecode instructions. ### **How do I use a frozen array?** Regardless of the low-level details of frozen array creation, a limited number of utility methods will be created to encourage users to adopt them for defensive copies. - For any array, the syntax `a.freeze()` will be accepted, with a meaning closely analogous to `a.clone()`. (The only difference is in the mutability status of the respective return values.) - In the class `java.util.Arrays`, the methods `copyOf` and `copyOfRange` will be accompanied by analogous methods `freeze` and `freezeRange`. - The low-level static boolean method `System.isFrozenArray` (and/or `java.util.Arrays.isFrozen`) will test whether an array is frozen or modifiable. - A method handle factory `MethodHandles.frozenArrayConstructor` could be created to access factory methods, by analogy with `arrayConstructor`. The returned method handle would not take an integer length, but rather a whole array of the contents to freeze. If turned into a varargs method, it could provide access to hidden fast paths for particular lengths, and thus be useful with a translation strategy built on `invokedynamic`. - The method handle combinator variations of `asCollector` and `asVarargsCollector` might be adapted to perform freezing as well. ### Performance model of freezing and cloning The specification of every frozen array factory will include the provision that the JVM is always free to _recycle_ any previous frozen array to produce the result. That is, in response to a request for a frozen array containing values `x[i]`, if the JVM can identify an already existing frozen array `a` whose elements are the same (in the same order) as the `x[i]`, then the request may be fulfilled with a reference to `a`, rather than to a newly-created frozen array. The recycling optimization is likely to occur (and may even be guaranteed) in the following circumstances: - The length of the requested array is zero. (A cached zero-length frozen array of the right element type can be used, perhaps found via a `ClassValue` table.) - The input array `a` to the factory method is already frozen, and the requested length is the same as the input array. In this case, `a` can be returned directly (an O(1) operation). This may be described as the idempotency property of freezing. - Thus, a copy chain of the form `a.freeze().freeze()` may be evaluated as if it had been `a.freeze()`. - In a copy chain of the form `a.clone().freeze()`, if the VM can prove that the intermediate operand (to `freeze`) is not used except in the freezing request, the copy chain may be evaluated as if it had been `a.freeze()`. - A locally confined (non-escaping) array of the form `a.clone()`, if never modified, can be treated as if created by `a.freeze()` (and, if `a` is already frozen, just `a`). Thus, end-users of defensively-copied arrays might see their code improve in performance if such transforms are done "under the covers". - A locally confined (non-escaping) array of the form `new a[N]`, if modified only before a subsequent `freeze` operation, can be treated as if created by an `arrayfreeze` operation used on a scratch buffer; the scratch buffer could be reused later by the same thread. Alternatively, if the JVM supplies an N-ary factory method of the right type, that could be called directly. Thus, end-users of newly created frozen arrays their code perform as if only the final frozen array is created, not the scratch array. This sort of optimization will be easier to do if we add direct notation for creating initialized frozen arrays, and these are translated using `invokedynamic`, which has a good chance of finding the best factory method in a given runtime. The JIT may be able to perform additional optimizations of this sort after inlining. In both the interpreter and JIT, a long chain of defensive copies can be collapsed to less expensive operation. The user model can be summarized as: - store and pass your arrays in frozen form, when possible - let someone else do the cloning, if it must happen Trusted code which is performance sensitive may use a restricted unsafe primitive method (of [JDK-8261099][JDK-8261099]) to create new arrays and freeze them in place without the extra copying step. Using the unsafe primitive should not be done unless there is a special reason the JIT cannot optimize the code using the more generic techniques outlined in this JEP. Users of the unsafe primitive outside of `java.base` are likely to have their code break when the internal implementation of frozen arrays changes. ### Language changes and translation strategy The exceptions throwable from an assignment expression, when the target is an array element, must be amended to include the possibility of an `ArrayStoreException` when the destination array is frozen. (Note that this is a fundamentally new dynamic check and exception for primitive arrays and `Object` arrays, which previously could not raise an `ArrayStoreException` when written to.) The syntax `a.freeze()` requires special permission from the Java Language Specification, analogous to `clone`. It seems most undesirable to hardwire `freeze` into the Java VM in the same way `clone` was hardwired, using ad hoc special pleading to give `clone` a special access mode and verifier type. Therefore, we will consider injecting the `freeze` method using a more scalable technique that does not affect the VM specification. For example, `a.freeze()` may be defined as "sugar" for a static method call, such as `java.lang.ArrayMethods.freeze(a)`. This technique unlocks follow-on opportunities to connect to additional methods of `java.util.Arrays`, without changing either the languauge or the VM. ### Serialization Serialization will be enhanced to transmit the frozen status of arrays. It seems sad to do this, but it can be done fairly simply on top of the existing code, and doing so would seem to be a way to forestall serialization-related security bugs that could arise if data structurues which are relying on frozen arrays suddenly became internally mutable in the elements of those arrays, due to a deserialization trick. The rejoinder to this point is that properly written `readObject` methods already do a `clone` and would be changed to call `freeze` instead. The rejoinder to this rejoinder is that it will be less surprising to end users if passing an array through a serialization/deserialization cycle preserves every bit of that array's structure, not just the type, length, and elements, but also the frozen bit. ### JNI and Unsafe The JNI operations the allow mutation of arrays should at least warn if they are applied to frozen arrays. (Perhaps it could be a warning plus a no-op.) Code which uses Unsafe to poke new values into arrays will need to check the `isFrozen` bit, in general. ### No type system effects No type system, either that of the language, the method and field descriptors, the bytecode verifier, the dynamically checkable types (via `instanceof`), or the reflective `Class` system, will make a distinction between frozen and mutable arrays. Only the dynamic `isFrozen` check will be available to apply to instances. In particular, the system will not enforce deep immutability of nested arrays. (It may _create_ and _observe_ such structures, and arrange its optimizations accordingly, as long as this is done invisibly to users.) ### Java memory model The JMM may be updated to state that the elements of a frozen array have the same effects on safe publication as `final` fields. In effect a frozen array acts like an all-`final` object, whose fields are referenced by index (`aaload`, `iaload`, etc.) rather than by name (`getfield`). The JMM has a "freeze" operation for objects with final fields, which occurs at the end of the constructor invocation; this "freeze" operation will also apply to the creation of a frozen array. (The privileged low-level "freeze-in-place" method is also a "freeze" operation with respect to the JMM; this internal method does not need to be mentioned in the JMM, unless we figure out a way to make it safely exposed in the public user model.) ### Example ``` class Node { private final String label; private final Node[] children; //always frozen public Node[] children() { return children; } public String label() { return label; } public Node(String l, Node... cn) { cn = cn.freeze(); // O(1) if already frozen label = l; children = cn; } } class NodeUser { Node defPair(Node a, Node b) { Node[] ab = { a, b }; ab = ab.freeze(); // JIT might optimize this locally return new Node("pair", ab); } void use(Node node) { Node[] frozenNodes = node.children(); Node[] mutableNodes frozenNodes.clone(); Arrays.sort(mutableNodes); doStuff(mutableNodes); doStuff(frozenNodes); // still safe to use these } } ``` # Alternatives Unmodifiable lists created by `List.of` are a recent workaround for a parallel set of problems with mutable lists. Programmers are encouraged to use such lists when possible in preference to built-in Java arrays. However, this is not always possible, since an existing API may require the use of built-in arrays, or programmers may require the speed and footprint characteristics of built-in arrays, or they may simply stubbornly prefer the built-in Java expression notation for working with built-in arrays. We could wait for a larger virtualization effort ("Arrays 2.0") at which point adding frozen arrays would be as simple as adding `List.of` was. (Which wasn't actually as simple as one might have thought.) But that would surely put off for many years the specific benefits of frozen arrays _in today's language and APIs_. We could wait for primitive classes and then cross-apply the concepts to built an enhanced "frozen identity-free" array, which acts like a primitive object. However, that is not a universally applicable patch to the problems which identity-laden frozen arrays address in the present JEP. In particular, the performance model of "frozen identity-free" arrays would presumably suffer from the surprising requirement that the `==` operator (and the `acmp` instructions) must perform O(N) work instead of O(1) work in order to prove that two frozen arrays are distinct. We could try to build in some sort of "lock" operation which converts part or all of an array _in place_ to a frozen status, either permanently or until a corresponding "unlock" operation, or build some sort of "immutable view" of a mutable backing array, like the Collection APIs. Such mechanisms are more complex and costly than the present proposal, and are thus harder to use correctly, as well as being less appealing as an alternative to defensive copying. Allowing an array to change its mutability state _creates a new kind of side channel_ with new concurrency failure modes, and so seems to be a poor solution to many of the problems mentioned above. # Risks and Assumptions The extra checks require some careful optimizer work, analogous to null pointer removal. These might cause performance problems, though not relative to plain `clone`, of course. If the user model is not simple enough, or there are unpleasant surprises, or the performance is not good enough, or if retrofitted APIs are hard to roll out, adoption may be weak. Retrofitting array-returning APIs to return frozen arrays will be a delicate operation, because of use cases in the wild (perhaps 0.0001% of them) where somebody has decided it would be clever to mutate a returned array. This JEP assumes we can afford to add (as a preview) a handful of factories and other API points for arrays, notably the `a.freeze()` notation, as discussed above. Alternatively, we could stage all of these API points as static methods on `jdk.internal.FrozenArrays`, for use internally within `java.base`, and then roll them out as user-visible API points in a separate step. # Dependencies and Future Work There are no dependencies on existing work. Some engineering of array store checks for Project Valhalla may cross-apply to the mutability store checks required by this JEP. If we decide to forbid synchronization on frozen arrays, some work for primitive classes (Valhalla again) can cross-apply. It seems very desirable to add a way for varargs methods to request frozen input arrays. This would have knock-on benefits for both footprint and security. Inside the method, there would be no need for annotations of the form `@SafeVarargs`. Outside the method, clients could preferentially pass frozen arrays instead of mutable ones. If we do frozen varargs, we should consider using a related syntax (a contextual keyword?) to declare initialized arrays and for array creation expressions: ``` int sumInts(__Frozen int... ints) { // bytecode does ints = ints.freeze() int sum = 0; for (int i : ints) sum += i; return sum; } void foo() { sum(); // passes frozen zero-length array? sum(1,2,3); // passes frozen length-3 array? } void bar() { __Frozen int[] stuff = { 1,2,3 }; assert stuff.isFrozen(); sum(stuff); sum(new __Frozen int[]{ 4,5,6 }); } ``` We will want to retrofit many APIs in `java.base`. Examples of API points which could profit from frozen arrays: - reflective array-returning methods in `Class`, `jlr.Executable`, etc. - `Enum.values` - `Throwable.getStackTrace` - `MethodType.parameterArray`, `DynamicCallSiteDesc.bootstrapArgs`, etc. - methods that return byte or char arrays as a "result" or "payload" - `Collection.toArray` method and various related - methods which accept array arguments, such as `Runtime.exec` In many of these cases, it is likely that backward compatibility with a few existing clients may force us to continue to return mutable arrays. This situation can sometimes be remedied by deprecating the existing API point, in favor of a new one which returns the "real" frozen array. The existing API point can be documented as equivalent to running `clone` on the result of the new API point, For example: ``` public interface Collection<T> { /** Returns a frozen array containing all of the elements in this collection, * as if by {@code this.toArray().freeze()}. */ default Object[] freezeArray() { return toArray().freeze(); } } ``` The naming of such new API points will be a delicate matter of balancing conflicting concerns, to say nothing of bikeshed-painting. See non-goals above for more possible follow-on features.
|