Relates :
|
JDK-8173063 :
|
|
JDK-8177469 :
|
Summary
-------

Enhance the expressiveness of the `enum` construct in the Java Language by allowing type-variables in enums (_generic enums_), and performing sharper type-checking for enum constants.

Goals
-----

These two enhancements work together to enable enum constants to carry constant-specific _type information_ as well as constant-specific state and behavior. There are many situations where developers have to refactor enums into classes in order to achieve the desired result; these enhancements should reduce this need.

The following example shows how the two enhancements work together:

```
enum Argument<X> { // declares generic enum
    STRING<String>(String.class),
    INTEGER<Integer>(Integer.class), ... ;

    Class<X> clazz;

    Argument(Class<X> clazz) { this.clazz = clazz; }

    Class<X> getClazz() { return clazz; }
}

Class<String> cs = Argument.STRING.getClazz(); //uses sharper typing of enum constant
```

Non-Goals
---------

This JEP targets specific enhancements to how enum constants are type-checked. As such, other enum-related features such as:

* allow enum subclassing
* allow enum in non-static contexts

are *outside* the scope of this JEP.

Motivation
----------

Java enums are a powerful construct. They allow grouping of constants - where each constant is a singleton object. Each constant can optionally declare a body, which can be used to override the behavior of the base enum declaration. In the following we will try to model the set of Java primitive types using an enum.
Here's a start:

```
enum Primitive {
    BYTE, SHORT, INT, FLOAT, LONG, DOUBLE, CHAR, BOOLEAN;
}
```

As stated above, an enum declaration is like a class, and can have constructors; we can use this feature to keep track of the boxed class and the default value of each primitive:

```
enum Primitive {
    BYTE(Byte.class, 0),
    SHORT(Short.class, 0),
    INT(Integer.class, 0),
    FLOAT(Float.class, 0f),
    LONG(Long.class, 0L),
    DOUBLE(Double.class, 0d),
    CHAR(Character.class, 0),
    BOOLEAN(Boolean.class, false);

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
        this.boxClass = boxClass;
        this.defaultValue = defaultValue;
    }
}
```

While this is rather nice, there are some limitations. The field `boxClass` is loosely typed as `Class<?>`, as the field type needs to be compatible with all the sharper types used by the enum constants. As a result, any attempt to do something like this:

```
Class<Short> cs = SHORT.boxClass; //error
```

will fail with a compile-time error. Even worse, the field `defaultValue` has a type of `Object`. This is unavoidable since the field needs to be shared across multiple constants modelling different primitive types. Hence, static safety is lost, as the compiler allows code like the following:

```
String s = (String)INT.defaultValue; //ok
```

Let's now try to extend the enum and add some operations to the constants modelling primitive types (for the sake of brevity, in the remainder we will only show a subset of the constants):

```
enum Primitive {
    INT(Integer.class, 0) {
        int mod(int x, int y) { return x % y; }
        int add(int x, int y) { return x + y; }
    },
    FLOAT(Float.class, 0f) {
        float add(float x, float y) { return x + y; }
    }, ... ;

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
        this.boxClass = boxClass;
        this.defaultValue = defaultValue;
    }
}
```

Again, this results in problems, as there's no way to do something like this:

```
int seven = INT.add(3, 4); //error
```

That's because the static type of `INT` is simply `Primitive` and `Primitive` has no member named `add`. So, in order to add operations to our enum, we need to add the members to the enum declaration itself, as follows:

```
enum Primitive {
    INT(Integer.class, 0),
    FLOAT(Float.class, 0f), ... ;

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
        this.boxClass = boxClass;
        this.defaultValue = defaultValue;
    }

    int mod(int x, int y) {
        if (this == INT) {
            return x % y;
        } else {
            throw new IllegalStateException();
        }
    }

    int add(int x, int y) {
        if (this == INT) {
            return x + y;
        } else {
            throw new IllegalStateException();
        }
    }

    float add(float x, float y) {
        if (this == FLOAT) {
            return x + y;
        } else {
            throw new IllegalStateException();
        }
    }
    ...
}
```

But the code above has, again, several problems. First, this breaks encapsulation: suddenly, `Primitive` acquires a bunch of members, none of which make sense for all the constants. As a result, the implementation of each method becomes more convoluted, as the methods must check whether they have been called on the *right* enum constant. Type-safety is also lost, as the compiler will not detect bad usages such as:

```
int zero = FLOAT.mod(50, 2); //ok
```

All the problems described above can be addressed by removing specific asymmetries between enums and classes, and by refining the way in which enum constants are type-checked.
More precisely:

* allow type parameters in enum declarations
* do not prematurely erase sharp type information associated with enum constants

With these enhancements, the `Primitive` enum can be rewritten as follows:

```
enum Primitive<X> {
    INT<Integer>(Integer.class, 0) {
        int mod(int x, int y) { return x % y; }
        int add(int x, int y) { return x + y; }
    },
    FLOAT<Float>(Float.class, 0f) {
        float add(float x, float y) { return x + y; }
    }, ... ;

    final Class<X> boxClass;
    final X defaultValue;

    Primitive(Class<X> boxClass, X defaultValue) {
        this.boxClass = boxClass;
        this.defaultValue = defaultValue;
    }
}
```

This generic declaration is clearly more expressive than the previous one - now the enum constant `Primitive.INT` has a sharper parameterized type `Primitive<Integer>`, which means that its members are also sharply typed:

```
Class<Short> cs = SHORT.boxClass; //ok!
```

Also, since type information on enum constants is not prematurely erased, the compiler can reason about membership of constants - as demonstrated below:

```
int zero_int = INT.mod(50, 2); //ok
int zero_float = FLOAT.mod(50, 2); //error
```

The compiler is now able to reject the second statement as there's no member `mod` in the enum constant `FLOAT` - which guarantees extra type-safety.

Description
-----------

### Generic enums

As discussed in JDK-6408723, an important requirement for allowing generics in enums is that type parameters are fully bound in the enum constant declaration. This allows for a straightforward translation scheme which can augment the one we have today - for instance, given an enum declaration like the following:

```
enum Foo<X> {
    ONE<String>,
    TWO<Integer>;
}
```

The corresponding desugared code will look as follows:

```
/* enum */ class Foo<X> {
    static Foo<String> ONE = ...
    static Foo<Integer> TWO = ...
    ...
}
```

That is, it is still possible to map each constant to a static field declaration, as type bindings are all statically known.
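The desugared shape above can already be written by hand in today's Java, which is roughly what developers end up with when they refactor an enum into a class. The following is a minimal sketch under that assumption; `FooSketch`, `name`, and `payloadType` are illustrative names that do not appear in this JEP, and the sketch does not reproduce enum features such as `values()`, `switch` support, or the enum serialization guarantees.

```java
// Hand-written approximation of the desugared generic enum Foo<X>.
// FooSketch, name and payloadType are hypothetical names used for illustration.
final class FooSketch<X> {
    // Each "constant" is a static field whose type arguments are fully bound.
    static final FooSketch<String> ONE = new FooSketch<>("ONE", String.class);
    static final FooSketch<Integer> TWO = new FooSketch<>("TWO", Integer.class);

    private final String name;
    private final Class<X> payloadType;

    // Private constructor: no instances exist beyond the constants above.
    private FooSketch(String name, Class<X> payloadType) {
        this.name = name;
        this.payloadType = payloadType;
    }

    Class<X> payloadType() { return payloadType; }

    @Override
    public String toString() { return name; }

    public static void main(String[] args) {
        // No cast is needed: the static type of ONE is FooSketch<String>.
        Class<String> cs = FooSketch.ONE.payloadType();
        Class<Integer> ci = FooSketch.TWO.payloadType();
        System.out.println(cs.getSimpleName() + " " + ci.getSimpleName());
    }
}
```

Because each field carries its own type arguments, clients see `Class<String>` and `Class<Integer>` without casts, which is the effect generic enums aim to provide without giving up the `enum` construct.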
It might be desirable to allow diamond on enum constant initialization - for instance:

```
enum Bar<X> {
    ONE<>(Integer.class),
    TWO<>(String.class);

    Bar(X x) { ... }
}
```

If the diamond syntax is used, special care is required if the enum constant has a body (i.e. it is translated into an anonymous class) and the inferred type is non-denotable. As in the case of diamond with anonymous inner classes, the compiler will have to reject that case.

### Sharper typing of enum constants

Under current rules, the static type of an enum constant is the enum type itself. Under such rules, the constants `Foo.ONE` and `Foo.TWO` above will both have the same type, namely `Foo`. This is undesirable for at least two reasons:

* in the case of a generic enum (such as `Foo`), the static type of a constant is not sharp enough to capture the full type information carried by that constant
* even in the absence of generic enums, the constant type is not sharp enough to let a client access a member that is only defined on that enum constant (see the example at the beginning of this page)

To overcome this limitation, typing of enum constants should be redefined so that a given enum constant gets its own type. Let `E` be an enum declaration, and `C` be a (possibly generic) enum constant declaration in `E`. The constant `C` is associated with a sharper type if either of the following conditions is satisfied:

* `C` is of the kind `C<T1, T2 ... Tn>` but declares no body; the constant's sharper type is `E<T1, T2 ... Tn>`
* `C` has a body; the constant's sharper type is an anonymous type (written `E.C`) whose supertype is either
    * `E<T1, T2, ... Tn>` if `C` is of the kind `C<T1, T2, ... Tn>` and `E` is a generic enum
    * `E`, if `E` is non-generic

These enhanced typing rules allow the static types of `Foo.ONE` and `Foo.TWO` to be different.
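For contrast, the behavior under the current rules can be observed directly. The sketch below uses only existing reflection APIs; `SharperTypingDemo`, its nested `Test` enum, and `callA` are illustrative names. It shows that a constant with a body already has its own runtime class, while its static type remains the enum type, so a constant-specific member is reachable only reflectively.

```java
import java.lang.reflect.Method;

// Illustrates the CURRENT typing rules, not the proposed ones.
// SharperTypingDemo, Test and callA are hypothetical names used for illustration.
final class SharperTypingDemo {
    enum Test {
        A { int a() { return 42; } },  // constant with a body and an extra member
        B;                             // constant without a body
    }

    // Invokes the constant-specific method a() reflectively, since the static
    // type of Test.A (namely Test) has no member named a.
    static int callA() {
        try {
            Method m = Test.A.getClass().getDeclaredMethod("a");
            m.setAccessible(true);
            return (Integer) m.invoke(Test.A);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Test.A.a();  // does not compile today: the static type of Test.A is Test
        System.out.println(Test.A.getClass() == Test.class);          // false
        System.out.println(Test.B.getClass() == Test.class);          // true
        System.out.println(Test.A.getDeclaringClass() == Test.class); // true
        System.out.println(callA());                                  // 42
    }
}
```

Under the proposed rules, the commented-out call `Test.A.a()` would type-check, because the static type of `Test.A` would be the sharper type `Test.A` rather than `Test`.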
Additional Considerations
-------------------------

**Binary compatibility**

Let's assume we have the following enum:

```
enum Test {
    A { void a() { } },
    B { void b() { } }
}
```

As we have seen, this would be translated as follows:

```
/* enum */ class Test {
    static Test A = new Test() { void a() { } };
    static Test B = new Test() { void b() { } };
}
```

If we allow sharper types for enum constants, a naive approach would translate the code as follows:

```
/* enum */ class Test {
    static Test$1 A = new Test() { void a() { } };
    static Test$2 B = new Test() { void b() { } };
}
```

Here, the binary incompatibility is manifest: the type of the enum constant `A` just changed from `Test` to `Test$1` upon recompilation. This change is going to break non-recompiled clients using `Test`. To overcome this problem, it is better to take an erasure-based approach: while the static type of `A` might be the sharper type `Test.A`, any reference to the type of the constant gets erased to the base enum type `Test`. This leads to code that is binary compatible with respect to what we had before.

However, if everything gets erased to `Test`, how is access to members of a specific enum constant implemented?

```
Test.A.a();
```

It is easy to see that if, in the code above, symbolic references to `A` are erased to `Test`, the method call will not be well-typed (as `Test` does not have a member named `a`). To overcome this problem, the compiler has to insert a synthetic cast:

```
checkcast Test$1
invokevirtual Test$1::a
```

This is not dissimilar to what happens when accessing members of an intersection type through erasure.

Another orthogonal observation is that the current naming scheme for enum constant classes is too fragile - the names `Test$1` and `Test$2` shown above are essentially order-dependent - this means that changing the order in which enum constants are declared could lead to binary compatibility issues.
More specifically, if in the code above `A` is swapped with `B` and the enum is recompiled, the client bytecode above would fail to link, as `Test$1` would no longer have a member method named `a`. This is in stark contrast with what the JLS has to say about binary compatible evolution of enums:

> Adding or reordering constants in an enum will not break compatibility with pre-existing binaries.

One way to preserve binary compatible evolution would be to emit order-insensitive class names, such as `Test$A` and `Test$B` instead of `Test$1` and `Test$2`. The impact of such a change with respect to reflection and serialization is discussed below.

**Serialization**

In Java, all enums are implicitly serializable, as `Enum` implements `Serializable`. We would like the changes proposed here to be serialization-compatible; they should not change the serialized form. The serialization specification:

http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469

provides special treatment for enums; the serialized form of an enum constant is its name only, and it is not possible to customize serialization/deserialization of an enum constant. (Note that all enum constants are initialized during the `<clinit>`, and the `Enum.valueOf` method that is used by deserialization calls the enum's static `values()` method, which implicitly forces initialization of the base enum class (and of all the constants)).

In other words, no compatibility problem with respect to the serialized form exists, as the serialized form already does not depend on the class name generated by the compiler.

**Reflection**

Another place where binary names come up is reflection.
The following is perfectly legal reflective code:

```
Class<?> c = Class.forName("Test$1");
System.err.println(c.getName()); //prints Test$1
```

While reflection has restrictions in order to prevent an enum constant from being instantiated reflectively, there's no restriction on inspecting the members of an enum constant class. Therefore, existing code using the idiom above would cease to work should we change the binary form of enum constants.

**Denotability**

Currently, an enum constant is a value, not a type. So, a legitimate question is whether enum constants should also be denotable types. The usual arguments apply here - on the one hand, having a denotable type for an enum constant makes it less magic, and allows programmers to declare variables with that type. But there are also disadvantages:

* it could make the code less readable (e.g. `A a = A`) - as the same identifier could mean both value and type
* it is not clear whether all enum constants get their own type; what about an enum constant that does not declare any additional member? Is its type just an alias for the base enum type?

On the other hand, if the enum constant type is a non-denotable type, it becomes an opaque thing that programmers can only interact with indirectly (e.g. through type inference). To mitigate some of the drawbacks of a non-denotable type, it is important to note that the proposal to add local variable type inference could technically allow programmers to declare variables with the sharper enum type, even though it is non-denotable (e.g. `var a = A`).

**Accessibility**

There is one corner case with respect to accessibility of members through the enum sharper type.
Consider the following case:

```
package a;

public enum Foo {
    A() {
        public String s = "Hello!";
    };
}

package b;

class Client {
    public static void main(String[] args) {
        String s = Foo.A.s; //IllegalAccessError
    }
}
```

When executing this code, the VM will issue an `IllegalAccessError`; the problem is that the anonymous class for the enum constant, `Foo$A`, is package-private; as a result, an attempt to access a public field in a package-private class from another package will result in an access error. To overcome this problem, the enum constant class should have the same access modifier as the enum class in which it is defined.

**Source compatibility**

From a source compatibility perspective, there are cases in which sharper typing could leak out as a result of an interaction between this feature and type inference - consider the following code:

```
EnumSet<Test> e = EnumSet.of(Test.A);
```

The code above used to behave in a relatively straightforward fashion: the static type of `Test.A` is simply `Test`, meaning that inferring the type-variable of `EnumSet.of` was simple, as both constraints named the type `Test`. But if we change the way in which `Test.A` is type-checked, the behavior gets more interesting: the type-variable of `EnumSet.of` will get two competing constraints: it must be equal to `Test` (from the target-type) and it must be a supertype of `Test.A`. Luckily, in such a scenario, type inference is smart enough to prefer the stricter equality constraint, and ends up inferring `Test`.

All things considered, the source compatibility impact of this change is not too different from the one in JDK-8075793, where the change caused capture variables to appear in more places instead of their upper bounds.
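The interplay between the two constraints can be approximated in today's Java with an ordinary class hierarchy. In the sketch below, `Sub` stands in for a sharper constant type such as `Test.A`, `Base` for the enum type `Test`, and the generic method `of` for `EnumSet.of`; all of these names are assumptions made for illustration, and `var` requires Java 10 or later. The overloaded `staticType` methods make the inferred static type observable at run time.

```java
import java.util.List;

// Analogy only: Sub plays the role of a sharper constant type, Base the enum type.
// All names here are hypothetical and do not appear in the JEP.
final class InferenceSketch {
    static class Base { }
    static class Sub extends Base { }

    // Stand-in for a generic factory such as EnumSet.of.
    static <T> List<T> of(T element) { return List.of(element); }

    // Overload resolution is static, so the result reveals the inferred type.
    static String staticType(Base b) { return "Base"; }
    static String staticType(Sub s) { return "Sub"; }

    static String withTargetType() {
        // Constraints: T = Base (from the target type) and Sub <: T.
        // The equality constraint wins, so T is inferred as Base.
        List<Base> list = of(new Sub());
        return staticType(list.get(0));
    }

    static String withoutTargetType() {
        // Only Sub <: T is available, so T is inferred as Sub.
        var list = of(new Sub());
        return staticType(list.get(0));
    }

    public static void main(String[] args) {
        System.out.println(withTargetType());    // prints Base
        System.out.println(withoutTargetType()); // prints Sub
    }
}
```

The second method shows the situation in which the sharper type would leak out: with no target type, inference has only the supertype constraint to work with.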
Risks and Assumptions
---------------------

This proposal has two main risks outlined in the sections above:

* a change in the binary names of enum constants could lead to issues with core reflection
* a change in the typing of enum constants could result in subtle changes in method type inference, especially in the absence of a target-type

The first problem is probably nothing to be concerned about; as has been shown, binary names of enum constants are currently very fragile and prone to re-ordering issues. As a result, any code that relies on the binary name of an enum constant is inherently fragile, as it is essentially relying on the output of a specific compiler.

The second problem is more worrisome, as it could cause source incompatibilities. In order to detect how frequent the source incompatibility scenario described above could be, we have measured how many times the `EnumSet.of` method was called with various arities; for each call we kept track of whether the call occurred in a context where a target type was available. Below are the results (the measurements have been taken against the full OpenJDK forest).

* Total calls to `EnumSet.of`: 150
    * calls with arity = 1: 69
        * of which, without target-type: 0

In other words, the source compatibility scenario described above does not seem to pose any serious threat.

Dependencies
------------

The sharper types used for enum constants are not necessarily denotable; these would constitute another category of non-denotable types. This may interact with the treatment of non-denotable types in JEP-286 (Local Variable Type Inference). Depending on decisions made in JEP-286 regarding non-denotable types, one might be able to say:

    var a = Argument.STRING;

and have the type of `a` be the sharper type `Argument.STRING` rather than the coarser type `Argument`.
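As a point of reference, local variable type inference as delivered in Java 10 does let `var` capture one existing category of non-denotable type, namely anonymous class types. The sketch below shows only that existing behavior; whether sharper enum constant types would be treated the same way is not settled by this document, and `VarSketch` and `answer` are illustrative names.

```java
// Existing Java 10+ behavior: var can hold a non-denotable anonymous class type.
// VarSketch and answer are hypothetical names used for illustration.
final class VarSketch {
    static int demo() {
        // The inferred type of `a` is the anonymous class itself,
        // which cannot be written down in source code.
        var a = new Object() {
            int answer() { return 42; }
        };
        // Legal only because `a` is not typed as plain Object.
        return a.answer();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```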