JDK-8161256
general data in constant pools
The constant pool has a native ability to represent a small variety of constant values, including primitives, classes, strings, and method handles. As `invokedynamic` instructions become more widely used, it becomes more important to be able to synthesize bootstrap constants of unforeseen types, starting with booleans, other sub-int values (bytes, shorts, chars), enum values, primitive classes (int.class), nulls, annotation values, and array data. (See JDK-8161250 for an example; there are many more coming in Project Valhalla.)
In order to support extended constants, it is necessary to store blocks of raw data, and sequences of individual constants, in the constant pool.
It is not necessary, however, to have a new kind of constant pool entry for each imaginable type of constant. Instead, `invokedynamic` offers a good alternative. We can engineer a suitable range of static representations for all of our constants by choosing a range of useful metafactories, each of which can link a constant-producing `invokedynamic` call site. The static metadata passed to each metafactory can be (almost) any sequence of constants from the constant pool. The runtime library, not the JVM, must define this range of array constant metafactories. This pushes the representation problem up from the JVM and its class file format to the libraries. The scheme is much more future-proof: it can represent future types (like values and flat value-array constants) without new kinds of constant pool entries.
Specifically, it is enough to be able to do three tricks: (1) call a bootstrap method, (2) group a bundle of constants together in a logical unit, and (3) produce a block of uninterpreted bytes. (Tricks 2 and 3 will typically be used with trick 1, but can have other applications too.)
In more detail, the tricks are:
1. CONSTANT_Dynamic (new constant tag 17): Be able to compose a constant of an arbitrary type given some sort of raw material; like `invokedynamic`, it will use a bootstrap method to invoke some sort of metafactory.\[1]
2. CONSTANT_Group (new constant tag 13): Be able to create a sequence (of length up to about 2 billion) of individual constants, to be used as raw material by a bootstrap method argument (for CONSTANT_Dynamic, `invokedynamic`, or a similar mechanism). The specific type of the raw constant-sequence would be a raw, unmodifiable List created by the JVM and backed by the constant pool itself.
3. CONSTANT_Data (new constant tag 2): Be able to create a sequence (of length up to about 2 billion) of undifferentiated bytes, to be used as raw material, like a CONSTANT_Group, except for bytes. The storage would be used by a metafactory to fill in any kind of primitive array (or any similar use). Any kind of compression or compaction, if present, would be a bilateral agreement between the static compiler (javac) and the runtime, with the JVM simply forwarding the bits. The concrete type of this sequence will be ByteSequence, a transliteration of CharSequence to carry bytes.
Each of these three new constant pool types is usable with ldc and also as an argument to a bootstrap method (in the BootstrapMethods attribute).
(Notes on code point selection for tags: Java 9 uses codes 19 and 20 for the module system. The code 2 has been held open under the name CONSTANT_Unicode but has never been used, probably because of the wide adoption of UTF8 for string data. As an unused string format it is suitable to repurpose for CONSTANT_Data since the latter is an alternative string format, for non-textual binary data. Generally speaking, text strings and binary strings are similar but distinct data types. The tag 17 for CONSTANT_Dynamic uses an obsolete tag to support invokedynamic instructions, which was never standardized or used. It is fitting that the two BSM-using constant pool types should have adjacent tags. The CONSTANT_Group tag 13 lies in a gap previously left open for the module system in the JDK 7 time frame.)
Structurally, a CONSTANT_Dynamic has two components after its tag byte: The index of a bootstrap method, in the same format as the index found in a CONSTANT_InvokeDynamic, and a CONSTANT_NameAndType which encodes the expected type, along with a name.
As with `invokedynamic`, the name component is an additional channel, besides the type, for passing expression information to the bootstrap method. It is expected that just as `invokedynamic` instructions find uses for the name component (e.g., a method name or some ad hoc descriptor), dynamic constants will also find uses for the name (e.g., the name of an `enum` constant or the spelling of a symbolic constant). Putting the `CONSTANT_NameAndType` in both places makes for a more regular design. In effect, `CONSTANT_Methodref` and `CONSTANT_Fieldref` constants are used to refer to named members of classes, while the analogous `CONSTANT_InvokeDynamic` and `CONSTANT_Dynamic` constants are used to refer to named entities with user-programmed bootstraps.
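The two-component layout described above can be sketched as a small decoder. This is only an illustration: the class and field names are hypothetical, and the 16-bit width of both indexes is assumed from the existing `CONSTANT_InvokeDynamic` layout that the text refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DynamicEntrySketch {
    // Tag value proposed in this RFE.
    public static final int CONSTANT_DYNAMIC = 17;

    /** The two fields that follow the tag byte (names are illustrative). */
    public static final class DynamicEntry {
        public final int bootstrapMethodAttrIndex; // same format as in CONSTANT_InvokeDynamic
        public final int nameAndTypeIndex;         // index of a CONSTANT_NameAndType

        DynamicEntry(int bootstrapMethodAttrIndex, int nameAndTypeIndex) {
            this.bootstrapMethodAttrIndex = bootstrapMethodAttrIndex;
            this.nameAndTypeIndex = nameAndTypeIndex;
        }
    }

    /** Decodes one entry: tag byte, then two big-endian 16-bit indexes (assumed width). */
    public static DynamicEntry read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_DYNAMIC) {
                throw new IllegalArgumentException("not a CONSTANT_Dynamic, tag=" + tag);
            }
            int bsmIndex = in.readUnsignedShort();
            int natIndex = in.readUnsignedShort();
            return new DynamicEntry(bsmIndex, natIndex);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```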
A CONSTANT_Group has one component after its tag byte: A 16-bit index into the `ConstantGroups` attribute. The layout of this attribute is roughly similar to that of the BootstrapMethods attribute. It starts with a 16-bit length K, and is followed by an array of K group entries, each one specifying a particular group of constants. Each group entry begins with an initial length field N of 32 bits, a size field of 32 bits (to ease jumping to the next group entry), followed by a stream of N constant data. Each constant datum begins with any of the constant tags allowed as a bootstrap method argument, or CONSTANT_Utf8 (for expressing strings directly), or CONSTANT_Group (for expressing inline sub-groups), or the distinguished byte value zero.
In the last case, the zero byte is immediately followed by a 16-bit index into the main constant pool. In this way, a CONSTANT_Group can contain any mix of "private" constants (not otherwise used in the class file) or "shared" constants (stored in the main constant pool). If "private" constants were not allowed, it is likely that the main constant pool would quickly exceed its maximum size of 2^16 entries.
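As a rough sketch of how the size field eases jumping between group entries, the following walks the body of a `ConstantGroups` attribute and records where each entry starts, without parsing any constants. It assumes that the size field counts the bytes of constant data following the two 32-bit fields, which the text does not pin down, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class GroupEnvelopeSketch {
    /**
     * Returns the byte offset of each group entry within the attribute body.
     * Layout per the text: 16-bit count K, then K entries, each starting with
     * a 32-bit length N and a 32-bit size field.
     */
    public static int[] groupOffsets(ByteBuffer attributeBody) {
        ByteBuffer buf = attributeBody.duplicate(); // big-endian, like the class file format
        int k = buf.getShort() & 0xFFFF;
        int[] offsets = new int[k];
        for (int i = 0; i < k; i++) {
            offsets[i] = buf.position();
            int n = buf.getInt();    // number of constants in this group
            int size = buf.getInt(); // ASSUMPTION: bytes of constant data that follow
            if (n < 0 || size < 0 || size > buf.remaining()) {
                throw new IllegalArgumentException("malformed group entry " + i);
            }
            buf.position(buf.position() + size); // skip the constants without parsing them
        }
        return offsets;
    }
}
```

The constants themselves are skipped rather than decoded, since the inline encoding of each constant kind is not specified here.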
(Internally, the mechanism for resolving a CONSTANT_Group should parse the contents of the group as needed to find the locations of the various component constants. The JVM should allocate an offset table to record the results of this parsing, as well as an initially-null table for holding the constants themselves. But the constants should not be actually resolved until and unless the user of the List embodying the Group asks for an element of the List. At that point the List implementation should go back to the JVM and ask for the constant, plugging it into the second table. The second table should be available to both the JVM and the Java code implementing the List. A similar mechanism should also provide the lists of static bootstrap arguments from the JVM to the Java runtime.)
Group constants can nest, with the nested "sub-groups" always being inlined into the containing group. (A cross-group reference must go through the main constant pool, using a zero tag byte as described above.) The format of an inlined CONSTANT_Group is different from one in the main constant pool. In this case, the following 16-bit field is not an index into the `ConstantGroups` attribute, but rather a directly specified sub-group length, followed immediately by that many constants to collect into the sub-group. In this way, the incremental cost of adding a small extra sub-group, within a large nest of groups, is small, only 16 bits plus the sub-group constants themselves. This means groups can be used to encode Lisp-like tree expressions or AST nodes, if desired.
(Inline constants in constant groups or subgroups provide a necessary way to express bulky constant information without overflowing the rather small size limit, 65535 items, of the main constant pool. The important types to make inline are UTF8, Integer, Long, Float, Double, Dynamic, Data, and Group. The inline form of these constants is straightforward but not specified here. There is less benefit to allowing the other types to be inlined into groups; these types include Class, MethodType, and MethodHandle. Those types are determined by indexes to items in the main constant pool and so might as well themselves be in that constant pool.)
A CONSTANT_Data has two components after its tag byte: a 32-bit length, followed by the indicated number of bytes. (In this it is similar to a CONSTANT_Utf8, but allows longer lengths and has no structural requirements on the data.)
Behaviorally, a CONSTANT_Dynamic constant is resolved by executing its bootstrap method on the following parameters: 1. a local Lookup object, 2. the Class representing the expected constant type, and 3. any remaining bootstrap arguments. (If we choose to use CONSTANT_NameAndType as the carrier of type information, it would also contribute a name string argument.) As with `invokedynamic`, multiple threads can race to resolve, but a unique winner will be chosen and any other contending answers discarded. Instead of returning a `CallSite` object (as the `invokedynamic` instruction requires), the bootstrap method would return a value which would be immediately converted to the required type.
Note that we use just one BootstrapMethods attribute to hold all BSM references, but that the way these references are invoked differs according to the kind of structure being bootstrapped. An `invokedynamic` instruction is bootstrapped as `CallSite cs = (CallSite) bsm.invoke(L, "name", MethodType, arg...)`, while a CONSTANT_Dynamic is bootstrapped as `CT con = (CT) bsm.invoke(L, CT.class, arg...)`. The type CT can be any type other than void, including a primitive type. The BSM can return anything that eventually converts (via the asType call inherent in MH.invoke) to the required type (CallSite, CT, etc.).
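The CONSTANT_Dynamic calling convention above can be imitated in plain Java with method handles. This is a minimal sketch, not the JVM's linkage code: `enumConstant` is a hypothetical metafactory, `resolve` stands in for the JVM's resolution step, and the name argument is omitted to match the `bsm.invoke(L, CT.class, arg...)` form.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DynamicBootstrapSketch {
    /** Hypothetical metafactory: yields the enum constant of the expected type with the given name. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static Object enumConstant(MethodHandles.Lookup lookup, Class<?> type, String name) {
        return Enum.valueOf((Class) type, name);
    }

    /** Stand-in for the JVM: invokes bsm(L, CT.class, arg...) and converts the result to CT. */
    public static Object resolve(MethodHandle bsm, MethodHandles.Lookup lookup,
                                 Class<?> type, Object... staticArgs) {
        Object[] all = new Object[staticArgs.length + 2];
        all[0] = lookup;
        all[1] = type;
        System.arraycopy(staticArgs, 0, all, 2, staticArgs.length);
        // asType performs the return-value conversion (cast or unboxing) to the expected type.
        MethodHandle typed = bsm.asType(bsm.type().changeReturnType(type));
        try {
            return typed.invokeWithArguments(all);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Resolves a Thread.State constant through the hypothetical metafactory. */
    public static Object demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle bsm = lookup.findStatic(DynamicBootstrapSketch.class, "enumConstant",
                    MethodType.methodType(Object.class, MethodHandles.Lookup.class,
                            Class.class, String.class));
            return resolve(bsm, lookup, Thread.State.class, "RUNNABLE");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A real implementation would also cache the result so that racing threads observe a single winner; that part is left out here.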
A CONSTANT_Group is resolved by building a non-modifiable List (of unspecified implementation class) which is backed by the constants specified in the selected element of the `ConstantGroups` attribute. These constants are resolved lazily, after the List is returned to the user, and on first reference to any given constant.
(As noted above, this lazy resolution requires a handshake with the JVM. It can be very low-level, since it will not be called by user code. Probably it should be "get group constant at byte offset X in the ConstantGroups attribute".)
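The lazy behavior can be approximated as follows. This is a sketch under stated assumptions: the `IntFunction` is a hypothetical stand-in for the low-level JVM handshake, the array plays the role of the initially-null table of resolved constants, and thread safety is ignored.

```java
import java.util.AbstractList;
import java.util.function.IntFunction;

/** Unmodifiable List whose elements are resolved on first access. */
public class LazyGroupList extends AbstractList<Object> {
    private final Object[] resolved;                // initially-null table of constants
    private final IntFunction<Object> jvmHandshake; // hypothetical: "get group constant i"

    public LazyGroupList(int size, IntFunction<Object> jvmHandshake) {
        this.resolved = new Object[size];
        this.jvmHandshake = jvmHandshake;
    }

    @Override
    public Object get(int index) {
        Object constant = resolved[index]; // out-of-range indexes fail here
        if (constant == null) {
            // First reference: ask the (simulated) JVM and record the answer.
            // A constant that resolves to null would be re-resolved each time
            // in this sketch; a real table would need a distinct marker.
            constant = jvmHandshake.apply(index);
            resolved[index] = constant;
        }
        return constant;
    }

    @Override
    public int size() {
        return resolved.length;
    }
}
```

Because `AbstractList` leaves `set`, `add`, and `remove` unsupported by default, the list is non-modifiable without further code.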
Any resolution cycles are detected (in the same thread) when resolving a CONSTANT_Dynamic or CONSTANT_Group, causing a linkage error instead of a stack overflow.
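One way to picture same-thread cycle detection is a per-thread set of constants whose resolution is in progress. The sketch below is illustrative only: the class and method names are hypothetical, and identifying a constant by a single integer index is a simplification.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class CycleGuardSketch {
    // Constants currently being resolved by the current thread.
    private static final ThreadLocal<Set<Integer>> IN_PROGRESS =
            ThreadLocal.withInitial(HashSet::new);

    /** Runs the resolution body, failing fast if this constant is already being resolved. */
    public static <T> T resolving(int cpIndex, Supplier<T> body) {
        Set<Integer> active = IN_PROGRESS.get();
        if (!active.add(cpIndex)) {
            throw new LinkageError("cyclic resolution of constant #" + cpIndex);
        }
        try {
            return body.get();
        } finally {
            active.remove(cpIndex); // allow a later, non-nested resolution attempt
        }
    }
}
```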
One of the motivations for CONSTANT_Group is to lift the limitation on the number of static arguments that can be passed to a bootstrap method, and to allow bootstrap methods greater control over the sequencing and transformation of errors arising from linkage failures among the constants they depend on. Independently, BSMs should _also_ be allowed to take their static arguments in the form of a List, so that BSMs can control exceptions and take larger constant packs even without using groups. This should be done even before the CONSTANT_Group feature is implemented, although the inline-constant functionality allows constants to scale larger than the BSM-based workaround.\[2]
A CONSTANT_Data is resolved by building a ByteSequence which is backed by the byte array in the constant itself. This is an O(1) overhead, so it is quite efficient. The data can be copied directly from the constant pool to the desired data structure, or it can be parsed or decoded as desired by the metafactory.
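Since ByteSequence is only described as a transliteration of CharSequence, the interface below is a guess at its shape, with hypothetical method names. The sketch wraps a region of a byte array (standing in for constant pool storage) without copying, then shows a metafactory-style decoder that fills an `int[]` from big-endian bytes.

```java
import java.nio.ByteBuffer;

public class ByteSequenceSketch {
    /** Hypothetical byte analogue of CharSequence. */
    public interface ByteSequence {
        int length();
        byte byteAt(int index);
        ByteSequence subSequence(int start, int end);
    }

    /** O(1) read-only view over pool[offset, offset + length); no bytes are copied. */
    public static ByteSequence view(byte[] pool, int offset, int length) {
        return of(ByteBuffer.wrap(pool, offset, length).slice().asReadOnlyBuffer());
    }

    private static ByteSequence of(ByteBuffer buf) {
        return new ByteSequence() {
            public int length() { return buf.remaining(); }
            public byte byteAt(int index) { return buf.get(index); }
            public ByteSequence subSequence(int start, int end) {
                ByteBuffer d = buf.duplicate();
                d.limit(end);
                d.position(start);
                return of(d.slice());
            }
        };
    }

    /** Example metafactory step: decode big-endian 32-bit values into an int array. */
    public static int[] toIntArray(ByteSequence data) {
        int[] out = new int[data.length() / 4];
        for (int i = 0; i < out.length; i++) {
            int base = i * 4;
            out[i] = (data.byteAt(base) & 0xFF) << 24
                   | (data.byteAt(base + 1) & 0xFF) << 16
                   | (data.byteAt(base + 2) & 0xFF) << 8
                   | (data.byteAt(base + 3) & 0xFF);
        }
        return out;
    }
}
```

The encoding of the payload (here, big-endian ints) is exactly the kind of agreement between the static compiler and the runtime that the JVM would not need to know about.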
Rationale Notes: It does not appear to be helpful to collapse `CONSTANT_Dynamic` into `CONSTANT_InvokeDynamic`. Most fundamentally, a `CONSTANT_InvokeDynamic` cannot be the operand of an `ldc` and therefore cannot itself serve as a bootstrap method argument. But it appears that nested bootstrap method arguments are necessary, for some use cases, to defeat the arity limit for bootstrap methods.
The CONSTANT_Data constant should *not* resolve to a byte array, since that is likely to require an undesirable extra copy. Using a ByteSequence allows the metafactory to place the payload bits once; metafactories can even use a zero-copy method by keeping the CP-backed ByteSequence.
The CONSTANT_Group constant performs two roles in this design. First, it patches over the odd limit of about 250 arguments imposed by the JVM on all method calls, including metafactory calls. Second, it allows very large collections of constants to be defined without overflowing the 16-bit limit of indexes in the main constant pool.
Groups do not slow down the initial parsing of the constant pool, even if they are large, because they are contained in their own individual "envelopes" in the `ConstantGroups` attribute. When unpacking the first group, offsets of the individual groups should probably be computed in a pre-pass. These numbers could be inserted into the same table as will eventually contain the resolved List objects; a tagging scheme can disambiguate resolved from unresolved states (i.e., managed pointer from offset value).
This RFE should be tried experimentally in Project Valhalla and refined as needed.
\[1]: Appendix: Constant type rationale
\[2]: Appendix: BSM arity extension
In order to provide greater control over linkage exceptions, any non-varargs bootstrap method should be allowed to receive extra arguments in the form of a trailing `List` argument, as if the `List` were the trailing array of a varargs method. In this case, the JVM does not eagerly resolve the bootstrap method arguments bundled into the list, but instead arranges a `List` implementation which lazily resolves constants. A failed resolution causes the `List` method to throw a `LinkageError` instead of producing the missing value. Such errors can be directly passed out of the BSM, or could be transformed as needed. This approach also relaxes the limitation of 251 static arguments, and allows the BSM freedom to receive the maximum number of static arguments allowed by the class file format, which is 65535. Such a move reduces, though does not eliminate, the need for CONSTANT_Group.
The initial version of dynamic constants will use a simpler tactic to relax arity restrictions: If the BSM is varargs, the excess static arguments will simply be bundled into an array.
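The varargs tactic can be tried out today with method handles, because `findStatic` returns a variable-arity handle for a varargs method and `invokeWithArguments` collects the trailing arguments into the array. In this sketch `listOf` is a hypothetical metafactory and `demo` stands in for the JVM's bootstrap call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collections;

public class VarargsBootstrapSketch {
    /** Hypothetical varargs metafactory: excess static arguments arrive in the trailing array. */
    public static Object listOf(MethodHandles.Lookup lookup, Class<?> type, Object... elements) {
        return Collections.unmodifiableList(Arrays.asList(elements.clone()));
    }

    /** Stand-in for the JVM: passes (Lookup, Class, staticArgs...) to the varargs BSM. */
    public static Object demo(Object... staticArgs) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // The handle is a varargs collector because listOf is declared with Object...
            MethodHandle bsm = lookup.findStatic(VarargsBootstrapSketch.class, "listOf",
                    MethodType.methodType(Object.class, MethodHandles.Lookup.class,
                            Class.class, Object[].class));
            Object[] all = new Object[staticArgs.length + 2];
            all[0] = lookup;
            all[1] = java.util.List.class;
            System.arraycopy(staticArgs, 0, all, 2, staticArgs.length);
            return bsm.invokeWithArguments(all);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```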
Appendix: ConstantValue attributes and lazy linkage
Named constants (`static final` fields) should be allowed to have a `ConstantValue` attribute which points to a `CONSTANT_Dynamic` constant. The JVM should resolve a `getstatic` to such a name as a reference to the named constant, as if by `ldc`. Coupled with source language support, this will allow clean definition of compile-time constants of all types, with better laziness properties than static initializers can provide.