Bug ID: JDK-8158765 Isolated Methods

Type: JEP
Component: core-libs
Sub-Component: java.lang.invoke

Priority: P3
Status: Draft
Resolution: Unresolved

Submitted: 2016-06-06
Updated: 2018-04-16

Summary
-------

Extend the `MethodHandles.Lookup` class of the `java.lang.invoke` package to
support loading method bytecodes without an attached class, and to represent
such methods as method handles.

Goals
-----

*   In the `MethodHandles.Lookup` class of the `java.lang.invoke` package,
    provide a new method `loadCode` to load a bytecode array plus constants as
    an isolated method and return a `MethodHandle` representing that method.

*   At the level of the JVM, provide an optimised means to store isolated
    methods.

Non-Goals
---------

*   A new compilation strategy for lambdas is not in the scope of this JEP.

*   Extensions at the Java language level are explicitly out of scope.

*   Extensions to the Java Virtual Machine instruction set (bytecodes) are
    likewise out of scope.

Success Metrics
---------------

*   Improved performance of the method handle infrastructure where it relies on
    bytecode generation, including at startup.

*   Reduced memory footprint of the method handle infrastructure (specifically,
    `LambdaForm`s, `BoundMethodHandle`s, and invokers).

*   Similar observable effects in dynamic language implementations once they
    adopt the new API.

Motivation
----------

Both in the JDK core libraries and in language implementations running atop the
JVM, it is a common pattern to generate stateless classes with a single static
method. These classes are used to represent what is logically a method without a
class, or an "isolated method". Generating them is cumbersome as it requires the
generation of a full class, and imposes a certain load on the VM in terms of
class loading and maintenance.

To enable a more lightweight solution for this scenario, there should be a way
to express and load an isolated method directly, to obtain it in a form that can
be used to invoke it, and to ensure that it cannot be used to carry out access
violations.

Method handles are a suitable abstraction for representing callable code.
Moreover, a means for controlling lookup and access contexts already exists in
the form of the `MethodHandles.Lookup` class. What is missing is a means to load
a method in isolation. By adding a single API entry point to the
`MethodHandles.Lookup` class that accepts the representation of an isolated
method, such a method can be loaded with the lookup context implied by the
`Lookup` instance at hand.

There are several settings in the JDK core libraries, most notably in the
low-level method handles infrastructure, where a new abstraction for isolated
methods can be used to reduce code size and memory footprint, and to improve
loading performance. Moreover, the Nashorn JavaScript engine can make use of the
feature in a similar way, as it generates bytecode from JavaScript sources.
Finally, all language implementations that run atop the JVM and generate
bytecode may be clients of the isolated method loading capability.

Currently, all of the aforementioned scenarios require access to the internal
`Unsafe` API. Offering a disciplined and secure way of defining custom code in
the form of isolated methods will render many of these uses of `Unsafe`
unnecessary, thereby reducing the dependence on internal APIs that guest
language implementations on the JVM often have.

Description
-----------

The `MethodHandles.Lookup` class of the `java.lang.invoke` package is to be
extended with a method like this:

    MethodHandle loadCode(String name, MethodType type, byte[] instructions, Object[] constants)

The `name` parameter is optional. It denotes the name by which the isolated
method should be identifiable in stack traces.

The `type` parameter determines the method's return type and parameter types.
The `instructions` array contains the method's bytecode instructions, as they
would occur in a normal class file. A notable difference is that all constant
pool indices that the bytecode would normally contain are instead indices into
the accompanying `constants` array, which serves as a method-local substitute
for the constant pool.
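The following sketch makes the index substitution concrete. It decodes the two-byte, big-endian operand that follows an opcode such as `INVOKEVIRTUAL` (the same encoding class files use for constant pool indices) and looks it up in the `constants` array instead of a constant pool. `ConstantsOperand`, `u2Operand`, and `resolve` are hypothetical names used only for illustration.

```java
final class ConstantsOperand {
    // Hypothetical helper: reads the unsigned, big-endian u2 operand that follows
    // the opcode at position pc, which is how class files encode constant pool indices.
    static int u2Operand(byte[] instructions, int pc) {
        return ((instructions[pc + 1] & 0xFF) << 8) | (instructions[pc + 2] & 0xFF);
    }

    // Hypothetical helper: resolves that operand against the method-local
    // constants array instead of a class's constant pool.
    static Object resolve(byte[] instructions, int pc, Object[] constants) {
        return constants[u2Operand(instructions, pc)];
    }
}
```

For the `String.length` example in the usage examples, the `INVOKEVIRTUAL` opcode sits at position 1 of the instructions array and is followed by the operand bytes `0, 0`, so `resolve(instructions, 1, constants)` yields the `MethodHandle` stored in `constants[0]`.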

The `loadCode` method creates a method from the passed bytecode instructions and
constants and returns a `MethodHandle` that can be used to call the method. The
implementation of `loadCode` is responsible for verifying the code to be loaded.

This method is isolated from any class and behaves largely like a static method. 
The method handle resulting from a `loadCode` invocation is of the `REF_static`
kind. It cannot be cracked via `MethodHandles.Lookup.revealDirect()`.

The context for a method defined in this way is determined by the `Lookup` 
instance receiving the `loadCode` call. In case the lookup privileges are not 
sufficient, an exception will be thrown.
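The proposal leaves open which privileges count as sufficient and which exception is thrown. One plausible policy, sketched here purely as an assumption, is to require private access, since the loaded code would run with the privileges of the lookup class. The class name `LoadCodeAccess` is hypothetical, and `IllegalAccessException` is chosen only because the existing `find*` methods of `Lookup` throw it on access failures.

```java
import java.lang.invoke.MethodHandles;

final class LoadCodeAccess {
    // Hypothetical policy: loadCode requires a lookup that includes PRIVATE access,
    // because the isolated method would execute with the lookup class's privileges.
    static void checkAccess(MethodHandles.Lookup lookup) throws IllegalAccessException {
        if ((lookup.lookupModes() & MethodHandles.Lookup.PRIVATE) == 0) {
            throw new IllegalAccessException(
                "insufficient lookup privileges for loadCode: " + lookup);
        }
    }
}
```

Under this policy, a full-privilege lookup obtained from `MethodHandles.lookup()` passes the check, whereas `MethodHandles.publicLookup()` is rejected.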

**The `constants` Array**

The `constants` array, meant to contain constants referenced from the bytecode,
deserves some attention. First and foremost, it should not be misunderstood as a
constant pool. Rather, it provides a higher level of abstraction over constant
pool contents, and adds convenience for clients.

The array of constant pool patches that can be passed to invocations of
`Unsafe.defineAnonymousClass` plays a similar role. For instance, the constant
pool patches array allows passing a `String` where a `CONSTANT_Utf8_info` entry
is to be patched, even though that entry actually consists of a tag byte, a
two-byte length, and an array of bytes in modified UTF-8 encoding.
`Unsafe.defineAnonymousClass` offers similar convenience for other constant
pool entries too.

For the `constants` array passed to `loadCode`, similar convenience should be
possible. For instance, where the method instructions reference a Java class,
the `constants` array can contain a `Class` instance, rather than lower-level
structures encountered in constant pools. Likewise, an `INVOKEVIRTUAL`
instruction can reference a `constants` array entry that itself is a
`MethodHandle` representing the method in question.

The following list maps each kind of constant pool entry to the Java classes
that can represent it in the `constants` array. Since the array is an
`Object[]`, primitive values such as `int` are stored in boxed form.

*   `CONSTANT_Utf8_info`: `java.lang.String`

*   `CONSTANT_Integer_info`: `int`, `java.lang.Integer`

*   `CONSTANT_Float_info`: `float`, `java.lang.Float`

*   `CONSTANT_Long_info`: `long`, `java.lang.Long`

*   `CONSTANT_Double_info`: `double`, `java.lang.Double`

*   `CONSTANT_Class_info`: `java.lang.Class`

*   `CONSTANT_String_info`: `java.lang.String`

*   `CONSTANT_Fieldref_info`: a `java.lang.invoke.DirectMethodHandle` of the
    right kind, obtained via the appropriate API in
    `java.lang.invoke.MethodHandles.Lookup`

*   `CONSTANT_Methodref_info`: a `java.lang.invoke.DirectMethodHandle` of the
    right kind, obtained via the appropriate API in
    `java.lang.invoke.MethodHandles.Lookup`

*   `CONSTANT_InterfaceMethodref_info`: a `java.lang.invoke.DirectMethodHandle`
    of the right kind, obtained via the appropriate API in
    `java.lang.invoke.MethodHandles.Lookup`

*   `CONSTANT_NameAndType_info`: (should not be required)

*   `CONSTANT_MethodHandle_info`: `java.lang.invoke.MethodHandle`

*   `CONSTANT_MethodType_info`: `java.lang.invoke.MethodType`

*   `CONSTANT_InvokeDynamic_info`: *either* a tuple of
    `(java.lang.invoke.MethodType,java.lang.invoke.MethodHandle)`, where the
    `MethodType` describes the call site's signature, and the `MethodHandle`
    represents the bootstrap method with already bound static arguments; *or* an
    already initialized `java.lang.invoke.CallSite`
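Because one Java type can stand for several kinds of entries (a `MethodHandle` may mean `CONSTANT_Methodref_info`, `CONSTANT_Fieldref_info`, or `CONSTANT_MethodHandle_info`), an implementation would have to take the referencing instruction into account. The sketch below, built around a hypothetical `ConstantKinds.kindFor` helper, shows one way to do that for a subset of opcodes. It simplifies in several places: `CONSTANT_Utf8_info` is omitted because no instruction references it directly, `INVOKESPECIAL` and `INVOKESTATIC` are assumed to reference class methods only, and only the pre-linked `CallSite` form of `CONSTANT_InvokeDynamic_info` is recognised.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;

final class ConstantKinds {
    // Hypothetical sketch: the referencing opcode decides which constant pool
    // entry kind a convenience object in the constants array stands for.
    static String kindFor(int opcode, Object constant) {
        switch (opcode) {
            case 0x12: case 0x13: // LDC, LDC_W: single-slot loadable constants
                if (constant instanceof Integer) return "CONSTANT_Integer_info";
                if (constant instanceof Float) return "CONSTANT_Float_info";
                if (constant instanceof String) return "CONSTANT_String_info";
                if (constant instanceof Class) return "CONSTANT_Class_info";
                if (constant instanceof MethodType) return "CONSTANT_MethodType_info";
                if (constant instanceof MethodHandle) return "CONSTANT_MethodHandle_info";
                break;
            case 0x14: // LDC2_W: two-slot constants
                if (constant instanceof Long) return "CONSTANT_Long_info";
                if (constant instanceof Double) return "CONSTANT_Double_info";
                break;
            case 0xb2: case 0xb3: case 0xb4: case 0xb5: // GETSTATIC .. PUTFIELD
                if (constant instanceof MethodHandle) return "CONSTANT_Fieldref_info";
                break;
            case 0xb6: case 0xb7: case 0xb8: // INVOKEVIRTUAL, INVOKESPECIAL, INVOKESTATIC
                if (constant instanceof MethodHandle) return "CONSTANT_Methodref_info";
                break;
            case 0xb9: // INVOKEINTERFACE
                if (constant instanceof MethodHandle) return "CONSTANT_InterfaceMethodref_info";
                break;
            case 0xba: // INVOKEDYNAMIC, pre-linked CallSite form only
                if (constant instanceof CallSite) return "CONSTANT_InvokeDynamic_info";
                break;
            case 0xbb: case 0xbd: case 0xc0: case 0xc1: // NEW, ANEWARRAY, CHECKCAST, INSTANCEOF
                if (constant instanceof Class) return "CONSTANT_Class_info";
                break;
            default:
                break;
        }
        throw new IllegalArgumentException(
            String.format("opcode 0x%02x cannot reference %s", opcode, constant));
    }
}
```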

In addition, the Valhalla project proposes several new constant pool entry
types, for which the substitutions in `constants` arrays can be as follows. Note
that the list assumes that tuples, which Valhalla may introduce, exist in the
language.

*   `CONSTANT_ArrayType_info`: tuple of `(byte,java.lang.Class)`

*   `CONSTANT_MethodDescriptor_info`: array of `java.lang.Class` (`Class`
    instances may have to offer some additional information as Valhalla
    progresses)

*   `CONSTANT_ParameterizedType_info`: tuple of
    `(java.lang.Class,java.lang.Class[])`

*   `CONSTANT_TypeVar_info`: tuple of `(java.lang.String,java.lang.Class)`

As a further addition, the new constant pool entry types discussed in the
[general data in constant pools](https://bugs.openjdk.java.net/browse/JDK-8161256)
proposal can be represented as follows.

*   `CONSTANT_Dynamic`: tuple of
    `(java.lang.Class,java.lang.invoke.MethodHandle)`, where the `Class`
    represents the expected type, and the `MethodHandle` describes a bootstrap
    method with the static parameters already bound

*   `CONSTANT_Group`: an array or `java.util.List`

*   `CONSTANT_Bytes`: `byte[]`
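Assuming the bound bootstrap handle still receives the leading `Lookup`, name, and type arguments that a JVM bootstrap method for a dynamic constant is passed, resolving the `(Class, MethodHandle)` tuple could look like the sketch below. `DynamicConstant`, `nCopies`, and `listOfCopies` are hypothetical; the static argument is bound ahead of time with `MethodHandles.insertArguments`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the (Class, MethodHandle) tuple of a CONSTANT_Dynamic entry.
final class DynamicConstant {
    final Class<?> type;               // the expected type of the constant
    final MethodHandle boundBootstrap; // (Lookup, String, Class) -> Object, static args bound

    DynamicConstant(Class<?> type, MethodHandle boundBootstrap) {
        this.type = type;
        this.boundBootstrap = boundBootstrap;
    }

    // Assumes the bound handle still takes the leading (Lookup, name, type) arguments.
    Object resolve(MethodHandles.Lookup lookup, String name) throws Throwable {
        return type.cast(boundBootstrap.invoke(lookup, name, type));
    }

    // Example bootstrap method with one static argument, n.
    static Object nCopies(MethodHandles.Lookup lookup, String name, Class<?> type, int n) {
        return Collections.nCopies(n, name);
    }

    // Builds a constant whose static argument n is bound before resolution.
    static DynamicConstant listOfCopies(int n) throws ReflectiveOperationException {
        MethodHandle bsm = MethodHandles.lookup().findStatic(DynamicConstant.class, "nCopies",
            MethodType.methodType(Object.class, MethodHandles.Lookup.class,
                String.class, Class.class, int.class));
        return new DynamicConstant(List.class, MethodHandles.insertArguments(bsm, 3, n));
    }
}
```

With this sketch, `DynamicConstant.listOfCopies(2).resolve(lookup, "x")` yields the list `[x, x]`.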

A *note on generic methods*: an isolated method does not have an enclosing
class that could define type variables. Instead, all type variables mentioned in
the signature of a generic isolated method belong to that method alone.

The `constants` array can also contain any object that can be loaded using an
`LDC` instruction. This can be used to bind specific data known at compile
time.

It will be up to the implementation of `loadCode` to turn these convenience
objects into proper lower-level representations resembling those in a constant
pool. The details of this depend on the implementation choices that will be made
for the internal representation of isolated methods.

### Implementation

The `loadCode` functionality can be implemented in several stages, each
integrated more deeply with the present system than the last.

**Stage 1: Internal Use for `LambdaForm` and Invoker Generation**

The initial version of `loadCode` should be provided as part of the non-public
API for `invokedynamic`, e.g., as a non-public method in the
`MethodHandles.Lookup` class, or in the `MethodHandleImpl` class. There, it can
be used to generate `LambdaForm`s and other invokers in the `java.lang.invoke`
implementation. At this stage, the implementation should not yet treat isolated
methods specially, but wrap the `LambdaForm` methods in a class as usual.
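Stage 1 keeps today's practice of wrapping each generated method in a class. For comparison, the sketch below shows what that practice involves when done by hand: it assembles a complete class file (header, constant pool, and one static method with a `Code` attribute) around the bytecode of the `String.length` example from the usage examples, and defines it with `MethodHandles.Lookup.defineClass`, a public API since Java 9. It is not the proposed `loadCode`. The names `SingleMethodClassSpinner`, `spinStringLength`, and the `IsoHolder` class-naming scheme are hypothetical, and class file version 52 is an arbitrary choice that needs no `StackMapTable` here because the code has no branches.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.atomic.AtomicInteger;

final class SingleMethodClassSpinner {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Spins a class holding one static method (ALOAD_0; INVOKEVIRTUAL String.length()I;
    // IRETURN) and returns a handle to that method.
    static MethodHandle spinStringLength(String methodName)
            throws IOException, ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        String pkg = lookup.lookupClass().getPackageName().replace('.', '/');
        String className = (pkg.isEmpty() ? "" : pkg + "/") + "IsoHolder" + COUNTER.incrementAndGet();

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);
        out.writeShort(0);                          // minor_version
        out.writeShort(52);                         // major_version (Java 8)
        out.writeShort(14);                         // constant_pool_count: entries #1..#13
        utf8(out, className);                       // #1
        classRef(out, 1);                           // #2  this class
        utf8(out, "java/lang/Object");              // #3
        classRef(out, 3);                           // #4  super class
        utf8(out, "java/lang/String");              // #5
        classRef(out, 5);                           // #6
        utf8(out, "length");                        // #7
        utf8(out, "()I");                           // #8
        out.writeByte(12); out.writeShort(7); out.writeShort(8); // #9  NameAndType
        out.writeByte(10); out.writeShort(6); out.writeShort(9); // #10 Methodref String.length
        utf8(out, methodName);                      // #11
        utf8(out, "(Ljava/lang/String;)I");         // #12
        utf8(out, "Code");                          // #13
        out.writeShort(0x0031);                     // ACC_PUBLIC | ACC_FINAL | ACC_SUPER
        out.writeShort(2);                          // this_class
        out.writeShort(4);                          // super_class
        out.writeShort(0);                          // interfaces_count
        out.writeShort(0);                          // fields_count
        out.writeShort(1);                          // methods_count
        out.writeShort(0x0009);                     // ACC_PUBLIC | ACC_STATIC
        out.writeShort(11);                         // name_index
        out.writeShort(12);                         // descriptor_index
        out.writeShort(1);                          // attributes_count
        // Here the INVOKEVIRTUAL operand (#10) is a real constant pool index.
        byte[] code = {0x2a, (byte) 0xb6, 0x00, 0x0a, (byte) 0xac};
        out.writeShort(13);                         // attribute_name_index: "Code"
        out.writeInt(12 + code.length);             // attribute_length
        out.writeShort(1);                          // max_stack
        out.writeShort(1);                          // max_locals
        out.writeInt(code.length);                  // code_length
        out.write(code);
        out.writeShort(0);                          // exception_table_length
        out.writeShort(0);                          // Code attributes_count
        out.writeShort(0);                          // class attributes_count
        out.flush();

        Class<?> holder = lookup.defineClass(bytes.toByteArray());
        return lookup.findStatic(holder, methodName, MethodType.methodType(int.class, String.class));
    }

    private static void utf8(DataOutputStream out, String s) throws IOException {
        out.writeByte(1);  // CONSTANT_Utf8 tag
        out.writeUTF(s);   // u2 length followed by modified UTF-8 bytes
    }

    private static void classRef(DataOutputStream out, int nameIndex) throws IOException {
        out.writeByte(7);  // CONSTANT_Class tag
        out.writeShort(nameIndex);
    }
}
```

Roughly sixty lines of class-file plumbing surround five bytes of actual code, which is the overhead the isolated-method representation is meant to avoid.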

**Stage 2: Optimised Internal Use for `LambdaForm` and Invoker Generation**

While the API remains stable, the internal Stage 1 `loadCode` implementation
can be optimised at the level of HotSpot. At this time, two design ideas can be
explored.

1.  Internally, represent each isolated method as a method plus a constant
    pool. To fit the overall expectations of the VM, each isolated method
    belongs to a pseudo class that cannot be instantiated. This resembles the
    way single-static-method classes are currently built.

2.  Add to HotSpot the notion of a pseudo class (dubbed `Gargantuan`) that will
    be the holder of all methods defined through the `loadCode` interface. This
    will be an all-static class invisible from the outside (support for
    `getCallerClass` notwithstanding). 

    `Gargantuan` is a class that is intended to grow as new methods are defined.
    Methods can be garbage-collected once no `MethodHandle`s reference them any
    more. Each method in `Gargantuan` can have a context different from all
    other methods, depending on the lookup context at hand in a `loadCode`
    invocation. This lookup context is preserved in `Gargantuan` and associated
    with the isolated method during its lifetime.

    The `constants` arrays of several isolated methods will very likely contain
    common constants. The VM-level `loadCode` implementation will make sure to
    add to the `Gargantuan` constant pool only those constants that are not
    already present, and to patch the bytecode instructions array accordingly.
    This elision of duplicate constant pool entries can also be deferred to
    garbage collection to make loading isolated methods faster. Either way, all
    isolated methods share a common constant pool.

    The `Gargantuan` class can also exist once per module, which will enable
    efficient collection of constants stored for an isolated method, and
    possibly collection of other structures, as a module is unloaded.
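The de-duplication idea for a shared pool can be sketched at the library level, assuming constants are compared with `equals`, so `MethodHandle` entries are merged only when they compare equal. `SharedConstantPool` is a hypothetical name: the sketch merges one method's `constants` array into a shared pool, returns a remapping table, and patches the two-byte indices in a copy of the instructions. It only understands a handful of opcodes and rejects everything else, whereas a VM-level implementation would have to decode the full instruction set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedConstantPool {
    private final List<Object> entries = new ArrayList<>();
    private final Map<Object, Integer> indices = new HashMap<>();

    // Adds one isolated method's constants, reusing equal entries already present.
    // Returns a remapping table: remap[local index] = shared pool index.
    int[] merge(Object[] constants) {
        int[] remap = new int[constants.length];
        for (int i = 0; i < constants.length; i++) {
            Integer shared = indices.get(constants[i]);
            if (shared == null) {
                shared = entries.size();
                entries.add(constants[i]);
                indices.put(constants[i], shared);
            }
            remap[i] = shared;
        }
        return remap;
    }

    int size() { return entries.size(); }

    // Rewrites u2 constant indices in a copy of the instructions; only a small
    // subset of opcodes is understood, anything else is rejected.
    static byte[] patch(byte[] instructions, int[] remap) {
        byte[] out = instructions.clone();
        int pc = 0;
        while (pc < out.length) {
            int opcode = out[pc] & 0xFF;
            if (takesU2ConstantIndex(opcode)) {
                int local = ((out[pc + 1] & 0xFF) << 8) | (out[pc + 2] & 0xFF);
                int shared = remap[local];
                out[pc + 1] = (byte) (shared >>> 8);
                out[pc + 2] = (byte) shared;
                pc += 3;
            } else if (hasNoOperands(opcode)) {
                pc += 1;
            } else {
                throw new UnsupportedOperationException(String.format("opcode 0x%02x", opcode));
            }
        }
        return out;
    }

    private static boolean takesU2ConstantIndex(int op) {
        return op == 0x13 || op == 0x14                           // LDC_W, LDC2_W
            || (op >= 0xb2 && op <= 0xb8)                         // GETSTATIC .. INVOKESTATIC
            || op == 0xbb || op == 0xbd || op == 0xc0 || op == 0xc1; // NEW, ANEWARRAY, CHECKCAST, INSTANCEOF
    }

    private static boolean hasNoOperands(int op) {
        return (op >= 0x1a && op <= 0x2d)   // ILOAD_0 .. ALOAD_3
            || (op >= 0xac && op <= 0xb1);  // IRETURN .. RETURN
    }
}
```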

**Stage 3: Public API**

Eventually, the `loadCode` method should be public in `MethodHandles.Lookup`
to support more widespread use. In the meantime, making it available via the
MLVM repository will allow experimenting with `loadCode` in existing language
implementations.

### Usage Examples

The examples below serve to point out possible future shapes of the
infrastructure needed to generate the `instructions` array. All examples
describe the generation and loading of a method that has the signature
`(Ljava/lang/String;)I` and retrieves the length of its argument. It consists of
these instructions:

    ALOAD_0
    INVOKEVIRTUAL #0 <String.length()>
    IRETURN

The first example adopts the higher level of abstraction over the constant pool,
as suggested above:

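    import java.lang.invoke.MethodHandle;
    import static java.lang.invoke.MethodType.methodType;

    MethodHandle stringLength = lookup.loadCode("isoToString",
        methodType(int.class, String.class),
        // ALOAD_0, INVOKEVIRTUAL #0, IRETURN; values above 127 need a byte cast
        new byte[]{42, (byte) 182, 0, 0, (byte) 172},
        new Object[]{
            lookup.findVirtual(String.class, "length", methodType(int.class))});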

In the above example, the `instructions` array is written out as literal
constants. To generate such arrays, and the `constants` arrays they reference,
more conveniently, a dedicated generator for isolated methods is conceivable
(its API is inspired by the ASM `GeneratorAdapter`):

    MethodHandle stringLength =
        new IsolatedMethodBuilder("isoToString", methodType(int.class, String.class)).
            loadArg(0).
            invokeVirtual(lookup.findVirtual(String.class, "length", methodType(int.class))).
            returnValue().
            load();
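Since `loadCode` does not exist yet, the sketch below stops short of `load()` and only shows how such a builder could assemble the `instructions` and `constants` pair for the running example. It mirrors the hypothetical `IsolatedMethodBuilder` above, but its exact method set, the restriction of `loadArg` to `int` and reference arguments in the first four slots, and the accessor names `instructions()` and `constants()` are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder that assembles the (instructions, constants) pair only;
// passing the pair to loadCode is omitted because loadCode is a proposal.
final class IsolatedMethodBuilder {
    final String name;
    final MethodType type;
    private final ByteArrayOutputStream code = new ByteArrayOutputStream();
    private final List<Object> pool = new ArrayList<>();

    IsolatedMethodBuilder(String name, MethodType type) {
        this.name = name;
        this.type = type;
    }

    // Sketch limitation: slots 0..3 only, and the slot is assumed to equal the
    // argument index (no preceding long/double arguments).
    IsolatedMethodBuilder loadArg(int i) {
        Class<?> t = type.parameterType(i);
        if (i > 3) throw new UnsupportedOperationException("loadArg(" + i + ")");
        if (t == int.class) code.write(0x1a + i);          // ILOAD_i
        else if (!t.isPrimitive()) code.write(0x2a + i);   // ALOAD_i
        else throw new UnsupportedOperationException(t.getName());
        return this;
    }

    IsolatedMethodBuilder invokeVirtual(MethodHandle target) {
        int index = pool.size();
        pool.add(target);            // the handle itself is the constant
        code.write(0xb6);            // INVOKEVIRTUAL
        code.write(index >>> 8);     // u2 index into constants, big-endian
        code.write(index & 0xFF);
        return this;
    }

    IsolatedMethodBuilder returnValue() {
        Class<?> r = type.returnType();
        if (r == void.class) code.write(0xb1);          // RETURN
        else if (r == int.class) code.write(0xac);      // IRETURN
        else if (!r.isPrimitive()) code.write(0xb0);    // ARETURN
        else throw new UnsupportedOperationException(r.getName());
        return this;
    }

    byte[] instructions() { return code.toByteArray(); }

    Object[] constants() { return pool.toArray(); }
}
```

Chaining `loadArg(0).invokeVirtual(length).returnValue()` for `methodType(int.class, String.class)` produces the same five bytes and one-element `constants` array as the hand-written `loadCode` example above.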

The above examples serve to kick off a discussion about how the isolated methods
loading API and possible supporting API can be shaped.

Alternatives
------------

*Anonymous classes* (obtained via `Unsafe.defineAnonymousClass()`) are an
already existing way to dynamically define classes. They specifically support
use cases where the generated code needs to access associated state, e.g.,
lambda expressions that close over local state. Isolated methods can substitute
for those uses of anonymous classes that fall into the "single static method, no
state" category. Like anonymous classes, isolated methods cannot be looked up by
name.

An alternative approach to speed up bytecode spinning in the JDK core libraries
is to use *bytecode templates* with predefined constant pools, into which only
method bytecodes are inserted. This approach has two main drawbacks: it would be
easy to adopt only in the core libraries and would not scale out to guest
language implementations; and it would still require generating the bytecodes
in question, which is hard to separate from ASM's notion of a class.

Testing
-------

There are no special platform or hardware requirements for testing. As the JDK
core libraries themselves make use of method handles, and as the module system
in particular relies on lambdas and the ensuing bytecode spinning, the JDK
itself is an excellent test bed. The existing tests for the method handle
functionality will be valuable as well.

In terms of testing by guest language implementations, all such implementations
that already utilize the method handles API will implicitly be available for
testing with their respective test suites. Experimental extensions of such guest
language implementations can adopt an implementation scheme based on isolated
methods for ongoing testing. The Nashorn JavaScript engine, for instance, is
capable of running a large body of standard JavaScript code, including
benchmarks.

Risks and Assumptions
---------------------

Introducing a new API to load code into the VM is risky per se. If this feature
is deemed too risky, it can be moved to the `Unsafe` API.

Dependences
-----------

This JEP depends on the presence of a bytecode generation framework that
provides easy access to the constant pool and allows decoupling method
generation from class generation.

Karen asked me to add details of which parts of a current class container you
need and which parts you do not [1]. An isolated method is like a static method
in an empty class, so I need a constant pool; I need the SourceFile,
BootstrapMethods and NestHost attributes; I need the Code attribute and all its
sub-attributes; and I need control over the lifetime of the pseudo class, i.e.,
the lifetime of the isolated method is the lifetime of its nest host or the
lifetime of the returned MethodHandle (this is similar to the flags you pass to
defineNestMate).

[1] http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-April/000612.html

[2] https://bugs.openjdk.java.net/browse/JDK-8171335

16-04-2018

Given the investment in new, general constant pool forms, it is worth asking
whether the proper building block for isolated methods is not bytes at all, but
constant pool constants (in a simple series, or organized in basic blocks). The
setting would be a stack machine more or less identical to that of the JVM. Many
tokens would simply push themselves, as in Forth. MHs would execute themselves
on the stacked operands and push a result; this covers all forms of invocation
and field access. Basic data motion within the stack, and between the stack and
locals, would be encoded, probably in CONSTANT_Integer tokens.

Pros:

*   Easy to work with (single array of Objects, no cross-referencing)

*   Easy to compile down to bytecodes or lambda forms

*   Easy to decompile up to expression trees (potential API for declarative
    expression trees)

*   Natural interoperation with other mechanisms that work with bootstrap
    methods

*   Can embed in bootstrap specifiers to compose functional values in CP and
    under indy

*   Can use to represent small ad hoc expression trees, such as for method
    template parameters

Cons:

*   Hard to represent complex control flow; must use MH combinators or some
    other nested representation

*   Less dense than bytecodes; in effect a u2-based instruction set instead of
    u1

08-05-2017
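To make the token-based idea in the preceding comment tangible, the following sketch interprets a straight-line sequence of constants: `MethodHandle` tokens pop as many operands as they have parameters and push their result, while other tokens push themselves. The comment suggests encoding data motion in `CONSTANT_Integer` tokens; this sketch instead assumes a dedicated `LoadLocal` token class so that plain `Integer` constants can push themselves unambiguously. `TokenMachine` and `LoadLocal` are hypothetical names, and control flow is not handled.

```java
import java.lang.invoke.MethodHandle;
import java.util.ArrayList;
import java.util.List;

final class TokenMachine {
    // Hypothetical data-motion token: pushes local variable #slot onto the stack.
    static final class LoadLocal {
        final int slot;
        LoadLocal(int slot) { this.slot = slot; }
    }

    // Runs a straight-line token sequence and returns the top of the stack
    // (or null if the stack ends up empty).
    static Object run(Object[] tokens, Object... locals) throws Throwable {
        List<Object> stack = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof LoadLocal) {
                stack.add(locals[((LoadLocal) token).slot]);
            } else if (token instanceof MethodHandle) {
                MethodHandle mh = (MethodHandle) token;
                int n = mh.type().parameterCount();
                // Pop n operands; the deepest one becomes the first argument.
                List<Object> operands = stack.subList(stack.size() - n, stack.size());
                Object[] args = operands.toArray();
                operands.clear();
                Object result = mh.invokeWithArguments(args);
                if (mh.type().returnType() != void.class) {
                    stack.add(result);
                }
            } else {
                stack.add(token); // every other constant pushes itself
            }
        }
        return stack.isEmpty() ? null : stack.get(stack.size() - 1);
    }
}
```

For example, running the tokens `LoadLocal(0)`, a `String.length` handle, `3`, and an `Integer.sum` handle with the local `"hello"` leaves `8` on top of the stack.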