JDK-8275064 : Implementation of Foreign Function & Memory API (Second incubator)
  • Type: CSR
  • Component: core-libs
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 18
  • Submitted: 2021-10-11
  • Updated: 2021-11-22
  • Resolved: 2021-11-22
Description
Summary
-------

This CSR refers to the latest iteration of the Foreign Function & Memory API originally targeted for Java 17, with the goal of further consolidating the API, as well as addressing the feedback received so far from developers.

Problem
-------

Real-world use of the Foreign Function & Memory APIs revealed some remaining usability issues, listed below:

 * There is an asymmetry between the allocation API (`SegmentAllocator`) and the dereference API. More specifically, when allocating a segment from an existing Java value/array, a `SegmentAllocator` also accepts the `ValueLayout` corresponding to the value/array element, so that the necessary alignment constraints and endianness can be applied. But the static dereference methods in `MemoryAccess` do not take any layout argument; instead, they optionally accept a `ByteOrder` argument, to perform byte swapping. This asymmetry can lead to subtle mistakes, where a segment is allocated as an array whose element is defined by a given layout, but then the array is accessed in ways that are incompatible with that layout.

 * Some useful data types (`boolean` and `MemoryAddress`) are not supported by memory access var handles.

 * The API makes excessive use of static methods. There is a class `MemoryAccess` containing several static dereference methods (see above), and the `CLinker` class also contains several static helper functions to e.g. convert a Java string to a C string and back.

 * The `MemoryAddress` class is an entity with its own `ResourceScope` object. The reason for this choice is that a client can e.g. request the base address of a memory segment, and expect the address to keep a reference to the segment scope. But making `MemoryAddress` a scoped entity creates confusion in the more common case where an address is returned by a native call, in which case neither spatial nor temporal bounds are available.

 * Memory layouts interacting with the `CLinker` API need to be constructed in a special way: they must embed special *layout attributes*, which encode additional information that allows the linker runtime to classify the argument correctly when a new downcall method handle is created. Also, there seems to be some redundancy in how downcall method handles are created: clients have to pass both a `FunctionDescriptor` *and* a `MethodType`, even though, in most cases, the information in the `MethodType` can be inferred from that in the `FunctionDescriptor`.

 * Calling native functions using downcall method handles can be unsafe: consider the case where a segment is passed *by-reference* to a downcall method handle. In this case, the segment address is obtained, and then passed to the native call. If the segment is backed by a shared scope, a client in another thread could close the segment scope concurrently, which might cause the native call to malfunction.

 * The way in which dependencies between scopes are set up, using `ResourceScope::acquire/release`, is too low-level. There is no way to explicitly set up a temporal dependency between two scopes without resorting to complex uses of `ResourceScope::addCloseAction`.
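
The first issue in the list above can be reproduced in miniature. The sketch below does not use the incubator API at all: it assumes a plain `java.nio.ByteBuffer` as a stand-in for a memory segment, and the class and method names (`AsymmetrySketch`, `allocateBigEndianInt`, `readWithOrder`) are hypothetical. It only illustrates the kind of mistake that becomes possible when the byte order used at allocation time and the byte order used at dereference time are chosen independently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative model only: ByteBuffer stands in for a memory segment.
final class AsymmetrySketch {

    // Stores a value the way a layout-aware allocator would:
    // the byte order comes from the (big-endian) element layout.
    static ByteBuffer allocateBigEndianInt(int value) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.BIG_ENDIAN);
        buffer.putInt(0, value);
        return buffer;
    }

    // Reads the value back the way a layout-free accessor would:
    // the byte order is passed separately and may not match the allocation.
    static int readWithOrder(ByteBuffer buffer, ByteOrder order) {
        return buffer.duplicate().order(order).getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer cell = allocateBigEndianInt(1);
        System.out.println(readWithOrder(cell, ByteOrder.BIG_ENDIAN));    // prints 1
        System.out.println(readWithOrder(cell, ByteOrder.LITTLE_ENDIAN)); // prints 16777216
    }
}
```

Nothing in the second read signals that it disagrees with how the value was stored, which is the sort of silent mismatch the asymmetry allows.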

Solution
-------

Here we describe the main ideas behind the API changes brought forward in this CSR:

 * The main change in this iteration of the API is that `ValueLayout` is now always associated with a Java carrier type. For this reason, the API features specialized subclasses, like `ValueLayout.OfInt`, `ValueLayout.OfLong` etc. The relationship between `ValueLayout` and a Java carrier simplifies the API in a number of ways:
    - We can define a set of dereference methods accepting a (specialized) value layout subclass; for instance, instead of `getInt()` we can have a method like `get(ValueLayout.OfInt)`. This allows us to fix the asymmetry between the dereference API and the allocation API.
    - We can use the carrier information attached to value layouts to decide how to classify parameters to downcall method handles. This effectively removes the need to accept a (now redundant) `MethodType` parameter in `CLinker::downcallHandle`. This also makes the *layout attributes* machinery redundant, so it has been removed in this iteration.
    - We can attach constant var handles to value layouts, which means that obtaining a memory access var handle from a value layout can be far more efficient than before.

 * Support for `boolean` and `MemoryAddress` has been added to memory access var handles. These carriers are considered *secondary* carriers (as opposed to *primary carriers*, such as `byte`, `short`, `char`, `int`, `float`, `long`, `double`). The reason for this distinction is that secondary carriers cannot be copied in bulk to and from memory segments, as each element requires some adjustment (e.g. a `MemoryAddress` has to be *lowered* to a `long` value, while `boolean` has to be normalized to either `1` or `0`).

 * The API has been significantly simplified, and some classes have been removed:
   - The `MemoryAccess` class is no longer present. Instead, *instance* dereference methods are present in both `MemorySegment` and `MemoryAddress` (the latter are *restricted*, as an address has no bounds).
   - The `MemoryLayouts` class is also removed. Value layout constants (`JAVA_INT` etc.) have been moved inside `ValueLayout` (while other layout constants have been dropped).
   - Most of the static methods in `CLinker` (e.g. to convert from Java strings to C strings and back) have been moved to `MemorySegment`, `MemoryAddress` and `SegmentAllocator`. The platform-dependent layout constants in `CLinker` (`C_INT` etc.) have been dropped. It is the role of extraction tools to generate layouts for basic C types that are compatible with a given target platform.
   - The `CLinker.TypeKind` enum has been removed (as it is no longer attached to layouts for classification purposes).
   - The `VaList` class has been moved to the top level.

 * `MemoryAddress` no longer features a `ResourceScope` accessor. That is, `MemoryAddress` denotes a raw machine address, and has no notion of spatial and temporal bounds associated with it. Clients can no longer obtain the base address associated with heap segments (i.e. `MemoryAddress` is for off-heap access only). When parameters are passed by-reference to a downcall method handle, the method handle now takes an `Addressable` parameter, not a `MemoryAddress` one. This change allows memory segments to be passed to downcall method handles more directly; the linker runtime will try to keep such arguments alive for the entire duration of a native call. This greatly enhances the safety of the `CLinker` API, and reduces the number of conversions required in user code.

 * Since `MemoryAddress` no longer has a `ResourceScope`, a new entity named `NativeSymbol` has been added, which represents a symbol in a library (either a function or a global variable). A `NativeSymbol` has a scope and a name, and is accepted by `CLinker::downcallHandle` when creating downcall method handles. Also, `CLinker::upcallStub` returns a new (anonymous) `NativeSymbol`, which points to the native function generated by the VM which calls back to the target Java method handle provided at creation. The scope attached to a native symbol can be closed at any time, and will cause the symbol to be unloaded. Again, `CLinker` will make sure that a native symbol scope cannot be closed *while* in the middle of performing a native call.

 * The `ResourceScope` class contains some simplifications. First, there is no longer a distinction between implicit and explicit scopes: all scopes (except the global scope) are explicit and can be closed, and some scopes are additionally associated with a `Cleaner` instance. Second, a new method `ResourceScope::keepAlive(ResourceScope)` has been added to replace the pair of `ResourceScope::acquire/release` as well as the `ResourceScope.Handle` class.
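
To make the first point of the list above more concrete, here is a minimal sketch of how tying a layout to a Java carrier lets overload resolution select the right accessor. It is a toy model, not the incubator API: `TypedLayoutSketch`, its nested `OfInt`, `OfLong` and `Segment` classes, and the `JAVA_INT`/`JAVA_LONG` constants are hypothetical stand-ins that merely borrow the names used in this CSR, and a heap `ByteBuffer` is assumed in place of real segment storage.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy model: each layout class is tied to exactly one Java carrier type.
final class TypedLayoutSketch {

    // Hypothetical stand-in for ValueLayout.OfInt (carrier: int).
    static final class OfInt {
        final ByteOrder order;
        OfInt(ByteOrder order) { this.order = order; }
        // Returns the specific type, so the carrier is never lost.
        OfInt withOrder(ByteOrder newOrder) { return new OfInt(newOrder); }
    }

    // Hypothetical stand-in for ValueLayout.OfLong (carrier: long).
    static final class OfLong {
        final ByteOrder order;
        OfLong(ByteOrder order) { this.order = order; }
        OfLong withOrder(ByteOrder newOrder) { return new OfLong(newOrder); }
    }

    static final OfInt JAVA_INT = new OfInt(ByteOrder.nativeOrder());
    static final OfLong JAVA_LONG = new OfLong(ByteOrder.nativeOrder());

    // Hypothetical stand-in for a memory segment with instance accessors.
    static final class Segment {
        private final ByteBuffer bytes;

        Segment(int byteSize) { this.bytes = ByteBuffer.allocate(byteSize); }

        // The layout argument picks the overload, and with it the Java type.
        void set(OfInt layout, int offset, int value) { view(layout.order).putInt(offset, value); }
        int get(OfInt layout, int offset) { return view(layout.order).getInt(offset); }
        void set(OfLong layout, int offset, long value) { view(layout.order).putLong(offset, value); }
        long get(OfLong layout, int offset) { return view(layout.order).getLong(offset); }

        // Byte order always comes from the layout, for reads and writes alike.
        private ByteBuffer view(ByteOrder order) { return bytes.duplicate().order(order); }
    }

    public static void main(String[] args) {
        Segment segment = new Segment(16);
        OfInt bigEndianInt = JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN);
        segment.set(bigEndianInt, 0, 42);
        segment.set(JAVA_LONG, 8, 1L << 40);
        int i = segment.get(bigEndianInt, 0);   // resolves to the int overload
        long l = segment.get(JAVA_LONG, 8);     // resolves to the long overload
        System.out.println(i + " " + l);        // prints 42 1099511627776
    }
}
```

Because the same layout object is used to write and to read, the byte order and the carrier type cannot drift apart between the two operations. A single untyped layout class (or an enum) could not give `get` a different return type per carrier, which is why the model needs distinct classes.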

Specification
-------------

A specdiff of the changes as of November 11th, 2021 has been attached to this CSR (v3).

A link to the latest javadoc (as of November 11th, 2021) is included below:

http://cr.openjdk.java.net/~mcimadamore/JEP-419/v3/javadoc/jdk/incubator/foreign/package-summary.html

A link to the latest specdiff (as of November 11th, 2021) is included below:

http://cr.openjdk.java.net/~mcimadamore/JEP-419/v3/specdiff_out/overview-summary.html
Comments
Moving amended request to Approved.
22-11-2021

For a future iteration, I recommend including an explicit package-level disclaimer that the JMM does not apply to the reads and writes done through this API, or whatever sort of disclaimer is possible. I think some guidance on ABIs, etc. would be a helpful addition too.
22-11-2021

Updated javadoc: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v4/javadoc/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html

Updated specdiff: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v4/specdiff_out/overview-summary.html

I've also uploaded the latest specdiff output (v4) to this issue.
22-11-2021

Re. jls tags - I think this is an issue with javadoc rendering of jls tags - the javadoc is like this:

```
Note the use of the
 * <em>try-with-resources</em> construct: this idiom ensures that all the memory resources associated with the segment will be released
 * at the end of the block, according to the semantics described in Section {@jls 14.20.3} of <cite>The Java Language Specification</cite>.
```

Which is correct, I think. Note that the Java 17 version of the API javadoc is rendered correctly: https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html

Seems like an issue with generating the javadoc via intellij - I will try to sort it out and regenerate.
22-11-2021

Some replies:

 * memory model does not affect this API (as JMM doesn't cover off-heap memory usage).
 * We could perhaps add links to the various ABIs implemented (not C standard, but ABI)
 * ValueLayout.Of - splitting the hierarchy is crucial to get the dereference methods working (e.g. get(ValueLayout.OfInt) != get(ValueLayout.OfLong)). You can't achieve this with an enum. We obviously hope that Valhalla will, eventually, allow us to capture all of this with a single type-variable on ValueLayout
 * I will fix the various javadoc issues you mentioned (thanks for the review!)
22-11-2021

A few high-level comments/questions:

Is there anything to say (or not say) about memory model (JLS 17.4) interactions and this API?

Suggestion: Can CLinker say something about the C standard or ABI it assumes?

For the various ValueLayout.OfFoo types, it looks like an enum is not used so that the withBitAlignment, withName, and withOrder methods can have a more specific return type. Is that correct? In terms of the size of the API, it would be preferable if these could be collapsed to not have distinct types; I assume there was already some API exploration here.

Lower-level feedback:

In the package summary

> For example, to allocate an off-heap memory region big enough to hold
> 10 values of the primitive type int, and fill it with values ranging
> from 0 to 9, we can use the following code:
>
> `MemorySegment segment = MemorySegment.allocateNative(10 * 4,`
> `ResourceScope.newImplicitScope()); for (int i = 0 ; i < 10 ; i++) {`
> `segment.setAtIndex(ValueLayout.JAVA_INT, i, 42); }`

Doesn't this code fill all the slots with 42 rather than filling in with the values 0 through 9?

Code review suggestion: in the package summary

> Note the use of the try-with-resources construct: this idiom ensures
> that all the memory resources associated with the segment will be
> released at the end of the block, according to the semantics described
> in Section of The Java Language Specification.

Use an in-line @jls tag to the try-with-resources section, 14.20.3.

Also in the package summary

> We call this guarantee spatial safety; in other words, access to
> memory segments is bounds-checked, in the same way as array access is,
> as described in Section of The Java Language Specification.

Missing section number.

In MemoryAddress

> Non-platform classes should not implement MemoryAddress directly.

Presumably this is no longer needed as the interface is sealed. Likewise

> Non-platform classes should not implement MemoryLayout directly.

may no longer be needed due to sealing.

In NativeSymbol.name(), is there anything to say to constrain (or not constrain) the possible names of a symbol given platform-specific factors?

SegmentAllocator.allocateUtf8String: please add an implSpec tag as done for the other default methods on the interface.
22-11-2021

Moving to Finalize. The following two changes were done as a result of the code review process:

 * CLinker now has two methods (upcallType and downcallType) which return the MethodType associated with a given function descriptor in the context of an upcall stub and a downcall handle, respectively
 * The method MemorySegment.ofAddressNative has been renamed to MemorySegment.ofAddress (which makes it consistent with other restricted native factories - e.g. VaList::ofAddress)

Some typos have been fixed, also as part of the review process.
11-11-2021

Moving to Provisional; I'll take another pass when the request is Finalized.
26-10-2021