JDK-8254232 : Implementation of Foreign Linker API (Incubator)
  • Type: CSR
  • Component: core-libs
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 16
  • Submitted: 2020-10-08
  • Updated: 2020-11-10
  • Resolved: 2020-11-10
Description
Summary
-------

Enhance the incubator module, jdk.incubator.foreign (which is currently used by the Foreign Memory Access API), to add a new API, referred to as the Foreign Linker API, that is designed to facilitate direct and efficient access to foreign functions via the MethodHandle API.  This API provides the fundamental building blocks to replace JNI.

Problem
-------

To date, interacting with native libraries from Java can be a painful process. With the Java Native Interface (JNI), users have to declare `native` methods, then compile their classes with special flags, so that javac will also emit synthetic headers containing C entry points, which the user then has to define and compile (using a platform compiler such as gcc or clang) into a shared library; the shared library then has to be made available at runtime (e.g. via `System::loadLibrary`) so that the JVM will be able to link native method calls. This process is convoluted and error prone, and fails to scale when users have to provide Java bindings for _entire_ native libraries (which might sometimes contain thousands of functions). Moreover, the presence of this intermediate JNI glue code makes projects relying on native methods harder to deploy (as the glue code will likely vary across platforms) and to maintain (as such code will need to be updated every time the underlying native library is updated). For these reasons, Java developers cannot easily access high-quality native libraries, or have to resort to third-party frameworks in order to automate some of the steps associated with JNI.

Solution
--------

The Foreign Linker API addresses the aforementioned problems by providing a way to *link* a native function, defined in some native library, directly, using Java code; the result of this linking operation will be a `MethodHandle` instance which, when called (e.g. using `MethodHandle::invokeExact`), will trigger the corresponding call to the native function. Since it is sometimes helpful to also pass Java code as data to foreign functions, the Foreign Linker API also provides a way to turn an existing Java `MethodHandle` instance into a `MemorySegment` which can then be passed (as a function pointer) to a native method handle call. The Foreign Linker API can thus allow Java programs to interact with native libraries *without* the need for any intervening glue code (rather, the glue code is generated dynamically, by the Foreign Linker runtime).
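To make the calling convention concrete, the following is a minimal sketch of how a `MethodHandle` is invoked with `invokeExact`. It uses only `java.lang.invoke`; the class `InvokeExactSketch` and its methods are hypothetical, and an ordinary Java method stands in for the native function, so this shows the shape of the call site rather than the Foreign Linker API itself.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InvokeExactSketch {
    // Hypothetical stand-in for a native function such as C's strlen.
    // A linked native function would be exposed through a MethodHandle
    // and called in the same way.
    static long length(String s) {
        return s.length();
    }

    static long callThroughHandle(String arg) {
        try {
            MethodHandle handle = MethodHandles.lookup().findStatic(
                    InvokeExactSketch.class, "length",
                    MethodType.methodType(long.class, String.class));
            // invokeExact requires the call site to match the handle's type
            // exactly: the cast fixes the return type to long, and the static
            // type of the argument fixes the parameter type to String.
            return (long) handle.invokeExact(arg);
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughHandle("Hello")); // prints 5
    }
}
```

Because the call-site signature must match exactly, a mismatch (for example, casting the result to `int`) fails at run time with a `WrongMethodTypeException` rather than being silently converted.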

Specification
-------------

The implementation of the Foreign Linker API exports the following types in the package jdk.incubator.foreign, defined in module jdk.incubator.foreign:

```
LibraryLookup               A lookup class which allows clients to load libraries and lookup symbols inside these libraries.
CLinker                     The main Foreign Linker API implementation.
FunctionDescriptor          An aggregate of `MemoryLayout`s describing the signature of the target foreign function.
NativeScope                 A helper class which allows clients to manage logically related off-heap memory allocations.
```

The first ingredient of any foreign-function support is a mechanism to load native libraries and look up symbols in them. In traditional Java/JNI scenarios, this is done via the `System::loadLibrary` and `System::load` methods, which internally map into calls to, for instance, `dlopen`. The Foreign Linker API provides a simple library-lookup abstraction via the `LibraryLookup` class (similar to a method-handle lookup), which provides capabilities to look up named symbols in a given native library. We can obtain a library lookup in three different ways:

* `LibraryLookup::ofDefault` - returns the library lookup which can see all the symbols that have been loaded with the VM (useful to access symbols in the C standard library)
* `LibraryLookup::ofPath` - creates a library lookup associated with the library found at the given absolute path.
* `LibraryLookup::ofLibrary` - creates a library lookup associated with the library with given name (this might require setting the java.library.path variable appropriately).

The `CLinker` interface is the cornerstone of the Foreign Linker API. This abstraction plays a dual role. First, for downcalls (i.e. calls from Java to native code), the `CLinker::downcallHandle` method can be used to model native functions as plain `MethodHandle` objects. Second, for upcalls (i.e. calls from native back to Java code), the `CLinker::upcallStub` method can be used to convert an existing `MethodHandle` (which might point to some Java method) into a `MemorySegment`, which can then be passed to a native function as a function pointer. Both methods accept a `FunctionDescriptor` instance, which is an aggregate of memory layouts that fully describes the signature of a foreign function. Moreover, the `CLinker` interface defines many layout constants, one for each main C primitive type. These layouts can be combined using a `FunctionDescriptor` to describe the signature of a C function; these layouts contain special *classification* attributes which are used by the Foreign Linker API runtime in order to correctly convert Java arguments into native arguments (and back). Finally, the `CLinker` API defines several helper functions which allow clients, for example, to convert a Java string into a NULL-terminated C string and back.

The `NativeScope` abstraction allows clients to allocate multiple segments which share the _same_ temporal bounds; that is, all segments allocated via a `NativeScope` instance will remain alive for as long as the `NativeScope` itself is alive (in fact, `NativeScope` implements the `AutoCloseable` interface, and can be used in a try-with-resources statement). This is very useful when allocating and managing multiple, logically related, off-heap memory segments.
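The idea of shared temporal bounds can be illustrated with a small sketch. The `DemoScope` class below is hypothetical and is not `NativeScope`: it uses direct `ByteBuffer`s as a stand-in for off-heap segments, and only models the lifecycle (allocate while open, everything released together on close) under those assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ScopeSketch {
    /** Hypothetical stand-in for a scope that owns every buffer it hands out. */
    public static final class DemoScope implements AutoCloseable {
        private final List<ByteBuffer> allocations = new ArrayList<>();
        private boolean open = true;

        public ByteBuffer allocate(int bytes) {
            if (!open) {
                throw new IllegalStateException("scope is closed");
            }
            ByteBuffer buffer = ByteBuffer.allocateDirect(bytes); // off-heap storage
            allocations.add(buffer);
            return buffer;
        }

        public int allocationCount() {
            return allocations.size();
        }

        @Override
        public void close() {
            // A real scope would free the native memory here. Direct buffers
            // are reclaimed by the garbage collector, so this sketch only
            // drops its references and rejects further allocations.
            allocations.clear();
            open = false;
        }
    }

    public static void main(String[] args) {
        try (DemoScope scope = new DemoScope()) {
            ByteBuffer first = scope.allocate(16);
            ByteBuffer second = scope.allocate(32);
            first.putInt(0, 42);
            second.putLong(0, 7L);
            System.out.println(scope.allocationCount()); // prints 2
        } // both allocations end their lifetime here, together
    }
}
```

The try-with-resources block is what ties the lifetimes together: no allocation has to be released individually, and none can outlive the scope through the scope's own bookkeeping.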

Note that many of the functionalities provided by the Foreign Linker API are fundamentally _unsafe_. That is, there is no way for the Foreign Linker runtime to verify that, for example, the signature of a native function in a `FunctionDescriptor` is correct, since the underlying shared library typically contains no type information (unless the library is compiled with debugging information). For this reason, many methods in this API are marked as *restricted methods* (this is a concept that has been introduced in the Foreign Memory Access API), and can only be invoked when a read-only JDK property, namely `-Dforeign.restricted`, is set appropriately; this property can assume several values - the default value is `deny`, which will trigger a hard exception each time a restricted method is called. Developers can override this property value from the command line, to e.g. `permit`, which will allow calls to restricted methods to succeed.

Note: this way of accessing restricted foreign functionalities through a runtime property is a pragmatic compromise, which will be replaced by a more robust mechanism (e.g. based on the module system) by the time the API exits the incubation stage.
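The gating behavior described above can be sketched as follows. This is not the JDK implementation: the class `RestrictedGateSketch`, the method `checkRestricted`, and the choice of `IllegalStateException` are all assumptions made for illustration. Only the property name `foreign.restricted` and the values `deny` and `permit` come from the text; the sketch treats every other value as a denial and, unlike a read-only property, re-reads the value on each call.

```java
public class RestrictedGateSketch {
    // Property name taken from the text above.
    static final String PROPERTY = "foreign.restricted";

    /**
     * Hypothetical guard that a restricted method could call first.
     * Simplification: anything other than "permit" is treated as "deny".
     */
    public static void checkRestricted(String method) {
        String mode = System.getProperty(PROPERTY, "deny"); // "deny" is the default
        if (!"permit".equals(mode)) {
            throw new IllegalStateException(
                    "Restricted method " + method + " called with "
                    + PROPERTY + "=" + mode);
        }
    }

    public static void main(String[] args) {
        try {
            checkRestricted("demo");
            System.out.println("restricted call permitted");
        } catch (IllegalStateException e) {
            System.out.println("restricted call denied: " + e.getMessage());
        }
    }
}
```

Running the sketch with `-Dforeign.restricted=permit` on the command line takes the permitted branch; running it without the flag takes the denied branch.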


Here are some useful links which should help in navigating through the changes in the API.

Javadoc:

http://cr.openjdk.java.net/~mcimadamore/8254231_v2/javadoc

Specdiff (delta, relative to JDK-8254163)

http://cr.openjdk.java.net/~mcimadamore/8254231_v3/specdiff_out

In addition, a specdiff of the changes as of November 10th 2020 has been attached to this CSR.




Comments
Thanks for the update; moving to Approved.
10-11-2020

Thanks for the comments, I've uploaded another iteration (see specdiff_v3). As for the C99 standards, we don't promise anywhere in the javadoc that we support the full C99 standard. In fact, I will also be removing support for LONG_DOUBLE, which is currently only working on Windows. The reason being that we can only support types whose layout is up to 64 bits, due to limitations of Java primitives (you cannot use a "bigger" carrier for var handle and method handle, w/o introducing boxing). We'd like to address exotic types like complex, long double etc. once Valhalla is ready, at which point we'll have a better way to get there. As for string decoding, as I stated last time, the goal of this method is to provide a behavior that is similar to `new String(byteArray, charset)`. And that method, for better or worse, features standard replacement on invalid input. We could do something special here, but that could also be met with surprise. If a client really wants unmappable chars to result in errors, it can turn a memory segment into a byte buffer (this can be done in a single step, see `MemorySegment::asByteBuffer`) and then obtain a `CharsetDecoder` which throws on malformed input (e.g. `StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.REPORT)`) and then use that decoder to decode the byte buffer.
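A minimal sketch of that strict-decoding idiom, using only `java.nio` types, might look like the following. The class `StrictDecodeSketch` and its methods are hypothetical, and a plain `ByteBuffer` wrapping a byte array stands in for the buffer that would be obtained from `MemorySegment::asByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeSketch {
    /** Decodes UTF-8 and throws instead of substituting replacement characters. */
    public static String decodeStrict(ByteBuffer bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(bytes).toString();
    }

    /** Convenience wrapper: true if the bytes decode without errors. */
    public static boolean isWellFormed(byte[] raw) {
        try {
            decodeStrict(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'h', 'i', (byte) 0xFF}; // 0xFF never appears in valid UTF-8

        // Lenient path: the invalid byte becomes U+FFFD.
        String lenient = new String(bad, StandardCharsets.UTF_8);
        System.out.println(lenient.equals("hi\uFFFD")); // prints true

        // Strict path: the same input is reported as an error.
        System.out.println(isWellFormed(bad)); // prints false
    }
}
```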
10-11-2020

The CLinker type is now using "C_LONG_DOUBLE", but CLinker.TypeKind uses "LONGDOUBLE". The C99 spec also mentions built-in support for Complex, but there is no immediately analogous Java type. If a developer wanted some indication that there was an encoding issue converting to a C string, is there an idiom within the API to achieve that? For example, is there a charset that would fail on "malformed-input and unmappable-character sequences" rather than using "this charset's default replacement byte array"? For the toJavaString* methods, I recommend replacing "IllegalArgumentException - if the size of the native string is greater than Integer.MAX_VALUE." with something less specific, like "IllegalArgumentException - if the size of the native string is greater than the largest string supported by the platform". For the constants in CLinker.TypeKind, it would be more helpful for readers to have an explicit indication of the attributes of the constant, even if they are implicit. For example, replace "A kind corresponding to the C char type " with "A kind corresponding to the *integral* C char type " and likewise for the floating-point types, etc.
09-11-2020

All the "CLinker.toJavaStringXYZ" seem to have this javadoc text: * @throws IllegalArgumentException if the size of the native string is greater than {@code Integer.MAX_VALUE}. You mention that the string size is less than Integer.MAX_VALUE, but I can find no other reference to this in the SE API (I specifically looked at the String constructor which accepts a byte[], and found no exception, nor any check in the implementation). Please let me know where this condition is specified.
30-10-2020

As for encoding issues, both toJavaString and toCString use the default behavior of `new String(byte[], Charset)` and `String::getBytes(Charset)`, which is to replace malformed input with the charset default replacement. This is actually described in the javadoc.
30-10-2020

Thanks Joe - these look like all sensible comments to me. As for platform assumption, we do not really have many assumptions as the underlying support is fairly general and can be adapted to support a variety of platforms and ABIs. At the moment, a CLinker can only be obtained for x64 systems (linux, mac, windows). The ABI support itself does not rely on POSIX, although the library loading mechanism does rely on dlsym (and its equivalent on Windows). But that's a problem the VM implementation would have to solve anyway (as System::loadLibrary needs a similar capability). As for the C version, I believe the CLinker support is defined for a very general subset of C which shouldn't be version specific. The only constraint I see there is coming from va_list which has been introduced in C89 and later refined in C99 - our implementation assumes the C99 refinements.
30-10-2020

Moving to Provisional. Are there host platform assumptions different than host platform assumptions for JNI? Would a host other than POSIX-like or Windows be feasible? What is the baseline version of C assumed in CLinker? Minor feedback: In public interface NativeScope extends AutoCloseable "This class provides a scope..." The word "class" should be "interface" or "type". If NativeScope isn't a candidate to be sealed, I recommend implSpec tags for the default methods. Is not having an allocate method for char a deliberate omission? In CLinker, why is there different phrasing for C_LONGDOUBLE and C_LONGLONG? Were the names C_LONG_DOUBLE and C_LONG_LONG considered? For the toCString methods, is there some way for the methods to indicate something went wrong in terms of encoding, etc.? For toJavaStringRestricted, I believe a JDK implementation may have a restriction on the maximum size of a string that is less than Integer.MAX_VALUE. An exception might be defined for this case too.
25-10-2020