Bug ID: JDK-8207851 JEP 352: Non-Volatile Mapped Byte Buffers

Type: JEP
Component: core-libs
Sub-Component: java.nio

Priority: P4
Status: Closed
Resolution: Delivered
Fix Versions: 14

Submitted: 2018-07-19
Updated: 2022-08-16
Resolved: 2020-01-07

Related Reports

Duplicate :	JDK-8153111 - (bf) Allocating ByteBuffer on heterogeneous memory
Relates :	JDK-8221397 - (fc) Support implementation-defined Map Modes
Relates :	JDK-8221477 - Inject os/cpu-specific constants into Unsafe from JVM
Relates :	JDK-8221696 - (bf) MappedByteBuffer.force method to specify range

Sub Tasks

JDK-8224974 : Implement JEP 352 - Resolved

Description

Summary
-------

Add new JDK-specific file mapping modes so that the `FileChannel` API can be
used to create `MappedByteBuffer` instances that refer to non-volatile memory.

Goals
-----

This JEP proposes to upgrade `MappedByteBuffer` to support
access to non-volatile memory (NVM). The only API change required is a
new enumeration employed by `FileChannel` clients to request mapping of a
file located on an NVM-backed file system rather than a conventional
file storage system. Recent changes to the `MappedByteBuffer` API mean that
it supports all the behaviours needed to allow direct memory updates and
provide the durability guarantees needed for higher level, Java client
libraries to implement persistent data types (e.g. block file systems,
journaled logs, persistent objects, etc.). The implementations of
`FileChannel` and `MappedByteBuffer` need revising to be aware of this new
backing type for the mapped file.

The primary goal of this JEP is to ensure that clients can access and
update NVM from a Java program efficiently and coherently. A key element
of this goal is to ensure that individual writes (or small groups of
contiguous writes) to a buffer region can be committed with minimal
overhead, i.e., to ensure that any changes which might still be in cache
are written back to memory.

A second, subordinate goal is to implement this commit behaviour using
a restricted, JDK-internal API defined in class `Unsafe`, allowing it to
be re-used by classes other than `MappedByteBuffer` that may need to
commit NVM.

A final, related goal is to allow buffers mapped over NVM to be tracked
by the existing monitoring and management APIs.

N.B. It is already possible to map an NVM device file to a `MappedByteBuffer` and
commit writes using the current `force()` method, for example using Intel's
`libpmem` library as device driver or by calling out to `libpmem` as a
native library. However, with the current API both those implementations
provide a “sledgehammer” solution. A force cannot discriminate between
clean and dirty lines and requires a system call or JNI call to
implement each writeback. For both those reasons the existing capability
fails to satisfy the efficiency requirement of this JEP.

The target OS/CPU platform combinations for this JEP are Linux/x64 and Linux/AArch64. This restriction is imposed for two reasons. This feature will only work on OSes that support the `mmap` system call `MAP_SYNC` flag, which allows synchronous mapping of non-volatile memory. That is true of recent Linux releases. It will also only work on CPUs that support cache line writeback under user space control. x64 and AArch64 both provide instructions meeting this requirement.

Non-Goals
---------

The goals of this JEP do not extend beyond providing access to and
durability guarantees for NVM. In particular, it is not a goal of this
JEP to cater for other important behaviours such as atomic update of NVM,
isolation of readers and writers, or consistency of independently
persisted memory states.

Recent Windows/x64 releases do support the mmap `MAP_SYNC` flag. However, the goal of providing this capability for that OS/CPU combination (or any other possible platforms) is deferred to a later update.

Success Metrics
---------------

The efficiency goal is hard to quantify precisely. However, the cost of
persisting data to memory should be significantly lowered relative to
two existing alternatives. Firstly, it should significantly improve on
the cost incurred by writing the data to conventional file storage
synchronously, i.e., including the usual delays required to ensure that
individual writes are guaranteed to hit disk. Secondly, the
cost should also be significantly lower than that incurred by writing to
NVM using a driver-based solution, such as `libpmem`, that relies on system calls.
Costs might reasonably be expected to be lowered by an order of
magnitude relative to synchronous file writes and by a factor of two
relative to using system calls.

Motivation
----------

NVM offers the opportunity for application programmers to create and
update program state across program runs without incurring the
significant copying and/or translation costs that output to and input
from a persistent medium normally imply. This is particularly
significant for transactional programs, where regular persistence of
in-doubt state is required to enable crash recovery.

Existing C libraries (such as Intel's `libpmem`) provide C programs with
highly efficient access to NVM at the base level. They also build on
this to support simple management of a variety of persistent data types.
Currently, use of even just the base library from Java is costly because of the
frequent need to make system calls or JNI calls to invoke the primitive
operation which ensures memory changes are persistent. The same problem
limits use of the higher-level libraries and is exacerbated by the fact
that the persistent data types provided in C are allocated in memory not
directly accessible from Java. This places Java applications and
middleware (for example, a Java transaction manager) at a severe
disadvantage compared with C or languages which can link into C
libraries at low cost.

This proposal attempts to remedy the first problem by allowing efficient
writeback of NVM mapped to a `ByteBuffer`. Since `ByteBuffer`-mapped memory
is directly accessible to Java this allows the second problem to be
addressed by implementing client libraries equivalent to those provided
in C to manage storage of different persistent data types.

Description
-----------

### Preliminary Changes

This JEP makes use of two related enhancements to the Java SE API:

  1. Support implementation-defined Map Modes ([JDK-8221397](https://bugs.openjdk.java.net/browse/JDK-8221397))

  2. `MappedByteBuffer::force` method to specify range ([JDK-8221696](https://bugs.openjdk.java.net/browse/JDK-8221696))

### Proposed JDK-Specific API Changes

1) Expose new `MapMode` enumeration values via a public API in a new module

A new module, `jdk.nio.mapmode`, will export a single new package of the same name. A public extension enumeration `ExtendedMapMode` will be added to this package:

    package jdk.nio.mapmode;
    . . .
    public class ExtendedMapMode {
        private ExtendedMapMode() { }

        public static final MapMode READ_ONLY_SYNC = . . .
        public static final MapMode READ_WRITE_SYNC = . . .
    }

The new enumeration values are used when calling the `FileChannel::map` method to create, respectively, a read-only or read-write `MappedByteBuffer` mapped over an NVM device file. An `UnsupportedOperationException` will be thrown if these flags are passed on platforms which do not support mapping of NVM device files. On supported platforms, it is only appropriate to pass these new values as arguments when the target `FileChannel` instance is derived from a file opened via an NVM device. In any other case an `IOException` will be thrown.
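The mapping and failure behaviour described above can be sketched as follows. This is a minimal illustration rather than code from the JEP: the class `SyncMapSketch`, its methods, the 4096-byte mapping size, and the decision to fall back to `MapMode.READ_WRITE` are all assumptions made for the example. It relies only on `FileChannel::map`, the `ExtendedMapMode` constants, and the ranged `MappedByteBuffer::force(int, int)` method from the preliminary changes, so it needs JDK 14 or later.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import jdk.nio.mapmode.ExtendedMapMode;

public final class SyncMapSketch {
    private SyncMapSketch() { }

    // Hypothetical helper: request a synchronous (NVM) mapping and fall back
    // to an ordinary read-write mapping when that is not possible.
    public static MappedByteBuffer mapPreferSync(FileChannel channel, long size)
            throws IOException {
        try {
            return channel.map(ExtendedMapMode.READ_WRITE_SYNC, 0, size);
        } catch (UnsupportedOperationException | IOException e) {
            // UnsupportedOperationException: platform cannot map NVM device files.
            // IOException: platform is supported but the file is not NVM-backed.
            return channel.map(MapMode.READ_WRITE, 0, size);
        }
    }

    // Hypothetical helper: write one long at offset 0, commit only the bytes
    // that changed, and read the value back through the mapping.
    public static long writeAndCommit(Path file, long value) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = mapPreferSync(channel, 4096);
            buffer.putLong(0, value);
            // Ranged force: commits Long.BYTES bytes starting at index 0
            // instead of forcing the whole mapped region.
            buffer.force(0, Long.BYTES);
            return buffer.getLong(0);
        }
    }
}
```

Whether a silent fallback is appropriate depends on the application; a library that depends on the durability guarantees of a synchronous mapping would more likely propagate the exception to its caller.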

2) Publish a `BufferPoolMXBean` tracking persistent `MappedByteBuffer` statistics

The `ManagementFactory` class provides method `List<T> getPlatformMXBeans(Class<T>)` which can be used to retrieve a list of `BufferPoolMXBean` instances tracking `count`, `total_capacity` and `memory_used` for the existing categories of mapped or direct byte buffers. It will be modified to return an extra, new `BufferPoolMXBean` with name `"mapped - 'non-volatile memory'"`, which will track the above stats for all `MappedByteBuffer` instances currently mapped with mode `ExtendedMapMode.READ_ONLY_SYNC` or `ExtendedMapMode.READ_WRITE_SYNC`. The existing `BufferPoolMXBean` with name `mapped` will continue only to track stats for `MappedByteBuffer` instances currently mapped with mode `MapMode.READ_ONLY`, `MapMode.READ_WRITE` or `MapMode.PRIVATE`.
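As a rough illustration of how a client might read these statistics, the sketch below lists every buffer pool reported by the platform. The class `BufferPoolSketch` and its method names are hypothetical; the calls to `ManagementFactory.getPlatformMXBeans` and the `BufferPoolMXBean` accessors are the standard `java.lang.management` API. On a runtime that implements this JEP the list is expected to include the additional non-volatile memory pool alongside the existing `direct` and `mapped` pools.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class BufferPoolSketch {
    private BufferPoolSketch() { }

    // Hypothetical helper: names of all buffer pools known to the platform.
    public static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            names.add(pool.getName());
        }
        return names;
    }

    // Hypothetical helper: one formatted line of statistics per pool.
    public static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            lines.add(String.format("%s: count=%d, totalCapacity=%d, memoryUsed=%d",
                    pool.getName(), pool.getCount(),
                    pool.getTotalCapacity(), pool.getMemoryUsed()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```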

### Proposed Internal JDK API Changes

1) Add new method `writebackMemory` to class `jdk.internal.misc.Unsafe`

    public void writebackMemory(long address, long length)

A call to this method ensures that any modifications to memory in the address range starting at `address` and continuing up to (but not necessarily including) `address + length` are guaranteed to have been written back from cache to memory. The implementation must guarantee that all stores by the current thread that i) are pending at the point of call and ii) address memory in the target range are included in the writeback (i.e., there is no need for the caller to perform any memory fence operation before the call). It must also guarantee that writeback of all addressed bytes has completed before returning (i.e., there is no need for the caller to perform any memory fence operation after the call).

The writeback memory operation will be implemented using a small number of intrinsics recognised by the JIT compiler. The goal is to implement writeback of each successive cache line in the specified address range using an intrinsic that translates to a processor cache line writeback instruction, reducing the cost of persisting data to the bare minimum. The envisaged design also employs a pre-writeback and post-writeback memory synchronization intrinsic. These may translate to a memory synchronization instruction or to a no-op depending upon the specific choice of instruction for the processor writeback (x64 has three possible candidates) and the ordering requirements that choice entails.
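The shape of that design (a pre-sync step, one writeback per cache line covering the range, then a post-sync step) can be modelled in plain Java. The sketch below is not the `Unsafe` implementation: the class `WritebackSketch`, the callback parameters, and the explicit `lineSize` argument are hypothetical stand-ins for the intrinsics, and the code assumes the cache line size is a power of two.

```java
import java.util.function.LongConsumer;

public final class WritebackSketch {
    private WritebackSketch() { }

    // Hypothetical model of the line walk: invokes flushLine once for every
    // cache line that overlaps [address, address + length) and returns the
    // number of lines visited. preSync and postSync stand in for the memory
    // synchronization steps, which may be no-ops on some processors.
    public static int writeback(long address, long length, long lineSize,
                                Runnable preSync, LongConsumer flushLine,
                                Runnable postSync) {
        if (lineSize <= 0 || (lineSize & (lineSize - 1)) != 0) {
            throw new IllegalArgumentException("lineSize must be a power of two");
        }
        if (length <= 0) {
            return 0; // nothing to write back
        }
        long end = address + length;
        // Align the start address down to the beginning of its cache line.
        long line = address & ~(lineSize - 1);
        int flushed = 0;
        preSync.run();
        for (; line < end; line += lineSize) {
            flushLine.accept(line);
            flushed++;
        }
        postSync.run();
        return flushed;
    }
}
```

For example, with a 64-byte line size a request covering addresses 100 up to (but not including) 160 touches the two lines starting at 64 and 128, so the cost is proportional to the number of lines in the range rather than to the size of the whole mapping.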

N.B. A good reason for implementing this capability in class `Unsafe` is that it is likely to be of more general use, say for alternative data persistence implementations employing non-volatile memory.

Alternatives
------------

Two alternatives were tested in the [original prototype](https://github.com/jhalliday/pmem/).

One option was to use `libpmem` in driver mode, i.e., 1) install `libpmem` as
the driver for the NVM device, 2) map the file as per any other
`MappedByteBuffer`, and 3) rely on the `force` method to do the update.

The second alternative was to use `libpmem` (or some fragment thereof) as
a JNI native library to provide the required buffer mapping and
writeback behaviour.

Both options proved very unsatisfactory. The first suffered from the
high cost of system calls and the overhead involved in forcing the whole
mapped buffer rather than some subset of it. The second suffered from
the high cost of the JNI interface. Successive iterations of the second
approach (adding first registered natives and then implementing them as
intrinsics) provided similar performance benefits to the current draft
implementation.

A third alternative that was considered is to wait for Project Panama to provide access to foreign libraries and foreign datatypes mapped over NVRAM without incurring the overheads of JNI. While this is still considered to be a worthwhile option for the future, it was decided that the current proposal is worth pursuing for two reasons: firstly, to allow users to experiment with the use of NVRAM from Java immediately, as it begins to become available; and secondly, to ease any such future transition by supporting a model for use of NVRAM derived from the existing, familiar `MappedByteBuffer` API.

Testing
-------

Testing will require an x64 or AArch64 host fitted with an NVM device
and running a suitably up-to-date Linux kernel (4.16).

Testing on AArch64 may not be possible until suitable NVM devices are available for this architecture. As an alternative, testing may need to proceed by mapping volatile memory and using it to simulate the behaviour of an NVM device.

Testing on both target architectures may be difficult; in particular, it may suffer from false positives. A failure in the writeback code can only be detected if it is possible to kill a JVM with those pending changes unflushed and then to detect that omission at restart.

This situation may be difficult to arrange when employing a normal JVM
exit (normal shutdown may end up causing those pending changes to be
written back). Given that the JVM does not have total control over the
operation of the memory system it may even prove difficult to detect a
problem when an abnormal exit (say a `kill -KILL` termination) is performed.

Risks and Assumptions
---------------------

This implementation allows for management of NVM as an off-heap resource via a
`ByteBuffer`. A related enhancement,
[JDK-8153111](https://bugs.openjdk.java.net/browse/JDK-8153111), is looking at
the use of NVM for heap data. It may also be necessary to consider use of NVM
to store JVM metadata. These different modes of NVM management may turn out to
be incompatible or, possibly, just inappropriate when used in combination.

The proposed API can only deal with mapped regions up to 2GB. It may be
necessary to revise the proposed implementation so that it conforms to changes
proposed in [JDK-8180628](https://bugs.openjdk.java.net/browse/JDK-8180628) to
overcome this restriction.

The `ByteBuffer` API is mostly focused on position-relative (cursor) access which limits opportunities for concurrent updates to independent buffer regions. These require locking of the buffer during update as detailed in [JDK-5029431](https://bugs.openjdk.java.net/browse/JDK-5029431), which also implemented a remedy. The problem is mitigated to some degree by the provision of primitive value accessors which operate at an absolute index without reference to a cursor, permitting unlocked access; also by the option to use `ByteBuffer` slices and `MethodHandles` to perform concurrent puts/gets of primitive values.
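The mitigation based on absolute accessors can be illustrated with a small sketch. It uses a direct `ByteBuffer` rather than an NVM mapping so that it runs anywhere, and the class `AbsoluteAccessSketch` and its methods are hypothetical. Two threads fill disjoint halves of the same buffer using absolute `putLong(int, long)` calls, which never read or modify the buffer's position, so no lock around a shared cursor is needed.

```java
import java.nio.ByteBuffer;

public final class AbsoluteAccessSketch {
    private AbsoluteAccessSketch() { }

    // Hypothetical helper: two threads write disjoint regions of one buffer
    // through absolute indices only.
    public static ByteBuffer fillConcurrently(int longsPerHalf)
            throws InterruptedException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(2 * longsPerHalf * Long.BYTES);
        Thread lower = new Thread(() -> fill(buffer, 0, longsPerHalf));
        Thread upper = new Thread(() -> fill(buffer, longsPerHalf, 2 * longsPerHalf));
        lower.start();
        upper.start();
        // join() makes both threads' writes visible to the caller.
        lower.join();
        upper.join();
        return buffer;
    }

    // Stores each slot number at the byte offset of that slot.
    private static void fill(ByteBuffer buffer, int fromSlot, int toSlot) {
        for (int slot = fromSlot; slot < toSlot; slot++) {
            buffer.putLong(slot * Long.BYTES, slot);
        }
    }
}
```

The same pattern applies to a `MappedByteBuffer`, where each writer could additionally commit its own region with the ranged `force` method.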

Comments

Hi Mark, The JEP has now completed review (by Alan Bateman) and been endorsed (by Brian Goetz) so I have targeted it for JDK14. Review of the JEP and a complete implementation are included in this (long) thread: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-June/060581.html
04-07-2019
Hi Mark, I have made the requested changes. I also posted an RFR for the CSR which included a request for Endorsement from the relevant leads. I'll re-target when there is progress on both those fronts.
31-05-2019
The JEP itself looks to be in reasonable shape, but it’s not yet been endorsed by a Group or Area lead [1] -- in order to move this forward, could you please obtain at least one endorsement? Also, I see that the related CSR is still in the draft state -- a JEP should be nearly finished before it’s proposed to target a specific release, so in case the CSR review raises issues I suggest that you wait until the CSR has at least reached the Provisional state. I’ve moved this JEP back to Candidate for now. A couple of minor suggestions: - You describe the “preliminary changes” as being to the “planned enhancements to the JDK runtime.” That’s certainly true, but more to the point is that they’re both changes to the Java SE API, so “... enhancements to the Java SE API” would be a more apt wording. - You mention the addition to the Unsafe class as a “Proposed Restricted Public JDK API” change. It’d better be described as a “Proposed Internal JDK API” change -- the class may be public but it’s in an unexported package, definitely not intended for external use. Please also make it clear that it’s the Unsafe class in the jdk.internal.misc package rather than the legacy one in the exported sun.misc package. [1] https://openjdk.java.net/projects/jdk/leads
30-05-2019
Hi Mark. Thanks for the edits which are most welcome. Also, for the added pith which I took the liberty of inserting in the title slot. I also made one small correction to the text. In the description of the new force method the end of the writeback range was described as `to` when it needs to be `from + length`. I am happy for this to proceed to Candidate.
28-03-2019
Nicely written! I've done a light copy-editing pass, mainly to fix up some links, formatting, and terminology. I also removed the dependencies section, which added little value, and clarified the summary so that it reads better on its own. Please let me know if these changes look okay to you and I'll move this to Candidate. May I suggest a pithier title? We generally don't use class names in JEP titles, and we also try to avoid generic, low-value words such as "support." Perhaps "Non-Volatile Mapped Byte Buffers"?
27-03-2019