Bug ID: JDK-8180628 (bf) Retrofit direct buffer support for size beyond gigabyte scales

Type: Enhancement
Component: core-libs
Sub-Component: java.nio
Affected Version: 10

Priority: P2
Status: Resolved
Resolution: Won't Fix

Submitted: 2017-05-18
Updated: 2020-02-07
Resolved: 2020-02-07

Related Reports

Duplicate :	JDK-4496703 - (bf) Buffer classes limited by 32-bit addressing
Relates :	JDK-8181704 - (bf) Support large scale scatter/gather of byte buffers
Relates :	JDK-8227394 - Add MemorySegment::asByteBuffer convenience method
Relates :	JDK-8234049 - Implementation of Memory Access API (Incubator)
Relates :	JDK-5029431 - (bf) Add absolute bulk put and get methods

Description

Direct buffers, like Java native arrays, are limited in size by the dynamic range of the Java `int` type.

Many users need a Java handle to a native data block (especially a DirectByteBuffer) that may exceed 2 GB in size (the limit of what an `int` can index).

Future alternatives to direct buffers (DBs) may include:

1. a new replacement for DBs that uses `long` instead of `int`;
2. retrofitting DBs to a specializable generic type whose index type is a type parameter that can assume both `int` and `long` (and, as a bonus, other types such as 2D mesh coordinates); and
3. relying on forthcoming Project Panama types like MemoryRegion.

Still, it is worth considering a retrofit of today's buffer types that forces them to process data under long indexes, even though this does some violence to the current API contract.

By using method overloading, we can define methods which take a `long` index wherever an `int` index or size is currently taken.  Return values cannot (alas) be lengthened by overloading tricks, but a convention can be used for new API points which deliver index or size values.

Here are suggested API points which would make a retrofit work:

public abstract static class MappedByteBuffer extends ByteBuffer {
  public final MappedByteBuffer position(long i); 
  public final MappedByteBuffer limit(long i); 
}
public abstract static class ByteBuffer extends Buffer implements Comparable<ByteBuffer> {
  public static ByteBuffer allocateDirect(long i); 
  public static ByteBuffer allocate(long i); 
  public abstract byte get(long i); 
  public abstract ByteBuffer put(long i, byte x); 
  public ByteBuffer position(long i); 
  public ByteBuffer limit(long i); 
  public final int alignmentOffset(long i, int j); 
  public final ByteBuffer alignedSlice(long i); 
  public abstract char getChar(long i); 
  public abstract ByteBuffer putChar(long i, char x); 
  public abstract short getShort(long i); 
  public abstract ByteBuffer putShort(long i, short x); 
  public abstract int getInt(long i); 
  public abstract ByteBuffer putInt(long i, int j); 
  public abstract long getLong(long i); 
  public abstract ByteBuffer putLong(long i, long x); 
  public abstract float getFloat(long i); 
  public abstract ByteBuffer putFloat(long i, float x); 
  public abstract double getDouble(long i); 
  public abstract ByteBuffer putDouble(long i, double x); 
}
public abstract static class Buffer {
  public final long capacityAsLong(); 
  public final long positionAsLong(); 
  public Buffer position(long i); 
  public final long limitAsLong(); 
  public Buffer limit(long i); 
  public final long remainingAsLong(); 
}

A buffer would have to record dynamically whether it was in `int` mode or `long` mode.  This could be simply a function of the buffer's size, or an explicit creation parameter.

If a buffer is in `long` mode, then calling one of the int mode query functions (instead of their `AsLong` siblings) would have to throw an error, at least if the value were outside of the dynamic range of an `int`.
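
The mode rule above can be sketched in a few lines. This is only an illustration, not the proposed implementation: the class name `WideBufferState` is hypothetical, "long mode" is assumed to be a pure function of the capacity, and `ArithmeticException` (from `Math.toIntExact`) stands in for whatever error the real API would throw.

```java
/** Hypothetical sketch of the int/long mode bookkeeping; not part of java.nio. */
final class WideBufferState {
    private final long capacity;
    private long limit;
    private long position;

    WideBufferState(long capacity) {
        if (capacity < 0) throw new IllegalArgumentException("negative capacity");
        this.capacity = capacity;
        this.limit = capacity;
    }

    /** Assumption: a buffer is in long mode when its capacity exceeds the int range. */
    boolean isLongMode() { return capacity > Integer.MAX_VALUE; }

    // The AsLong queries always succeed.
    long capacityAsLong()  { return capacity; }
    long positionAsLong()  { return position; }
    long limitAsLong()     { return limit; }
    long remainingAsLong() { return limit - position; }

    // The int queries fail only when the value does not fit in an int.
    int capacity()  { return Math.toIntExact(capacity); }
    int position()  { return Math.toIntExact(position); }
    int limit()     { return Math.toIntExact(limit); }
    int remaining() { return Math.toIntExact(limit - position); }

    /** Overload taking a long index, as suggested for Buffer.position(long). */
    WideBufferState position(long newPosition) {
        if (newPosition < 0 || newPosition > limit) {
            throw new IllegalArgumentException("position out of range: " + newPosition);
        }
        position = newPosition;
        return this;
    }
}
```

In this sketch a 3 GB buffer still answers `position()` as long as the current position is small, while `capacity()` and `remaining()` throw.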

This retrofit is similar to the one that allowed Unix file systems to work with files greater than 2 GB in size.  Many of the API points were unchanged, especially the streaming ones.  The remaining API points had to be joined by new ones with wider index or size types.

On-heap byte buffers could be given a similar treatment, by using a blocked-array data structure.  The design of this is problematic, since for some applications a single block size is workable but the block size may vary from application to application, and for still others a variable block size is necessary.
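
For the single-block-size case, a blocked array might look like the sketch below. The class `BlockedByteArray` and its parameters are hypothetical; it assumes a power-of-two block size chosen by the caller, which is exactly the design decision the paragraph above calls problematic.

```java
/** Hypothetical on-heap byte storage addressed by a long index, using fixed-size blocks. */
final class BlockedByteArray {
    private final int blockShift;   // log2 of the block size
    private final int blockMask;    // blockSize - 1
    private final long length;
    private final byte[][] blocks;

    BlockedByteArray(long length, int blockShift) {
        if (length < 0 || blockShift < 0 || blockShift > 30) {
            throw new IllegalArgumentException("bad length or block shift");
        }
        this.length = length;
        this.blockShift = blockShift;
        this.blockMask = (1 << blockShift) - 1;
        long blockSize = 1L << blockShift;
        int blockCount = Math.toIntExact((length + blockSize - 1) >>> blockShift);
        blocks = new byte[blockCount][];
        for (int b = 0; b < blockCount; b++) {
            // The last block may be shorter than the others.
            long remaining = length - ((long) b << blockShift);
            blocks[b] = new byte[(int) Math.min(blockSize, remaining)];
        }
    }

    long length() { return length; }

    byte get(long index) {
        checkIndex(index);
        return blocks[(int) (index >>> blockShift)][(int) (index & blockMask)];
    }

    void put(long index, byte x) {
        checkIndex(index);
        blocks[(int) (index >>> blockShift)][(int) (index & blockMask)] = x;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Decoding an index costs one shift and one mask, but bulk operations and primitive views that straddle a block boundary are not addressed here.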

A better move would be to support scatter/gather directly, by allowing a third kind of byte buffer which logically clusters a group of subsidiary byte buffers.  This kind of byte buffer (GatheredByteBuffer or ByteBufferGroup) would probably need an internal indexing structure to quickly map absolute indexes down to the selected member of the group.  The important use cases are the homogeneous power-of-two case (indexes are easy to decode) and the general case (which can use a binary search over an array of member offsets).  This aspect is developed in more detail in JDK-8181704.
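
The general case can be illustrated with a binary search over the starting offsets of the members. The class `GatheredBytes` below is a hypothetical stand-in for the GatheredByteBuffer idea, covers only absolute `get`, and rejects empty members to keep the offsets strictly increasing.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical read-only group of byte buffers addressed by one long index. */
final class GatheredBytes {
    private final ByteBuffer[] members;
    private final long[] starts;   // starts[i] = absolute index of the first byte of members[i]
    private final long length;

    GatheredBytes(ByteBuffer... buffers) {
        members = new ByteBuffer[buffers.length];
        starts = new long[buffers.length];
        long total = 0;
        for (int i = 0; i < buffers.length; i++) {
            // slice() captures the remaining bytes and has its own position and limit.
            members[i] = buffers[i].slice();
            if (members[i].capacity() == 0) {
                throw new IllegalArgumentException("empty member at " + i);
            }
            starts[i] = total;
            total += members[i].capacity();
        }
        length = total;
    }

    long length() { return length; }

    /** Maps an absolute index to the member that holds it. */
    int memberFor(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        int pos = Arrays.binarySearch(starts, index);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owner is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    byte get(long index) {
        int m = memberFor(index);
        return members[m].get((int) (index - starts[m]));
    }
}
```

When every member has the same power-of-two size, `memberFor` reduces to a shift and the offset to a mask, so the search array is unnecessary.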

Comments

The intent of this issue is effectively addressed by the memory access API (JDK-8227446, JDK-8234049) and its ability to provide a ByteBuffer view of a MemorySegment (JDK-8227394). Therefore this issue is resolved as Won't Fix.
07-02-2020
Possible API additions for the "slice" or "view" approach:

/**
 * Allocates a new direct region and returns a buffer representing a slice of that region.
 */
static ByteBuffer allocateDirect(long regionCapacity, long sliceOffset, int sliceCapacity);

/**
 * Allocates a new region and returns a buffer representing a slice of that region.
 */
static ByteBuffer allocate(long regionCapacity, long sliceOffset, int sliceCapacity);

/**
 * Creates a new region backed by the supplied buffers and returns a buffer representing
 * a slice of that region. The buffers are concatenated in the order supplied.
 */
static ByteBuffer gather(long sliceOffset, int sliceCapacity, ByteBuffer... buffers);

/**
 * Returns a buffer representing a slice of the region to which this buffer belongs.
 */
ByteBuffer slice(long sliceOffset, int sliceCapacity);

/**
 * Returns whether this buffer represents a slice of a region.
 */
boolean isSlice();

/**
 * Returns the capacity of the region to which this buffer belongs.
 */
long regionCapacity();

/**
 * Returns the offset of this buffer within its containing region.
 */
long sliceOffset();

/**
 * Absolute get() method. The index is over the range of the containing region.
 */
byte get(long index);

/**
 * Absolute bulk get() method. The index is over the range of the containing region.
 */
ByteBuffer get(long index, byte[] dst, int offset, int length);

/**
 * Absolute put() method. The index is over the range of the containing region.
 */
ByteBuffer put(long index, byte b);

/**
 * Absolute bulk put() method. The index is over the range of the containing region.
 */
ByteBuffer put(long index, byte[] src, int offset, int length);
29-08-2018
VarHandle support for buffers can be enhanced to access buffers with long index values. That combined with slicing at long offsets and enhancement of certain bulk operations may be sufficient as a minimal API.
23-04-2018
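
For context on the VarHandle comment above, this is how a buffer view VarHandle is used today with `MethodHandles.byteBufferViewVarHandle`. Its coordinates are a `ByteBuffer` and an `int` byte index, and that `int` is what would have to widen. The class and method names in the sketch are illustrative only.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferVarHandleDemo {
    // Views the bytes of a ByteBuffer as big-endian longs; coordinates are (ByteBuffer, int).
    static final VarHandle LONG_VIEW =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    /** Writes a long at the given byte index and reads it back through the VarHandle. */
    static long roundTrip(ByteBuffer bb, int byteIndex, long value) {
        LONG_VIEW.set(bb, byteIndex, value);
        return (long) LONG_VIEW.get(bb, byteIndex);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(64);
        System.out.println(LONG_VIEW.coordinateTypes());
        System.out.println(roundTrip(bb, 8, 42L));
    }
}
```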
For really large BB's which model large segments of VM, it's probably worth turning off the mutability aspect, so that the BB can be safely shared across threads without either accidental or malicious races. There should be an operation on a BB which permanently disables anything mutable about it, either in the current object (BB::freeze) or in a fresh view (BB::asFrozenBuffer). This would lock out all non-absolute data accesses, changes to mark/pos/limit or attachments or any switches for byte order or the like. Users could get that mutable stuff back by slicing the frozen BB into a fresh mutable view. The frozen BB would not necessarily be read-only; the two kinds of freezing are independent (but can co-exist). (To be clear: I'm talking here about mutability of the dozen fields of the BB itself, not the mutability of the underlying storage that it refers to. The mutability of the addressed storage is a separate degree of freedom, controlled by the "read only view" feature already in the BB API.)
06-06-2017
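
A rough approximation of the frozen-view idea in the comment above can be built today with a wrapper, since ByteBuffer cannot be subclassed outside java.nio. `FrozenBufferView` and `thaw()` are hypothetical names, not the proposed BB::freeze or BB::asFrozenBuffer. The wrapper hides position, limit, mark and byte order, and exposes only absolute accesses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical wrapper whose buffer state (position, limit, byte order) can never change. */
final class FrozenBufferView {
    private final ByteBuffer view;   // private duplicate; never handed out

    FrozenBufferView(ByteBuffer source) {
        ByteOrder order = source.order();
        ByteBuffer copy = source.duplicate();   // shares storage, separate position/limit/mark
        copy.clear();                           // position 0, limit = capacity
        this.view = copy.order(order);          // duplicate() resets the order to BIG_ENDIAN
    }

    int capacity() { return view.capacity(); }

    // Absolute accesses only: they do not read or write the position.
    byte get(int index) { return view.get(index); }
    long getLong(int index) { return view.getLong(index); }

    /** Freezing the view does not make the storage read-only. */
    FrozenBufferView put(int index, byte x) {
        view.put(index, x);
        return this;
    }

    /** Returns a fresh, independently mutable view over the same storage. */
    ByteBuffer thaw() { return view.duplicate().order(view.order()); }
}
```

The wrapper only protects the buffer's own fields. Concurrent writes to the shared storage still race, which is the separate degree of freedom the comment describes.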
Another approach to consider is an "aggregator" or "view" over a large memory region that allows access to slices as existing ByteBuffer instances. This may work well when there is no sort-of-uniform-random-access over the whole memory region, but rather random access to some location, after which further accesses (perhaps contiguous, in loops) stay within a range that fits the existing size limits of the ByteBuffer used to perform them. The API impact would be lower, and the approach would work with bulk array copy operations and primitive views. It might be possible to express this in ByteBuffer itself, to obtain a ByteBuffer beyond the region it covers:

class ByteBuffer {
  ...
  long regionCapacity();
  ByteBuffer regionSlice(long pos, int lim);
}
19-05-2017
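
For file-backed memory, something close to the region/slice shape in the comment above is already expressible with `FileChannel.map`, which takes a `long` position and returns a buffer of at most `Integer.MAX_VALUE` bytes. The sketch below borrows the method names from that comment; the class `FileRegion` is hypothetical, and `lim` is assumed to mean the size of the slice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical region over a file that hands out ByteBuffer slices at long offsets. */
final class FileRegion implements AutoCloseable {
    private final FileChannel channel;

    FileRegion(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Total size of the region, which may exceed the int range. */
    long regionCapacity() throws IOException {
        return channel.size();
    }

    /** Maps lim bytes starting at pos; the returned buffer is an ordinary ByteBuffer. */
    ByteBuffer regionSlice(long pos, int lim) throws IOException {
        if (pos < 0 || lim < 0 || pos > regionCapacity() - lim) {
            throw new IndexOutOfBoundsException("pos " + pos + ", lim " + lim);
        }
        return channel.map(FileChannel.MapMode.READ_WRITE, pos, lim);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Each slice supports the existing bulk copy operations and primitive views unchanged, which is the lower API impact the comment refers to.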