When working directly with large-scale chunks of data, scatter/gather data structures are likely to be important, for two reasons. First, large-scale data is likely to profit from copy reduction, where the data is mapped from a file or other OS resource rather than copied into the Java heap. Second, the Java heap is likely to benefit from a size limit on any single datum (currently the built-in limit is 2 GB, since buffers and arrays are indexed by 32-bit ints). In both cases, it is sometimes desirable to have a data structure which resembles a byte buffer, but is internally redirected to two or more contiguous segments of memory.
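Standard NIO already exposes scatter/gather at the channel level, where one vectored write covers several discontiguous buffers; a minimal sketch (file name and buffer contents are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatherWrite {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("gather", ".bin");
        // Two separate buffers, e.g. a header and a body held in different segments.
        ByteBuffer header = ByteBuffer.wrap("HEAD".getBytes());
        ByteBuffer body   = ByteBuffer.wrap("BODY".getBytes());
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // One gathering write drains both buffers in order, without
            // first copying them into a single contiguous array.
            ch.write(new ByteBuffer[] { header, body });
        }
        System.out.println(new String(Files.readAllBytes(tmp)));  // HEADBODY
    }
}
```

The missing piece, which this note explores, is the same composition at the data-structure level rather than only at the I/O boundary.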
A third use case is the logical limit of the zero-copy scenario, where a large region of virtual memory is presented as a single object. In that case, the virtual addresses of the region are likely to appear as direct indexes into the composite object, which may internally refer to a patchwork of address mappings.
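Today the closest approximation is a memory-mapped file: the bytes stay outside the Java heap, but the resulting view is still int-indexed, which illustrates the 2 GB ceiling the composite structure would need to escape. A small sketch:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Map the file region directly; bytes are not copied into the heap.
            MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // get takes an int index, so a single mapping tops out near 2 GB;
            // larger regions force exactly the patchwork of mappings described above.
            System.out.println(mb.get(2));  // 3
        }
    }
}
```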
There may be a Java API that can efficiently cover these use cases. Likely requirements are:
- byte-indexed data structure
- supports byte-buffer style views onto small-scale slices
- supports looping over slices at all scales (stream-of-segment loops)
- no intrinsic 32-bit limit on index size; indexes up to 64 bits (enough for now)
- allows existing byte buffers to be assembled as groups
- maybe allows arbitrary ranges of long values as indexes (not zero based)
- supports paged (array-let or page table) style constant-time indexing
- supports irregular (binary search or skip-list) style composition with log-time indexing
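The paged (array-let) style in the requirements above can be sketched as follows. This is a hypothetical illustration, not an existing API: the class name `PagedBytes` and the page size are assumptions, and the shift/mask indexing is the standard page-table trick that gives constant-time access by long index.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a regularly blocked, long-indexed byte structure.
public class PagedBytes {
    static final int PAGE_BITS = 12;              // 4 KiB pages (assumed size)
    static final int PAGE_SIZE = 1 << PAGE_BITS;
    private final ByteBuffer[] pages;             // existing buffers assembled as a group

    PagedBytes(long length) {
        int n = (int) ((length + PAGE_SIZE - 1) >>> PAGE_BITS);
        pages = new ByteBuffer[n];
        for (int i = 0; i < n; i++) pages[i] = ByteBuffer.allocate(PAGE_SIZE);
    }

    // Constant-time indexing: shift selects the page, mask selects the offset.
    byte get(long index) {
        return pages[(int) (index >>> PAGE_BITS)].get((int) (index & (PAGE_SIZE - 1)));
    }

    void put(long index, byte b) {
        pages[(int) (index >>> PAGE_BITS)].put((int) (index & (PAGE_SIZE - 1)), b);
    }

    // Byte-buffer style view onto a small-scale slice (within one page here).
    ByteBuffer slice(long index, int len) {
        ByteBuffer p = pages[(int) (index >>> PAGE_BITS)].duplicate();
        int off = (int) (index & (PAGE_SIZE - 1));
        p.position(off);
        p.limit(off + len);
        return p.slice();
    }

    public static void main(String[] args) {
        PagedBytes pb = new PagedBytes(3L * PAGE_SIZE);
        pb.put(2L * PAGE_SIZE + 5, (byte) 42);    // a long index beyond the first page
        System.out.println(pb.get(2L * PAGE_SIZE + 5));  // 42
    }
}
```

Fixed-size pages are what makes the structure friendly to the garbage collector: no single allocation ever exceeds one page.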
(Note: Irregularly blocked large arrays can provide an efficient "bucket" for
collecting the output of large parallel streams in cases where the stream
output cannot be sized in advance. The simpler regularly blocked arrays are
a way to be friendly to the garbage collector and/or storage manager.)
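The irregular "bucket" case from the note above can be sketched with a cumulative offset table: chunks of unequal size are appended as each parallel task produces them, and a binary search over the table gives log-time random access. The class and method names here are illustrative, not an existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an irregularly blocked, long-indexed byte structure.
public class ChunkedBytes {
    private final List<byte[]> chunks = new ArrayList<>();
    private long[] starts = new long[] {0};   // starts[i] = first index of chunk i

    // Append a chunk of arbitrary size, e.g. one parallel task's output,
    // whose total size could not be determined in advance.
    void append(byte[] chunk) {
        chunks.add(chunk);
        long[] s = Arrays.copyOf(starts, starts.length + 1);
        s[s.length - 1] = s[s.length - 2] + chunk.length;
        starts = s;
    }

    // Log-time indexing: binary search locates the owning chunk.
    byte get(long index) {
        int i = Arrays.binarySearch(starts, index);
        if (i < 0) i = -i - 2;                // insertion point -> owning chunk
        return chunks.get(i)[(int) (index - starts[i])];
    }

    long length() { return starts[starts.length - 1]; }

    public static void main(String[] args) {
        ChunkedBytes cb = new ChunkedBytes();
        cb.append(new byte[] {10, 11});
        cb.append(new byte[] {20, 21, 22});
        System.out.println(cb.get(3));        // second byte of the second chunk: 21
    }
}
```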
See comments for a sketch.