JDK-8284972 : Integration of JEP 426: Vector API (Fourth Incubator)
  • Type: CSR
  • Component: hotspot
  • Sub-Component: compiler
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 19
  • Submitted: 2022-04-18
  • Updated: 2022-05-03
  • Resolved: 2022-05-03
Description
Summary
-------

Specify API updates for JEP 426: Vector API (Fourth Incubator):

- load and store vectors to and from `MemorySegment`s.

- bitwise integral lanewise operations for counting bits, reversing bits and bytes, compressing bits, and expanding bits.

(Note: the cross-lane operations for compressing and expanding lane elements were specified in the approved CSR [JDK-8277156](https://bugs.openjdk.java.net/browse/JDK-8277156). The specification associated with that CSR is included in the attached documentation and specdiff of this CSR, since it is more convenient to retain it than to remove it.)

Problem
-------

Loading and storing vectors from and to memory is limited to offsets of at most 2^31 - 1, since offsets are non-negative `int` values. Further, it is not possible to reliably access vectors at offsets aligned to the vector size, which would give better performance.

The API is missing common "bit twiddling" functionality that is specified on the boxed primitive types, such as `Integer` and `Long`.


Solution
--------

Add methods to load and store vectors to and from `MemorySegment`s. The size limitation is no longer an issue, since segment access accepts an offset as a `long` value. Segments may be allocated on hyper-aligned boundaries. This adds a preview dependency on [JEP 424](https://openjdk.java.net/jeps/424): Foreign Function & Memory API (Preview). 
 
Remove methods that load and store vectors to and from `byte[]` and `ByteBuffer`. Such methods are redundant, since a `MemorySegment` can be created from either. 

Add "bit twiddling" lanewise operators.
 
Specification
-------------

JavaDoc and specdiff may be found [here](http://cr.openjdk.java.net/~psandoz/panama/JDK-8284960-vector-api-jep-v4/), and are attached.

The following methods are added to support loading and storing vectors from and to `MemorySegment`s.

On `VectorSpecies`:

    /**
     * Loads a vector of this species from a {@linkplain MemorySegment memory segment}
     * starting at an offset into the memory segment.
     * Bytes are composed into primitive lane elements according
     * to the specified byte order.
     * The vector is arranged into lanes according to
     * <a href="Vector.html#lane-order">memory ordering</a>.
     * <p>
     * Equivalent to
     * {@code IntVector.fromMemorySegment(this,ms,offset,bo)},
     * on the vector type corresponding to
     * this species.
     *
     * @param ms the memory segment
     * @param offset the offset into the memory segment
     * @param bo the intended byte order
     * @return a vector of the given species filled from the memory segment
     * @throws IndexOutOfBoundsException
     *         if {@code offset+N*ESIZE < 0}
     *         or {@code offset+(N+1)*ESIZE > ms.byteSize()}
     *         for any lane {@code N} in the vector
     * @see IntVector#fromMemorySegment(VectorSpecies, java.lang.foreign.MemorySegment, long, java.nio.ByteOrder)
     * @see FloatVector#fromMemorySegment(VectorSpecies, java.lang.foreign.MemorySegment, long, java.nio.ByteOrder)
     * @since 19
     */
    Vector<E> fromMemorySegment(MemorySegment ms, long offset, ByteOrder bo)

On `Vector`:

    /**
     * Stores this vector into a {@linkplain MemorySegment memory segment}
     * starting at an offset using explicit byte order.
     * <p>
     * Bytes are extracted from primitive lane elements according
     * to the specified byte ordering.
     * The lanes are stored according to their
     * <a href="Vector.html#lane-order">memory ordering</a>.
     * <p>
     * This method behaves as if it calls
     * {@link #intoMemorySegment(MemorySegment,long,ByteOrder,VectorMask)
     * intoMemorySegment()} as follows:
     * <pre>{@code
     * var m = maskAll(true);
     * intoMemorySegment(ms, offset, bo, m);
     * }</pre>
     *
     * @param ms the memory segment
     * @param offset the offset into the memory segment
     * @param bo the intended byte order
     * @throws IndexOutOfBoundsException
     *         if {@code offset+N*ESIZE < 0}
     *         or {@code offset+(N+1)*ESIZE > ms.byteSize()}
     *         for any lane {@code N} in the vector
     * @throws UnsupportedOperationException
     *         if the memory segment is read-only
     * @throws IllegalArgumentException if the memory segment is a heap segment that is
     *         not backed by a {@code byte[]} array.
     * @throws IllegalStateException if the memory segment's session is not alive,
     *         or if access occurs from a thread other than the thread owning the session.
     * @since 19
     */
    public abstract void intoMemorySegment(MemorySegment ms, long offset, ByteOrder bo)

    /**
     * Stores this vector into a {@linkplain MemorySegment memory segment}
     * starting at an offset using explicit byte order and a mask.
     * <p>
     * Bytes are extracted from primitive lane elements according
     * to the specified byte ordering.
     * The lanes are stored according to their
     * <a href="Vector.html#lane-order">memory ordering</a>.
     * <p>
     * The following pseudocode illustrates the behavior, where
     * {@code JAVA_E} is the layout of the primitive element type, {@code ETYPE} is the
     * primitive element type, and {@code EVector} is the primitive
     * vector type for this vector:
     * <pre>{@code
     * ETYPE[] a = this.toArray();
     * var slice = ms.asSlice(offset);
     * for (int n = 0; n < a.length; n++) {
     *     if (m.laneIsSet(n)) {
     *         slice.setAtIndex(ValueLayout.JAVA_E.withBitAlignment(8), n, a[n]);
     *     }
     * }
     * }</pre>
     *
     * @implNote
     * This operation is likely to be more efficient if
     * the specified byte order is the same as
     * {@linkplain ByteOrder#nativeOrder()
     * the platform native order},
     * since this method will not need to reorder
     * the bytes of lane values.
     * In the special case where {@code ETYPE} is
     * {@code byte}, the byte order argument is
     * ignored.
     *
     * @param ms the memory segment
     * @param offset the offset into the memory segment
     * @param bo the intended byte order
     * @param m the mask controlling lane selection
     * @throws IndexOutOfBoundsException
     *         if {@code offset+N*ESIZE < 0}
     *         or {@code offset+(N+1)*ESIZE > ms.byteSize()}
     *         for any lane {@code N} in the vector
     *         where the mask is set
     * @throws UnsupportedOperationException
     *         if the memory segment is read-only
     * @throws IllegalArgumentException if the memory segment is a heap segment that is
     *         not backed by a {@code byte[]} array.
     * @throws IllegalStateException if the memory segment's session is not alive,
     *         or if access occurs from a thread other than the thread owning the session.
     * @since 19
     */
    public abstract void intoMemorySegment(MemorySegment ms, long offset,
                                           ByteOrder bo, VectorMask<E> m)
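
The masked store semantics above can be modelled with plain scalar code. The following is a minimal sketch only: it assumes `int` lanes, uses a `byte[]` wrapped in a `ByteBuffer` as a stand-in for the memory segment, and a `boolean[]` as a stand-in for the mask. The class and method names are hypothetical and are not part of the Vector API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class MaskedStoreSketch {
    // Scalar model of a masked store of int lanes (illustrative, not the real API).
    static void store(int[] lanes, boolean[] mask, byte[] memory, long offset, ByteOrder bo) {
        // First pass: bounds-check only the lanes where the mask is set.
        for (int n = 0; n < lanes.length; n++) {
            long at = offset + (long) Integer.BYTES * n;
            if (mask[n] && (at < 0 || at + Integer.BYTES > memory.length)) {
                throw new IndexOutOfBoundsException("lane " + n + " at byte offset " + at);
            }
        }
        // Second pass: write the set lanes using the requested byte order.
        ByteBuffer view = ByteBuffer.wrap(memory).order(bo);
        for (int n = 0; n < lanes.length; n++) {
            if (mask[n]) {
                view.putInt((int) (offset + (long) Integer.BYTES * n), lanes[n]);
            }
        }
    }
}
```

In this model, memory belonging to unset lanes is left untouched, and an unset lane that would fall outside the memory does not cause an exception.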

On `IntVector` and equivalent on all other primitive specializations:

    /**
     * Loads a vector from a {@linkplain MemorySegment memory segment}
     * starting at an offset into the memory segment.
     * Bytes are composed into primitive lane elements according
     * to the specified byte order.
     * The vector is arranged into lanes according to
     * <a href="Vector.html#lane-order">memory ordering</a>.
     * <p>
     * This method behaves as if it returns the result of calling
     * {@link #fromMemorySegment(VectorSpecies,MemorySegment,long,ByteOrder,VectorMask)
     * fromMemorySegment()} as follows:
     * <pre>{@code
     * var m = species.maskAll(true);
     * return fromMemorySegment(species, ms, offset, bo, m);
     * }</pre>
     *
     * @param species species of desired vector
     * @param ms the memory segment
     * @param offset the offset into the memory segment
     * @param bo the intended byte order
     * @return a vector loaded from the memory segment
     * @throws IndexOutOfBoundsException
     *         if {@code offset+N*4 < 0}
     *         or {@code offset+N*4 >= ms.byteSize()}
     *         for any lane {@code N} in the vector
     * @throws IllegalArgumentException if the memory segment is a heap segment that is
     *         not backed by a {@code byte[]} array.
     * @throws IllegalStateException if the memory segment's session is not alive,
     *         or if access occurs from a thread other than the thread owning the session.
     * @since 19
     */
    @ForceInline
    public static
    IntVector fromMemorySegment(VectorSpecies<Integer> species,
                                           MemorySegment ms, long offset,
                                           ByteOrder bo)

    /**
     * Loads a vector from a {@linkplain MemorySegment memory segment}
     * starting at an offset into the memory segment
     * and using a mask.
     * Lanes where the mask is unset are filled with the default
     * value of {@code int} (zero).
     * Bytes are composed into primitive lane elements according
     * to the specified byte order.
     * The vector is arranged into lanes according to
     * <a href="Vector.html#lane-order">memory ordering</a>.
     * <p>
     * The following pseudocode illustrates the behavior:
     * <pre>{@code
     * var slice = ms.asSlice(offset);
     * int[] ar = new int[species.length()];
     * for (int n = 0; n < ar.length; n++) {
     *     if (m.laneIsSet(n)) {
     *         ar[n] = slice.getAtIndex(ValueLayout.JAVA_INT.withBitAlignment(8), n);
     *     }
     * }
     * IntVector r = IntVector.fromArray(species, ar, 0);
     * }</pre>
     * @implNote
     * This operation is likely to be more efficient if
     * the specified byte order is the same as
     * {@linkplain ByteOrder#nativeOrder()
     * the platform native order},
     * since this method will not need to reorder
     * the bytes of lane values.
     *
     * @param species species of desired vector
     * @param ms the memory segment
     * @param offset the offset into the memory segment
     * @param bo the intended byte order
     * @param m the mask controlling lane selection
     * @return a vector loaded from the memory segment
     * @throws IndexOutOfBoundsException
     *         if {@code offset+N*4 < 0}
     *         or {@code offset+N*4 >= ms.byteSize()}
     *         for any lane {@code N} in the vector
     *         where the mask is set
     * @throws IllegalArgumentException if the memory segment is a heap segment that is
     *         not backed by a {@code byte[]} array.
     * @throws IllegalStateException if the memory segment's session is not alive,
     *         or if access occurs from a thread other than the thread owning the session.
     * @since 19
     */
    @ForceInline
    public static
    IntVector fromMemorySegment(VectorSpecies<Integer> species,
                                           MemorySegment ms, long offset,
                                           ByteOrder bo,
                                           VectorMask<Integer> m)
 
The equivalent methods that load and store vectors to and from `byte[]` and `ByteBuffer` are removed.
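
As an illustration of the masked load described above, here is a scalar sketch. It is not the Vector API itself: a `byte[]` viewed through a `ByteBuffer` stands in for the memory segment, a `boolean[]` stands in for the mask, and the class and method names are hypothetical. The bounds check in the sketch requires a full 4-byte read to fit, which is an assumption of the sketch rather than a restatement of the specified condition.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class MaskedLoadSketch {
    // Scalar model of a masked load of int lanes (illustrative, not the real API).
    static int[] load(byte[] memory, long offset, ByteOrder bo, boolean[] mask) {
        ByteBuffer view = ByteBuffer.wrap(memory).order(bo);
        int[] lanes = new int[mask.length]; // unset lanes keep the default value 0
        for (int n = 0; n < lanes.length; n++) {
            if (!mask[n]) {
                continue; // unset lanes are neither bounds-checked nor read
            }
            long at = offset + (long) Integer.BYTES * n;
            if (at < 0 || at + Integer.BYTES > memory.length) {
                throw new IndexOutOfBoundsException("lane " + n + " at byte offset " + at);
            }
            lanes[n] = view.getInt((int) at); // bytes composed according to bo
        }
        return lanes;
    }
}
```

Because only set lanes are checked, a mask can be used to read the final, partial group of elements without running past the end of the memory.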

Since the `MemorySegment` API uses `long` values, instead of `int` values, to represent offsets in a `MemorySegment`, additional `long`-accepting methods associated with bounds checks are added to complement the `int`-accepting methods.

On `VectorSpecies`:

    /**
     * Returns a mask of this species where only
     * the lanes at index N such that the adjusted index
     * {@code N+offset} is in the range {@code [0..limit-1]}
     * are set.
     *
     * <p>
     * This method returns the value of the expression
     * {@code maskAll(true).indexInRange(offset, limit)}
     *
     * @param offset the starting index
     * @param limit the upper-bound (exclusive) of index range
     * @return a mask with out-of-range lanes unset
     * @see VectorMask#indexInRange(long, long)
     * @since 19
     */
    VectorMask<E> indexInRange(long offset, long limit)

    /**
     * Loop control function which returns the largest multiple of
     * {@code VLENGTH} that is less than or equal to the given
     * {@code length} value.
     * Here, {@code VLENGTH} is the result of {@code this.length()},
     * and {@code length} is interpreted as a number of lanes.
     * The resulting value {@code R} satisfies this inequality:
     * <pre>{@code R <= length < R+VLENGTH}
     * </pre>
     * <p> Specifically, this method computes
     * {@code length - floorMod(length, VLENGTH)}, where
     * {@link Math#floorMod(long,int) floorMod} computes a remainder
     * value by rounding its quotient toward negative infinity.
     * As long as {@code VLENGTH} is a power of two, then the result
     * is also equal to {@code length & ~(VLENGTH - 1)}.
     *
     * @param length the input length
     * @return the largest multiple of the vector length not greater
     *         than the given length
     * @throws IllegalArgumentException if the {@code length} is
     *         negative and the result would overflow to a positive value
     * @see Math#floorMod(long, int)
     * @since 19
     */
    long loopBound(long length)
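
The arithmetic stated for `loopBound(long)` can be checked with a small scalar model. This is a sketch under assumptions: the `vlength` parameter stands in for `VLENGTH`, the class name is hypothetical, and the overflow case that the specification reports with `IllegalArgumentException` is not modelled.

```java
final class LoopBoundSketch {
    // The documented formula: length - floorMod(length, VLENGTH).
    static long loopBound(long length, int vlength) {
        return length - Math.floorMod(length, vlength);
    }

    // The documented shortcut, valid only when vlength is a power of two.
    static long loopBoundPow2(long length, int vlength) {
        return length & ~((long) vlength - 1);
    }

    public static void main(String[] args) {
        long length = 3_000_000_003L; // larger than Integer.MAX_VALUE
        int vlength = 8;
        long bound = loopBound(length, vlength);
        // Full vectors cover [0, bound); the remaining lanes form the tail.
        System.out.println("bound=" + bound + ", tail=" + (length - bound));
    }
}
```

Both forms agree for power-of-two vector lengths, including negative inputs, because `floorMod` rounds toward negative infinity.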

On `VectorMask`:

    /**
     * Removes lanes numbered {@code N} from this mask where the
     * adjusted index {@code N+offset}, is not in the range
     * {@code [0..limit-1]}.
     *
     * <p> In all cases the series of set and unset lanes is assigned
     * as if by using infinite precision or {@code VLENGTH-}saturating
     * additions or subtractions, without overflow or wrap-around.
     *
     * @apiNote
     *
     * This method performs a SIMD emulation of the check performed by
     * {@link Objects#checkIndex(long,long)}, on the index numbers in
     * the range {@code [offset..offset+VLENGTH-1]}.  If an exception
     * is desired, the resulting mask can be compared with the
     * original mask; if they are not equal, then at least one lane
     * was out of range, and exception processing can be performed.
     *
     * <p> A mask which is a series of {@code N} set lanes followed by
     * a series of unset lanes can be obtained by calling
     * {@code allTrue.indexInRange(0, N)}, where {@code allTrue} is a
     * mask of all true bits.  A mask of {@code N1} unset lanes
     * followed by {@code N2} set lanes can be obtained by calling
     * {@code allTrue.indexInRange(-N1, N2)}.
     *
     * @param offset the starting index
     * @param limit the upper-bound (exclusive) of index range
     * @return the original mask, with out-of-range lanes unset
     * @see VectorSpecies#indexInRange(long, long)
     * @since 19
     */
    public abstract VectorMask<E> indexInRange(long offset, long limit)
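
The range check described for `indexInRange(long, long)` can be modelled on a `boolean[]`. This is only a scalar sketch with a hypothetical class name: each array element stands in for one mask lane, and the overflow guard mimics the "infinite precision" wording of the specification.

```java
final class IndexInRangeSketch {
    // Lane n stays set only if it was set and 0 <= n + offset < limit.
    static boolean[] indexInRange(boolean[] mask, long offset, long limit) {
        boolean[] result = new boolean[mask.length];
        for (int n = 0; n < mask.length; n++) {
            boolean inRange;
            if (offset > Long.MAX_VALUE - n) {
                // n + offset would exceed Long.MAX_VALUE, so it is above any long limit.
                inRange = false;
            } else {
                long index = n + offset;
                inRange = index >= 0 && index < limit;
            }
            result[n] = mask[n] && inRange;
        }
        return result;
    }
}
```

For example, with four lanes, an offset of 6 and a limit of 8 leave only the first two lanes set, which is the mask needed to process the last two elements of an 8-element range.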

The following lanewise operators are added to `VectorOperators`:

    /** Produce {@code bitCount(a)} 
     * @since 19
     */
    public static final Unary BIT_COUNT

    /** Produce {@code compress(a,n)}. Integral, {@code int} and {@code long}, only.
     * @since 19
     */
    public static final /*bitwise*/ Binary COMPRESS_BITS

    /** Produce {@code expand(a,n)}. Integral, {@code int} and {@code long}, only.
     * @since 19
     */
    public static final /*bitwise*/ Binary EXPAND_BITS

    /** Produce {@code numberOfLeadingZeros(a)}
     * @since 19
     */
    public static final Unary LEADING_ZEROS_COUNT 

    /** Produce {@code reverse(a)}
     * @since 19
     */
    public static final Unary REVERSE

    /** Produce {@code reverseBytes(a)}
     * @since 19
     */
    public static final Unary REVERSE_BYTES

    /** Produce {@code numberOfTrailingZeros(a)}
     * @since 19
     */
    public static final Unary TRAILING_ZEROS_COUNT
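
To make the per-lane meaning of these operators concrete, here is a scalar sketch for `int` lanes. It assumes the unary operators correspond to the same-named static methods on `Integer`, and that `COMPRESS_BITS` and `EXPAND_BITS` correspond to `Integer.compress` and `Integer.expand`. The bit compress and expand are written out by hand so the sketch also compiles on older JDKs; the class and method names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class BitLaneOpsSketch {
    // Applies a scalar operation to every lane, as a unary lanewise operator would.
    static int[] lanewise(int[] lanes, IntUnaryOperator op) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = op.applyAsInt(lanes[i]);
        }
        return out;
    }

    // Gathers the bits of a selected by mask into the low-order bits of the result.
    static int compressBits(int a, int mask) {
        int result = 0;
        int outBit = 0;
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((a >>> bit) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Scatters the low-order bits of a to the bit positions selected by mask.
    static int expandBits(int a, int mask) {
        int result = 0;
        int inBit = 0;
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((a >>> inBit) & 1) << bit;
                inBit++;
            }
        }
        return result;
    }
}
```

A call such as `BitLaneOpsSketch.lanewise(lanes, Integer::bitCount)` then models `BIT_COUNT`, and `Integer::reverseBytes` models `REVERSE_BYTES`, one lane at a time.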
Comments
Moving updated request to Approved.
03-05-2022

Drat, I forgot to add @since tags. I will work on that. Fixed it.
02-05-2022

Updated the solution section, mentioning the dependence on JEP 424. I will add @since tags when finalized and upload a zip of the doc and spec.
22-04-2022

Moving to Provisional. As a reminder, before the request is Finalized, please attach some stand-alone form of the spec changes to the CSR for archival purposes. I assume there is a dependency on JEP 424, as the MemorySegment type changes its module and package. It might be helpful to include @since tags for the new API to track the progress of the ongoing incubation.
22-04-2022