Summary
-------
Provide an initial iteration of an [incubator module], jdk.incubator.vector, to express vector computations
that reliably compile at runtime to optimal vector hardware instructions on supported CPU architectures
and thus achieve superior performance to equivalent scalar computations.

Problem
-------
Vector computations consist of a sequence of operations on vectors. A vector
comprises a (usually) fixed sequence of scalar values, where the number of
scalar values corresponds to the number of hardware-defined vector lanes. A
binary operation applied to two vectors with the same number of lanes would,
for each lane, apply the equivalent scalar operation on the corresponding two
scalar values from each vector. This is commonly referred to as
[Single Instruction Multiple Data][SIMD] (SIMD).

[SIMD]:https://en.wikipedia.org/wiki/SIMD

Vector operations express a degree of parallelism that enables more work to be
performed in a single CPU cycle and thus can result in significant performance
gains. For example, given two vectors each covering a sequence of eight
integers (eight lanes), the two vectors can be added together using a
single hardware instruction. The vector addition hardware instruction operates
on sixteen integers, performing eight integer additions, in the time it would
ordinarily take to operate on two integers, performing one integer addition.
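
The lane-wise semantics described above can be modelled in plain Java without any vector support. The following is only an illustrative sketch: the class and method names (`LaneWiseAddSketch`, `addLanes`) are hypothetical, and the loop is a scalar model of what a single eight-lane vector addition computes, not a claim about how HotSpot compiles it.

```java
import java.util.Arrays;

final class LaneWiseAddSketch {
    // Scalar model of a lane-wise binary operation: result lane i depends
    // only on lane i of each input, so all lanes are independent.
    static int[] addLanes(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of lanes");
        }
        int[] r = new int[a.length];
        for (int lane = 0; lane < a.length; lane++) {
            r[lane] = a[lane] + b[lane];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};          // eight lanes
        int[] b = {10, 20, 30, 40, 50, 60, 70, 80};  // eight lanes
        // Sixteen input integers, eight additions.
        System.out.println(Arrays.toString(addLanes(a, b)));
        // prints [11, 22, 33, 44, 55, 66, 77, 88]
    }
}
```

Because no lane depends on another, the eight additions can in principle be issued as one hardware instruction.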
HotSpot supports [auto-vectorization] where scalar operations are transformed into
superword operations, which are then mapped to vector hardware instructions.
The set of transformable scalar operations is limited and fragile to changes in
the code shape. Furthermore, only a subset of available vector hardware
instructions might be utilized, limiting the performance of generated code.

[auto-vectorization]:http://cr.openjdk.java.net/~vlivanov/talks/2017_Vectorization_in_HotSpot_JVM.pdf
A developer wishing to write scalar operations that are reliably transformed
into superword operations needs to understand HotSpot's auto-vectorization
support and its limitations to achieve reliable and sustainable performance.
In some cases it may not be possible for the developer to write scalar
operations that are transformable. For example, HotSpot does not transform the
simple scalar operations for calculating the hash code of an array (see the
`Arrays.hashCode` method implementations in the JDK source code), nor can it
auto-vectorize code to lexicographically compare two arrays (which is why an
intrinsic was added to perform lexicographical comparison, see
[JDK-8033148][JDK-8033148]).

[JDK-8033148]:https://bugs.openjdk.java.net/browse/JDK-8033148

Solution
--------
The Vector API aims to address these issues by providing a mechanism to write
complex vector algorithms in Java, using pre-existing support in HotSpot
for vectorization, but with a user model which makes vectorization far more
predictable and robust. Hand-coded vector loops can express high-performance
algorithms (such as vectorized `hashCode` or specialized array comparison)
which an auto-vectorizer may never optimize.
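
As an illustration of such a hand-coded loop, the sketch below vectorizes the `31 * h + x` recurrence documented for `Arrays.hashCode(int[])`. It is one possible formulation and not the JDK's implementation. It assumes the API names found in released versions of `jdk.incubator.vector` (`IntVector.fromArray`, `reduceLanes`, `VectorSpecies.loopBound`), which may differ from the July 2019 javadoc referenced in this document, and it must be compiled and run with `--add-modules jdk.incubator.vector`. The class and method names are hypothetical.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class VectorHashCodeSketch {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Scalar reference: each step depends on the previous value of h.
    static int scalarHash(int[] a) {
        int h = 1;
        for (int x : a) {
            h = 31 * h + x;
        }
        return h;
    }

    // Vector formulation: every lane keeps its own partial sum, and the
    // lanes are combined once at the end using per-lane powers of 31.
    static int vectorHash(int[] a) {
        int lanes = SPECIES.length();
        int[] powers = new int[lanes];          // powers[j] = 31^(lanes-1-j)
        powers[lanes - 1] = 1;
        for (int j = lanes - 2; j >= 0; j--) {
            powers[j] = 31 * powers[j + 1];
        }
        int stride = 31 * powers[0];            // 31^lanes (wraps like int arithmetic)

        IntVector acc = IntVector.zero(SPECIES);
        int h = 1;                              // tracks the contribution of the initial 1
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += lanes) {
            acc = acc.mul(stride).add(IntVector.fromArray(SPECIES, a, i));
            h *= stride;
        }
        h += acc.mul(IntVector.fromArray(SPECIES, powers, 0))
                .reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {             // scalar tail
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

The rewrite relies on wrap-around `int` multiplication being associative and distributive, so reordering the sum does not change the result.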
There are numerous domains where this explicitly vectorizing
API may be applicable, such as machine learning, linear algebra, cryptography,
finance, and usages within the JDK itself.

Specification
-------------
The implementation of the Vector API exports the following interfaces, classes, and enum in the package `jdk.incubator.vector`, defined in module `jdk.incubator.vector`.
```
Interfaces
VectorOperators.Associative Binary associative lane-wise operations that are applicable to vector lane values of some or all lane types.
VectorOperators.Binary Binary lane-wise operations that are applicable to vector lane values of some or all lane types.
VectorOperators.Comparison Binary lane-wise comparisons that are applicable to vector lane values of all lane types.
VectorOperators.Conversion<E,F> Conversion operations that are applicable to vector lane values of specific lane types.
VectorOperators.Operator Lane-wise operations that are applicable to vector lane values of some or all lane types.
VectorOperators.Ternary Ternary lane-wise operations that are applicable to vector lane values of some or all lane types.
VectorOperators.Unary Unary lane-wise operations that are applicable to vector lane values of some or all lane types.
VectorSpecies<E> Interface for managing all vectors of the same combination of element type (ETYPE) and shape.
Classes
ByteVector A specialized Vector representing an ordered immutable sequence of byte values.
DoubleVector A specialized Vector representing an ordered immutable sequence of double values.
FloatVector A specialized Vector representing an ordered immutable sequence of float values.
IntVector A specialized Vector representing an ordered immutable sequence of int values.
LongVector A specialized Vector representing an ordered immutable sequence of long values.
ShortVector A specialized Vector representing an ordered immutable sequence of short values.
Vector<E> A sequence of a fixed number of lanes, all of some fixed element type such as byte, long, or float.
VectorMask<E> A VectorMask represents an ordered immutable sequence of boolean values.
VectorOperators This class consists solely of static constants that describe lane-wise vector operations,
plus nested interfaces which classify them.
VectorShuffle<E> A VectorShuffle represents an ordered immutable sequence of int values called source indexes,
where each source index numerically selects a source lane from a Vector of a compatible vector species.
Enum
VectorShape A VectorShape selects a particular implementation of Vectors.
```
A vector is represented by the abstract class `Vector<E>`, where the type variable `E` corresponds to the boxed type of the scalar primitive integral or floating point element type covered by the vector.
`Vector<E>` declares a set of methods for common vector operations supported by all element types. To reduce the surface of the API, instead of defining a method for each supported operation,
the API defines a method for each category of operations (such as `lanewise()`, `reduceLanes()`, `compare()`, etc.). The operation to be performed is specified with an operator parameter.
The supported operators are defined in the `VectorOperators` class as static final instances of the `VectorOperators.Operator` interface and its sub-interfaces. The sub-interfaces correspond
to the classification of operators into groups such as unary (e.g. negation), binary (e.g. addition), comparison (e.g. lessThan), etc. Nevertheless, some common operations (such as `add()` and `or()`)
are given their own named methods.
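
A minimal sketch of the operator-parameter style follows. It assumes the method and constant names found in released versions of `jdk.incubator.vector` (`lanewise`, `reduceLanes`, `compare`, `VectorOperators.ADD`, `VectorOperators.LT`), which may not match the July 2019 javadoc exactly, and it requires `--add-modules jdk.incubator.vector`. The class `LanewiseOperatorSketch` and its methods are hypothetical helpers; each expects arrays of at least four elements.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class LanewiseOperatorSketch {
    // 128 bits of int lanes, i.e. four lanes.
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;

    // Binary operation selected by an operator parameter.
    static int[] addWithOperator(int[] a, int[] b) {
        IntVector va = IntVector.fromArray(SPECIES, a, 0);
        IntVector vb = IntVector.fromArray(SPECIES, b, 0);
        return va.lanewise(VectorOperators.ADD, vb).toArray();
    }

    // The same operation through its dedicated named method.
    static int[] addWithNamedMethod(int[] a, int[] b) {
        IntVector va = IntVector.fromArray(SPECIES, a, 0);
        IntVector vb = IntVector.fromArray(SPECIES, b, 0);
        return va.add(vb).toArray();
    }

    // Reduction across lanes, again selected by an operator.
    static int sum(int[] a) {
        return IntVector.fromArray(SPECIES, a, 0).reduceLanes(VectorOperators.ADD);
    }

    // Comparison category: the result is one boolean per lane.
    static boolean[] lessThan(int[] a, int[] b) {
        IntVector va = IntVector.fromArray(SPECIES, a, 0);
        IntVector vb = IntVector.fromArray(SPECIES, b, 0);
        return va.compare(VectorOperators.LT, vb).toArray();
    }
}
```

The operator form keeps the method count small, while the named form reads more naturally for the most common cases.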
The package has specialized implementations of `Vector<E>` for each `E` in the set {Byte, Short, Integer, Long, Float, Double}. These classes export operations specific to an element type, such as bitwise operations (e.g. logical or), which are specific to integral sub-types, and mathematical operations (e.g. transcendental functions like `pow()`), which are specific to floating point sub-types.
A vector has an element type, represented by the type variable `E`, and a shape, which defines the size of the vector in bits. The enum `VectorShape` lists the shapes supported by the API.
The element type and shape together form a species, represented by `VectorSpecies<E>`. Species play a role in the creation and type conversion of vectors, masks, and shuffles.
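
The relationship between element type, shape, and lane count can be sketched as follows. This assumes `VectorSpecies.of(Class, VectorShape)`, `VectorSpecies.length()`, and `VectorShape.vectorBitSize()` as found in released versions of `jdk.incubator.vector`, and requires `--add-modules jdk.incubator.vector`. The class `SpeciesSketch` is a hypothetical helper.

```java
import jdk.incubator.vector.VectorShape;
import jdk.incubator.vector.VectorSpecies;

final class SpeciesSketch {
    // For a given shape, the lane count is the shape's bit size divided
    // by the bit size of the element type.
    static int[] laneCounts(VectorShape shape) {
        return new int[] {
            VectorSpecies.of(byte.class, shape).length(),   // 8-bit lanes
            VectorSpecies.of(int.class, shape).length(),    // 32-bit lanes
            VectorSpecies.of(double.class, shape).length()  // 64-bit lanes
        };
    }

    public static void main(String[] args) {
        VectorShape shape = VectorShape.S_256_BIT;
        int[] counts = laneCounts(shape);
        System.out.println(shape.vectorBitSize() + " bits: byte x " + counts[0]
                + ", int x " + counts[1] + ", double x " + counts[2]);
    }
}
```

For a 256-bit shape this gives 32 byte lanes, 8 int lanes, and 4 double lanes.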
To support control flow, relevant vector operations optionally accept masks, represented by the public abstract class `VectorMask<E>`. Each element
in a mask, a boolean value or bit, corresponds to a vector lane. When a mask is an input to an operation, it governs whether the operation
is applied to a particular lane; the operation is applied to a lane if the mask bit for that lane is set (is true). Alternative behavior occurs if the
mask bit is not set (is false). Comparison operations produce masks, which can then be input to other operations to selectively disable the
operation on certain lanes and thereby emulate flow control.
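
The following sketch emulates the scalar statement `if (x < 0) x = -x;` with a mask produced by a comparison. A second mask from `indexInRange` guards the final, partially filled vector. It assumes the masked overloads of `fromArray`, `lanewise`, and `intoArray` found in released versions of `jdk.incubator.vector`, and requires `--add-modules jdk.incubator.vector`. The class and method names are hypothetical.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class MaskSketch {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Negates only the lanes holding negative values.
    static int[] absViaMask(int[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // Lanes past the end of the array are switched off.
            VectorMask<Integer> inRange = SPECIES.indexInRange(i, a.length);
            IntVector v = IntVector.fromArray(SPECIES, a, i, inRange);

            // The comparison yields one boolean per lane.
            VectorMask<Integer> negative = v.compare(VectorOperators.LT, 0);

            // NEG is applied where the mask is set; other lanes keep their value.
            IntVector result = v.lanewise(VectorOperators.NEG, negative);
            result.intoArray(r, i, inRange);
        }
        return r;
    }
}
```

No branch is taken per element; the condition is carried as data in the mask.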
A `VectorShuffle` represents an ordered immutable sequence of int values. A `VectorShuffle` can be used with a shuffle-accepting vector operation to
control the rearrangement of lane elements of input vectors.
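
As a small illustration, the sketch below reverses the four lanes of a 128-bit int vector. It assumes `VectorShuffle.fromValues` and `Vector.rearrange` as found in released versions of `jdk.incubator.vector`, and requires `--add-modules jdk.incubator.vector`. The class `ShuffleSketch` and the method `reverse4` are hypothetical.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorShuffle;
import jdk.incubator.vector.VectorSpecies;

final class ShuffleSketch {
    // 128 bits of int lanes, i.e. four lanes.
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;

    // Result lane N takes its value from the source lane named by the
    // N-th source index of the shuffle.
    static int[] reverse4(int[] a) {
        VectorShuffle<Integer> reverse = VectorShuffle.fromValues(SPECIES, 3, 2, 1, 0);
        IntVector v = IntVector.fromArray(SPECIES, a, 0);
        return v.rearrange(reverse).toArray();
    }
}
```

With the input `{1, 2, 3, 4}`, result lane 0 reads source lane 3, so the output is `{4, 3, 2, 1}`.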
The javadoc for the package with the implementation as of July 18, 2019 is at http://cr.openjdk.java.net/~kkharbas/vector-api/CSR/javadoc.02/jdk.incubator.vector/jdk/incubator/vector/package-summary.html and also attached here.
More details can be found in the JEP issue - https://bugs.openjdk.java.net/browse/JDK-8201271