JDK-8201271 : JEP 338: Vector API (Incubator)
  • Type: JEP
  • Component: hotspot
  • Sub-Component: compiler
  • Priority: P3
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 16
  • Submitted: 2018-04-06
  • Updated: 2021-08-28
  • Resolved: 2021-01-25
Related Reports
Relates :  
Relates :  
Sub Tasks
JDK-8223047 :  
JDK-8223347 :  
JDK-8261369 :  
Description
Summary
-------

Provide an initial iteration of an [incubator module](https://openjdk.java.net/jeps/11),
`jdk.incubator.vector`, to express vector computations that reliably compile at runtime to
optimal vector hardware instructions on supported CPU architectures and thus achieve superior
performance to equivalent scalar computations.

[incubating module]: http://openjdk.java.net/jeps/11


Goals
-----

- *Clear and concise API:*
The API shall be capable of clearly and concisely expressing a wide range of
vector computations consisting of a sequence of vector operations often composed
within loops and possibly with control flow.  It should be possible to express a
computation that is generic to vector size (or the number of lanes per vector)
thus enabling such computations to be portable across hardware supporting
different vector sizes (as detailed in the next goal).

- *Platform agnostic:*
The API shall be architecture agnostic, enabling support for runtime
implementations on multiple CPU architectures that support vector hardware 
instructions.
As is usual in Java APIs, where platform optimization and portability
conflict, the bias will be to making the Vector API portable, even if some
platform-specific idioms cannot be directly expressed in portable code.
The next goal of x64 and AArch64 performance is representative of appropriate
performance goals on all platforms where Java is supported.
The [ARM Scalable Vector Extension][SVE] (SVE) is of special interest in this
regard to ensure the API can support this architecture, even though as of
writing there are no known production hardware implementations.

[SVE]:https://arxiv.org/pdf/1803.06185.pdf

- *Reliable runtime compilation and performance on x64 and AArch64 architectures:*
The Java runtime, specifically the HotSpot C2 compiler, shall compile, on
capable x64 architectures, a sequence of vector operations to a corresponding
sequence of vector hardware instructions, such as those supported by
[Streaming SIMD Extensions][SSE] (SSE) and [Advanced Vector Extensions][AVX]
(AVX) extensions, thereby generating efficient and performant code.
The programmer shall have confidence that the vector operations they express
will reliably map closely to associated hardware vector instructions.
The same shall also apply to capable ARM AArch64 architectures compiling to a 
sequence of vector hardware instructions supported by [Neon][NEON].

[SSE]:https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[AVX]:https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
[NEON]:https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_(Neon)

- *Graceful degradation:*
If a vector computation cannot be fully expressed at runtime as a sequence of
hardware vector instructions, either because an architecture does not
support some of the required instructions or because another CPU architecture is
not supported, then the Vector API implementation shall degrade gracefully and
still function.  This may include issuing warnings to the developer if a vector
computation cannot be sufficiently compiled to vector hardware instructions.
On platforms without vectors, graceful degradation shall yield code
competitive with manually-unrolled loops, where the unroll factor is
the number of lanes in the selected vector.


Non-Goals
---------

- It is not a goal to enhance the auto-vectorization support in HotSpot.

- It is not a goal for HotSpot to support vector hardware instructions on CPU
architectures other than x64 and AArch64.  Such support is left for later JEPs.
However, it is important to state, as expressed
in the goals, that the API must not rule out such implementations.  Further,
work performed may naturally leverage and extend existing abstractions in
HotSpot for auto-vectorization vector support making such a task easier.

- It is not a goal to support the C1 compiler in this or future iterations.  We
expect the Graal compiler to be supported in future work.

- It is not a goal to support strict floating point calculations as defined by
the Java `strictfp` keyword.  The results of floating point operations performed
on floating point scalars may differ from equivalent floating point operations
performing on vectors of floating point scalars.  However, this goal does not
rule out options to express or control the desired precision or reproducibility
of floating point vector computations.


Motivation
----------

Vector computations consist of a sequence of operations on vectors.  A vector
comprises a (usually) fixed sequence of scalar values, where the scalar
values correspond to the number of hardware-defined vector lanes.   A binary operation applied
to two vectors with the same number of lanes would, for each lane, apply the
equivalent scalar operation on the corresponding two scalar values from each
vector.  This is commonly referred to as
[Single Instruction Multiple Data][SIMD] (SIMD).

[SIMD]:https://en.wikipedia.org/wiki/SIMD

Vector operations express a degree of parallelism that enables more work to be
performed in a single CPU cycle and thus can result in significant performance
gains.  For example, given two vectors each covering a sequence of eight
integers (eight lanes), then the two vectors can be added together using a
single hardware instruction.  The vector addition hardware instruction operates
on sixteen integers, performing eight integer additions, in the time it would
ordinarily take to operate on two integers, performing one integer addition.

HotSpot supports [auto-vectorization] where scalar operations are transformed into
superword operations, which are then mapped to vector hardware instructions.
The set of transformable scalar operations are limited and fragile to changes in
the code shape.  Furthermore, only a subset of available vector hardware
instructions might be utilized limiting the performance of generated code.

[auto-vectorization]:http://cr.openjdk.java.net/~vlivanov/talks/2017_Vectorization_in_HotSpot_JVM.pdf

A developer wishing to write scalar operations that are reliably transformed
into superword operations needs to understand HotSpot's auto-vectorization
support and its limitations to achieve reliable and sustainable performance.

In some cases it may not be possible for the developer to write scalar
operations that are transformable.  For example, HotSpot does not transform the
simple scalar operations for calculating the hash code of an array (see the
`Arrays::hashCode` method implementations in the JDK source code), nor can it
auto-vectorize code to lexicographically compare two arrays (which is why an
intrinsic was added to perform  lexicographical comparison, see
[8033148][JDK-8033148]).

[JDK-8033148]:https://bugs.openjdk.java.net/browse/JDK-8033148

The Vector API aims to address these issues by providing a mechanism to write
complex vector algorithms in Java, using pre-existing support in HotSpot
for vectorization, but with a user model which makes vectorization far more
predictable and robust.  Hand-coded vector loops can express high-performance
algorithms (such as vectorized `hashCode` or specialized array comparison)
which an auto-vectorizer may never optimize.
There are numerous domains where this explicitly vectorizing
API may be applicable such as machine learning, linear algebra, cryptography,
finance, and usages within the JDK itself.


Description
-----------

A vector will be represented by the abstract class `Vector<E>`.
The type variable `E` corresponds to the boxed type of scalar primitive integral
or floating point element types covered by the vector.  A vector also has a _shape_
which defines the size, in bits, of the vector. The shape of the vector will govern
how an instance of `Vector<E>` is mapped to a vector hardware register when vector
computations are compiled by the HotSpot C2 compiler (see later for a mapping from
instances to x64 vector registers).  The length of a vector (number of lanes or elements)
will be the vector size divided by the element size.

The set of element types (`E`) supported will be `Byte`, `Short`, `Integer`, `Long`,
`Float` and `Double` corresponding to the scalar primitive types `byte`,
`short`, `int`, `long`, `float` and `double`, respectively.

The set of shapes supported will correspond to vector sizes of 64, 128, 256, and 512 bits.
A shape corresponding to a size of 512 bits can pack `byte`s into 64 lanes or pack
`int`s into 16 lanes, and a vector of such a shape can operate on 64 `byte`s at
a time, or 16 `int`s at a time.

> _Note:_ We believe that these simple shapes are generic enough to be
  useful on all platforms supporting the Vector API.  However, as we
  experiment during the incubation of this JEP with future platforms, we may further
  modify the design of the shape parameter.  Such work is not in
  the early scope of this JEP, but these possibilities partly inform the
  present role of shapes in the Vector API.  See the Future Work section, below.

The combination of element type and shape determines the vector's species, represented by `VectorSpecies<E>`

An instance of `Vector<E>` is immutable and is a value-based type that
retains, by default, object identity invariants (see later for relaxation of
these invariants).

Operations on vectors can be classified as lane-wise and cross-lane.  Lane-wise operations
can be further classified as unary, binary, ternary, and comparison.  Cross-lane operations
can be classified as permutation, conversion, and reduction.  To reduce the surface of the API,
we will define collective methods for each class of operation which then take an operator as input.
The supported operators are instances of `Operator` class and are defined as static final
fields in the `VectorOperators` class.  Some common operations (e.g., add, mul), called full-service
operations, will have dedicated methods which can be used in place of the generic methods.

Certain operations on vectors, such lane-wise cast and reinterpret, can be said to be inherently _shape-changing_.
Having shape-changing operations in a vector computation could have unintended effects
on portability and performance.  For this reason, wherever applicable, the API will define an additional
shape-invariant flavor of such an operation.  Users are encouraged to write shape-invariant code using
the shape-invariant flavor of operations.  Additionally, shape-changing operations will be clearly called
out in the Javadoc.

`Vector<E>` declares a set of methods for common vector operations supported
by all element types.  To support operations specific to an element type there are six
abstract sub-classes of `Vector<E>`, one for each supported element type:
`ByteVector`, `ShortVector`, `IntVector`, `LongVector`,
`FloatVector`, and `DoubleVector`.  These sub-classes define additional operations
which are bound to the element type since the method signature refers to the element type
(or the equivalent array type), such as reduction operations (e.g., sum all
elements to a scalar value) or storing the vector elements to an array.
They also define additional full-service operations that are specific
to the integral sub-types such as bitwise operations (e.g., logical or),
and operations specific to the floating point types, such as mathematical
operations (e.g., transcendental functions such as pow()).

These classes are further extended by concrete sub-classes defined for different shapes (size) of Vectors.

The concrete sub-classes are non-public since there is no need to provide operations specific to the type
and shape.  This reduces the API surface to a sum of concerns rather than a product. As a result, instances
of concrete `Vector` classes cannot be constructed directly.  Instead,
instances are obtained via factories methods defined in the base `Vector<E>` class and its type-specific sub-classes.
These methods take as input the species of the desired vector instance. The factory methods
provide different ways to obtain vector instances, such as the vector
instance whose elements are initiated to default values (the zero vector), or
a vector from an array, in addition to providing the canonical support for
converting between vectors of different types or shapes (e.g., casting).

To support control flow, relevant vector operations will optionally accept masks
represented by the public abstract class `VectorMask<E>`.  Each element in a mask, a boolean
value or bit, corresponds to a vector lane.  When a mask is an input to an operation
it governs whether the operation is applied to each lane; the operation is applied if the
mask bit for the lane is set (is true). Alternative behavior occurs if the mask bit
is not set (is false).  Similar to vectors, instances of `VectorMask<E>` are instances of (private) concrete sub-class defined for
each element type and length combination. The instance of `VectorMask<E>` used in an operation should have the same
type and length as the instance(s) of `Vector<E>` involved in the operation. Comparison operations
produce masks, which can then be input to other operations to selectively disable the
operation on certain lanes and thereby emulate flow control.  Another way for creating masks
is using static factory methods in `VectorMask<E>`.

We anticipate that masks will likely play an important role in the
development of vector computations that are generic to shape.  (This expectation is based
on the central importance of predicate registers, the equivalent of masks, in
the ARM Scalable Vector Extensions as well as in Intel's AVX-512.)

### Example

Here is a simple scalar computation over elements of arrays:

    void scalarComputation(float[] a, float[] b, float[] c) {
       for (int i = 0; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
       }
    }

(We assume that the array arguments will be of the same size.)

An explicit way to implement the equivalent vector computation using the Vector
API is as follows:

    // Example 1

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    void vectorComputation(float[] a, float[] b, float[] c) {

        for (int i = 0; i < a.length; i += SPECIES.length()) {
            var m = SPECIES.indexInRange(i, a.length);
			// FloatVector va, vb, vc;
            var va = FloatVector.fromArray(SPECIES, a, i, m);
            var vb = FloatVector.fromArray(SPECIES, b, i, m);
            var vc = va.mul(va).
                        add(vb.mul(vb)).
                        neg();
            vc.intoArray(c, i, m);
        }
    }

In this example, a species for 256-bit wide vectors of floats is obtained from `FloatVector`.
The species is stored in a `static final` field so the runtime compiler will treat the field's
value as a constant and therefore be able to better optimize the vector computation.

The vector computation features a main loop kernel iterating over the
arrays in strides of vector length (i.e., the species length).  The static method `fromArray()` loads `float` vectors of the given species from arrays `a` and `b` at the corresponding index. Then
the operations are performed, fluently, and finally the result is stored into
array `c`.

We use masks, generated by `indexInRange()`, to prevent reading/writing past the array length.
The first `floor(a.length / SPECIES.length())` iterations will have a mask
with all lanes set. Only the final iteration, if `a.length` is not a multiple of
`SPECIES.length()`, will have a mask with first `a.length % SPECIES.length()` lanes set.

Since a mask is used in all iterations, the above implementation may not achieve optimal
performance for large array lengths. The same computation can be implemented without masks as follows:

    // Example 2

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    void vectorComputation(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            // FloatVector va, vb, vc;
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va).
                        add(vb.mul(vb)).
                        neg();
            vc.intoArray(c, i);
        }

        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

The *tail* elements, the length of which is smaller than the species length, are processed
using the scalar computation after the vector computation. Another way to process the tail
elements is to use a single masked vector computation.

When operating on large arrays, the implementation above achieves optimal performance.

For this second example, the HotSpot compiler should generate machine code
similar to the following on an Intel x64 processor supporting AVX:

      0.43%  / │  0x0000000113d43890: vmovdqu 0x10(%r8,%rbx,4),%ymm0
      7.38%  │ │  0x0000000113d43897: vmovdqu 0x10(%r10,%rbx,4),%ymm1
      8.70%  │ │  0x0000000113d4389e: vmulps %ymm0,%ymm0,%ymm0
      5.60%  │ │  0x0000000113d438a2: vmulps %ymm1,%ymm1,%ymm1
     13.16%  │ │  0x0000000113d438a6: vaddps %ymm0,%ymm1,%ymm0
     21.86%  │ │  0x0000000113d438aa: vxorps -0x7ad76b2(%rip),%ymm0,%ymm0
      7.66%  │ │  0x0000000113d438b2: vmovdqu %ymm0,0x10(%r9,%rbx,4)
     26.20%  │ │  0x0000000113d438b9: add    $0x8,%ebx
      6.44%  │ │  0x0000000113d438bc: cmp    %r11d,%ebx
             \ │  0x0000000113d438bf: jl     0x0000000113d43890

This is actual output from a JMH micro-benchmark for the example code under
test using a prototype of the Vector API and implementation (the
`vectorIntrinsics` branch of Project Panama's development repository).
This shows the hot areas of C2-generated machine code.  There is a clear
translation to vector registers and vector hardware instructions.  (Loop
unrolling was disabled to make the translation clearer, otherwise HotSpot should
be able to unroll using existing C2 loop optimization techniques.)  All Java
object allocations are elided.

It is an important goal to support more complex non-trivial vector computations
that translate clearly into generated machine code.

There are, however, a few issues with this particular vector computation:

1. The loop is hardcoded to a concrete vector shape, so the computation cannot
adapt dynamically to a maximal shape supported by the architecture, which may be
smaller or larger than 256 bits.  Therefore the code is less portable and may be
less performant.

2. Calculation of the loop upper bounds, although simple here, can be a common 
source of programming error.

3. A scalar loop is required at the end, duplicating code.

We will address the first two issues in this JEP.  A preferred species can be
obtained whose shape is optimal for the current architecture, the vector
computation can then be written with a generic shape, and a method on the
species can round down the array length, for example:

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    void vectorComputation(float[] a, float[] b, float[] c,
            VectorSpecies<Float> species) {
        int i = 0;
        int upperBound = species.loopBound(a.length);
        for (; i < upperBound; i += species.length()) {
            //FloatVector va, vb, vc;
            var va = FloatVector.fromArray(species, a, i);
            var vb = FloatVector.fromArray(species, b, i);
            var vc = va.mul(va).
                        add(vb.mul(vb)).
                        neg();
            vc.intoArray(c, i);
        }

        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }

    vectorComputation(a, b, c, SPECIES);

The third issue will not be fully addressed by this JEP and will be the subject
of future work.  As shown in the first example, you can use masks
to implement vector computation without tail processing.  We anticipate that such masked
loops will work well for a range of architectures, including x64 and ARM, but
will require additional runtime compiler support to generate maximally efficient
code. Such work on masked loops, though important, is beyond the scope of this
JEP.

### HotSpot C2 compiler details

The Vector API has two implementations in order to achieve this JEP's goals.
The first implements operations in Java, thus it is functional but not optimal.
The second makes intrinsic, for the HotSpot C2 compiler, those operations with
special treatment for Vector API types.  This allows for proper translation to
hardware registers and instructions for the case where architecture support and
implementation for translation exists.

To avoid an explosion of intrinsics added to C2, a set of intrinsics will be
defined that correspond to operation kinds such as binary, unary, comparison,
and so on, where constant arguments are passed describing operation specifics.
Approximately twenty new intrinsics will be needed to support the intrinsification of
all parts of the API.

`Vector` instances are value-based, i.e., morally values where identity-sensitive
operations should be avoided.  Further, although vector instances are 
abstractly composed of elements in lanes, those elements are not scalarized 
by C2.  The vector value is treated as a whole unit, like `int` or `long`, that 
maps to a hardware vector register of the appropriate size. 
Inline types will require some related enhancements to ensure that a vector value 
is treat as a whole unit. 

Until inline types are available, `Vector` instances will be treated specially by
C2 to overcome limitations in escape analysis and avoid boxing.  As such, identity
sensitive operations on vectors should be avoided.

[Valhalla]:http://openjdk.java.net/projects/valhalla/

### Future Work

The Vector API will benefit significantly from value types once ready (see
[Project Valhalla][Valhalla]).  Instances of a `Vector<E>` can be values, 
whose concrete classes are inline types.  This will make it easier to optimize 
and express vector computations.  Sub-types of `Vector<E>` for specific types, 
such as `IntVector`, may not be required with generic specialization over inline 
types and type-specific method declaration.

Therefore, a future version of the Vector API will make use of inline types and enhanced 
generics, as noted above.  As a result, we will incubate the API over multiple releases of 
the JDK and will adapt as inlines types become available.

We will enhance the API to load and store vectors using features of 
[JEP 370 Foreign-Memory Access API][JEP 370], when that API transitions from an 
incubating API.  Further, memory layouts to describe vector species may prove 
useful, for example to stride over a memory segment comprised of elements.

[JEP 370]:https://openjdk.java.net/jeps/370

We anticipate enhancing the implementation in the following ways:

- Include support for vectorized transcendental operations (such as logarithm, and the 
trigonometric functions),

- Improve the optimization of loops containing vectorized code,

- Optimize masked vector operations on supporting platforms, and

- Make adjustments for large vector sizes (e.g., as supported by ARM SVE).

Performance work will be ongoing as we make incremental improvements to the
implementation.

Alternatives
------------

HotSpot's auto-vectorization is an alternative approach, but it would require
significant work.  It would, moreover, likely still be fragile and limited compared
to using the Vector API, since auto-vectorization with complex control flow is
very hard to perform.

In general, and even after decades of research (especially for FORTRAN and C
array loops), it seems that auto-vectorization of scalar code is not a reliable
tactic for optimizing ad-hoc user-written loops unless the user pays unusually
careful attention to unwritten contracts about exactly which loops a compiler is
prepared to auto-vectorize.  It's too easy to write a loop that fails to
auto-vectorize, for a reason that the optimizer but no human reader can detect.
Years of work on auto-vectorization, even in HotSpot, have left
us with lots of optimization machinery that works only on special occasions.  We
want to enjoy the use of this machinery more often!


Testing
-------

We will develop combinatorial unit tests to ensure coverage for all 
operations, for all supported types and shapes, over various data sets.

We will also develop performance tests to ensure that performance goals are met and
vector computations map efficiently to vector hardware instructions.  This will
likely consist of JMH micro-benchmarks, but more realistic examples of useful
algorithms will also be required.
Such tests may initially reside in a project specific repository. Curation is likely
required before integration into the main repository given the proportion of tests
and how they are generated.

As a backup to performance tests, we may create white-box tests to
force the JIT to report to us that vector API source code did, in
fact, trigger vectorization.


Risks and Assumptions
---------------------

There is a risk that the API will be biased to the SIMD functionality supported
on x64 architectures but this is mitigated with support for AArch64.
This applies mainly to the explicitly fixed set of
supported shapes, which bias against coding algorithms in a shape-generic
fashion.  We consider the majority of other operations of the Vector API to bias
toward portable algorithms.  To mitigate that risk we will take other architectures
into account, specifically the ARM Scalar Vector Extension architecture
whose programming model adjusts dynamically to the singular fixed shape
supported by the hardware.  We welcome and encourage OpenJDK contributors working
on the ARM-specific areas of HotSpot to participate in this effort.

The Vector API uses box types (such as `Integer`) as proxies for primitive types
(such as `int`).  This decision is forced by the current limitations of Java
generics, which are hostile to primitive types.  When Project Vahalla
eventually introduces more capable generics the current decision will seem
awkward, and may need changing.  We assume that such changes will be possible
without excessive backward incompatibility.

Comments
[~mcimadamore] thanks. I fixed the typos, removed the able, and addressed most other comments, and will follow up on the remaining. > * "Clear and concise API" - for some reason, reading the paragraph makes me think that 'clear and concise' is > just part of what you want to say here - e.g. it seems like the API also wants to be high-level enough to be able to > be implemented on different architectures. I moved the goal of "Platform agnostic” to be directly below "Clear and concise API”. > * "enabling support for runtime implementations on more than one hardware vectorization that supports vector > hardware instructions" - finding hard to parse this Changed to: “ The API shall be architecture agnostic, enabling support for runtime implementations on multiple CPU architectures that support vector hardware instructions. “ > * "Continuing with example 2 presented above, the HotSpot compiler should generate machine code similar to the > following:" - maybe mention that the assembly refers to Intel x64? Appended: “… on an Intel x64 platform supporting AVX” > * "HotSpot C2 compiler implementation details" - not sure if this section belongs to the JEP - seems very > implementation specific; although I can understand if people would point out that in fact this is the very essence of > the JEP I think it's worth considering if we can move some of this to an appendix. > * I would make some reference to JEP 370; it seems like memory access API could be useful for loading contents > of a vector (e.g. from off-heap) and the layout API could be useful to e.g. define the layout of vector species Will do.
26-03-2020

Looks good - some comments: * "Clear and concise API" - for some reason, reading the paragraph makes me think that 'clear and concise' is just part of what you want to say here - e.g. it seems like the API also wants to be high-level enough to be able to be implemented on different architectures. * "enabling support for runtime implementations on more than one hardware vectorization that supports vector hardware instructions" - finding hard to parse this * "It is not a goal to upgrade java.util.Vector" :-) * "which why an intrinsic was added to perform lexicographical comparison" -> "which IS why an intrinsic was added to perform lexicographical comparison" * "The following table presents the concrete vector classes and their mapping to x64 registers:" - I think the main goal of this table is to provide a list of all the concrete classes - but is the register mapping really relevant? Also, since these classes will not even be exposed as part of the API, is listing all these classes really useful in the context of this JEP? * "Continuing with example 2 presented above, the HotSpot compiler should generate machine code similar to the following:" - maybe mention that the assembly refers to Intel x64? * "HotSpot C2 compiler implementation details" - not sure if this section belongs to the JEP - seems very implementation specific; although I can understand if people would point out that in fact this is the very essence of the JEP * "which loops a compiler is prepared to auto-vectorized" -> "which loops a compiler is prepared to auto-vectorize" * I would make some reference to JEP 370; it seems like memory access API could be useful for loading contents of a vector (e.g. from off-heap) and the layout API could be useful to e.g. define the layout of vector species
23-03-2020

[~yzhang] [~sviswanathan] i have updated the JEP description to include AArch64 NEON on the same footing as x64.
20-03-2020

Yes, I agree. Let us add AArch64 NEON to the JEP.
20-03-2020

[~yzhang] i think that is reasonable, if we get agreement please add yourself as a reviewer once updated. [~sviswanathan] WDYT?
20-03-2020

> Goals > Reliable runtime compilation and performance on x64 architectures > Non-Goals It is not a goal for HotSpot to support vector hardware > instructions on CPU architectures other than x64 The AArch64 NEON support has completed and the code is merged in panama/vectorIntrinsics and panama/vector-unstable with tests passed. Do you think AArch64 NEON is a goal of this JEP?
20-03-2020

Please find at least one endorser (a Group or Area lead), and preferably at least a couple more reviewers, before proposing to target this JEP.
19-03-2020

Can someone please provide me pointers to generating specdiff. I found an AdoptOpenJDK script, is that the recommended tool?
16-10-2019

FTR: Here's a recent (7-2019) spin of the javadoc for the API: http://cr.openjdk.java.net/~jrose/vectors/vector-unstable-doc (I will update or delete this comment as the reference changes.)
16-07-2019

Moved the updated description from sub-task here.
03-05-2019

I created a sub-task to work on updating the JEP to reflect new api. FTR : the latest javadoc is at http://cr.openjdk.java.net/~kkharbas/vector-api/javadoc/VectorApi-JavaDoc.05/jdk.incubator.vector/jdk/incubator/vector/package-summary.html
26-04-2019

Hi Mark, I have incorporated both pieces of feedback. I moved the content of the Dependencies section to as sub-section in Description and I titled it "Future Work". I did keep the Dependencies section as per JEP format but I explicitly noted there are no dependencies. I also moved the second and third paragraph out of summary and only left the first paragraph, which does convey the intent of the JEP. The second paragraph discussed meaning of Vectors which I moved to be first paragraph of Motivation section. And the third paragraph noted that this could be an incubating module over several releases which I moved to the Future Work section. Thank you for these initial comments. I remarked it as submitted since I believe I have addressed your comments.
08-06-2018

Initial comments: 1. Per the JEP template (http://openjdk.java.net/jeps/2), please condense the Summary section to at most one or two sentences. Much of the current text would probably be a better fit in the Description section. 2. The speculation about value types in the Dependences section is interesting and informative, but it doesn’t really belong there since it’s not something upon which this JEP itself depends. Consider moving it to a “Future work” subsection of the Description section.
08-06-2018