Bug ID: JDK-8278920 [vectorapi] IR tests fail with -XX:+UseKNLSetting and 512-bit vectors loaded from byte[]

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 18,19

Priority: P4
Status: Open
Resolution: Unresolved
CPU: x86

Submitted: 2021-12-16
Updated: 2022-03-28

JDK 19
19Unresolved

The addition of IR tests for vector casts in JDK-8259610 exposed a possible bug when tests are run under the flag -XX:+UseKNLSetting.

The IR tests for 512-bit vector sizes fail to compile under C2, for example loading an IntVector from byte[].

Analysis with -XX:+PrintIntrinsics shows:

 20891 1042    b        compiler.vectorapi.reshape.tests.TestVectorCast::testI512toB128 (15 bytes)
 ** Rejected vector op (LoadVector,byte,64) because architecture does not support it
 ** not supported: arity=0 op=load vlen=64*8 etype=int/8 ismask=no
                                     @ 31   jdk.internal.vm.vector.VectorSupport::load (38 bytes)   failed to inline (intrinsic)
 20894 1042    b        compiler.vectorapi.reshape.tests.TestVectorCast::testI512toB128 (15 bytes)   COMPILE SKIPPED: malformed control flow (retry at different tier)

It's as if we are passing the wrong BasicType to LibraryCallKit::arch_supports_vector.
This could make sense since AVX512BW supports 512-bit vectors for byte, and for -XX:+UseKNLSetting we are doing ~CPU_AVX512BW.

[~qamai] and [~jbhateja] can I assign this to one of you? I have my focus on a few other bugs for now.
24-03-2022
Regarding the failures, I don't fully understand why do we face such failures, the test filters in only the shapes that are supported by the platform to conduct IR matches. This means that we should not have TestVectorExpandShrink.testB128toB512 running on KNL in the first place. I believe that malforming the control flow in the events of intrinsification failures sounds like a bug. As Tobias pointed out, we should gracefully bailout from intrinsification instead.
21-03-2022
Emanuel, [~psandoz] KNL only supports AVX512F instructions, any operations over 512 bit sub-word type needs AVX512BW feature, maximum vector size computed by C2 for sub-word types on a non-AVX512BW platform is 256 bit. This causes intrinsification involving B512 species to fail here. Technically certain non-predicated operations like reinterpretations which do not modify the vector contents but meagerly presents a different view of values to its consumers can still be handled in absence of AVX512BW feature by doing following Ideal translations:- Vector<Byte> V1 = ByteVector.fromArray(ByteVector.SPECIES_128, arr, 0); Vector<Byte> V2 = V1.reinterpretShape(ByteVector.SPECIES_512, 0) .reinterpretShape(ByteVector.SPECIES_128, 0); =========> Vector<Byte> V1 = ByteVector.fromArray(ByteVector.SPECIES_128, arr, 0); Vector<Byte> V2 = V1.reinterpretShape(IntVector.SPECIES_512, 0) .reinterpretShape(IntVector.SPECIES_128, 0) .reinterpretAsBytes();
21-03-2022
I agree with Emanuel's assessment, these kind of C2 bailouts, i.e., malformed control flow, are rarely expected. If the platform does not support features that are required, we should gracefully handle that by detecting it during C2 compilation and bail out from intrinsification (or trapping) instead of creating a broken IR that leads to a bailout and marking the entire method as non C2 compilable.
09-03-2022
I am unsure whether it's a bug or a restriction induced by the KNL setting. [~jbhateja] WDYT?
08-03-2022
In discussion with [~chagedorn] and [~thartmann], we came to the conclusion that it would be great to have UseKNLSetting added to the whitelist, so that we can do IR verification with it. Further, it is problematic that we bailout here. Yes, it is gaceful handling the situation. However, the bailout makes C2 compilation impossible as soon as we run into these vector ops. This could have big performance impacts. Imagine that this vector op is in a small function, but then gets inlined into a larger function - it could unexpectedly prohibit C2 compilation for the whole larger function. One idea that [~thartmann] had was to use uncommon traps, so that we could at least compile, and only get out if really necessary. Of course it would be better to see if we can adapt the compilation such that it also compiles fine on KNL, and only using vector ops that are supported for that architecture. [~psandoz] and [~qamai] what do you think? How should we proceed? Are we sure that the bailout is not a bug?
08-03-2022
Experiment 2: In addition to the whitelisting, I now abort on bailout, let's see what we bailout for and if that is even valid in the first place. Use flag -XX:+AbortVMOnCompilationFailure This lets the VM crash on bailout, in test compiler/vectorapi/reshape/TestVectorReinterpret.java: # fatal error: Not compilable at tier 4: malformed control flow In the log I find these lines: Rejected vector op (StoreVector,byte,64) because architecture does not support it not supported: arity=1 op=store vlen=648 etype=long/8 ismask=no * Rejected vector op (ReplicateS,short,32) because architecture does not support it not supported: arity=0 op=broadcast vlen=32 etype=short ismask=1 bcast_mode=0 Rejected vector op (LoadVectorMasked,short,32) because architecture does not support it Rejected vector op (LoadVector,short,32) because architecture does not support it not supported: op=loadMasked vlen=32 etype=short using_byte_array=0 ** Rejected vector op (LoadVectorMasked,byte,16) because architecture does not support it
08-03-2022
Experiment 1: IR is not performed since UseKNLSetting is not whitelisted, and the tests simply pass. To do IR verification, I whitelisted UseKNLSetting in test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java Run compiler/vectorapi/reshape tests with -XX:+UseKNLSetting -XX:+PrintIntrinsics flags. Get 6 such failures from IR verification, in compiler/vectorapi/reshape/TestVectorReinterpret.java: 1) Method "public static void compiler.vectorapi.reshape.tests.TestVectorExpandShrink.testB128toB512(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorReinterpret.)+(\\\\s){2}===.)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: * Regex 1: (\\d+(\\s){2}(VectorReinterpret.)+(\\s){2}===.) - Failed comparison: [found] 0 = 1 [given] - No nodes matched!
08-03-2022
ILW = C2 bails out of compilation leading to test failures, only with UseKNLSetting on AVX512, use -XX:-UseKNLSetting = MLM = P4
17-12-2021
[~qamai] Thanks for the additional clarification. We may need to revisit this area in the near future, as I would like to replace such byte[]/ByteBuffer load/store methods with a methods accepting MemorySegment.
16-12-2021
This is actually expected behaviour, because to load an IntVector from a byte array, we currently first load a similar-shape byte vector, then reinterpret that to an IntVector. For the reason why we need to load the vector as a ByteVector in the first place, I'm not sure whether it is necessary or we can lift this restriction.
16-12-2021

Relates :	JDK-8259610 - VectorReshapeTests are not effective due to failing to intrinsify "VectorSupport.convert"
Relates :	JDK-8278623 - compiler/vectorapi/reshape/TestVectorCastAVX512.java after JDK-8259610