It seems that vcvtps2ph is only implemented for vector lengths of 4, 8 and 16 floats on x64. I am not sure about aarch64 or other platforms.
https://www.felixcloutier.com/x86/vcvtps2ph
But in the example below, we see that we still vectorize 2-element vectors with Float.floatToFloat16 / VectorCastF2HF / vcvtps2ph. It looks like it just generates a 4-element vcvtps2ph, which stores 8 bytes instead of the desired 4 bytes. The lower 4 bytes have the correct values, but the upper 4 bytes are all zero. Because vcvtps2ph stores directly to memory, it overwrites 4 bytes it should not touch with zeros, and this produces the wrong results.
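To make the failure mode concrete, here is a small Java model of that store. It is a sketch under stated assumptions, not C2's generated code. `HalfStoreClobberSketch` and its methods are hypothetical names. The 4-lane memory store is emulated with a plain loop. The two unused lanes are assumed to hold 0.0f, which matches the zeroed upper 4 bytes described above. It needs JDK 20+ for Float.floatToFloat16.

```java
import java.util.Arrays;

public class HalfStoreClobberSketch {
    // Hypothetical model of a 4-lane vcvtps2ph with a memory destination:
    // it converts 4 float lanes and always writes 4 half-precision values
    // (8 bytes), no matter how many lanes the compiler meant to use.
    static void vcvtps2phXmmToMemory(float[] xmm, short[] mem, int index) {
        for (int lane = 0; lane < 4; lane++) {
            mem[index + lane] = Float.floatToFloat16(xmm[lane]);
        }
    }

    // Models a loop vectorized with 2-element vectors: only lanes 0 and 1
    // are loaded (lanes 2 and 3 stay 0.0f), but the store writes 4 slots.
    static void convertTwoElementVectors(float[] src, short[] dst, int count) {
        for (int i = 0; i < count; i += 2) {
            float[] xmm = new float[4]; // upper lanes default to 0.0f
            xmm[0] = src[i];
            xmm[1] = src[i + 1];
            vcvtps2phXmmToMemory(xmm, dst, i);
        }
    }

    public static void main(String[] args) {
        float[] src = {1.0f, 2.0f, 3.0f, 4.0f};
        short[] dst = new short[8];
        Arrays.fill(dst, (short) -1); // sentinel for slots that must stay untouched
        convertTwoElementVectors(src, dst, 4); // convert only src[0..3]
        // dst[4] and dst[5] should keep the sentinel, but the 8-byte store of
        // the last 2-element chunk overwrites them with 0 (Float16 +0.0).
        System.out.println(Arrays.toString(dst));
    }
}
```

Running it prints [15360, 16384, 16896, 17408, 0, 0, -1, -1]. In this ascending model the zeros written by one chunk are overwritten by the next chunk, so only the two slots past the end of the range stay clobbered. The real compiled loop may order its stores differently.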
Reproduces the bug:
java Test2b.java
It does not reproduce in the interpreter or without SuperWord:
java -Xint Test2b.java
java -XX:-UseSuperWord Test2b.java
More details are shown by this run:
java -XX:CompileCommand=printcompilation,Test2b::test -XX:CompileCommand=compileonly,Test2b::test -Xbatch -XX:+TraceNewVectors Test2b.java
Result:
Exception in thread "main" java.lang.RuntimeException: errors: 480
at Test2b.main(Test2b.java:30)
I can reproduce these wrong results in JDK 21 through JDK 24.
It looks like a regression from JDK-8289552, which was introduced in JDK 20.
https://github.com/openjdk/jdk/commit/07946aa49c97c93bd11675a9b0b90d07c83f2a94
https://git.openjdk.org/jdk/pull/9781
You have to assess whether this only affects x64, or also aarch64 or even riscv. All of them implement VectorCastF2HF.