Bug ID: JDK-8338126 C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2

Versions (Unresolved/Resolved/Fixed)

JDK 24: Fixed in 24 b21

It seems that vcvtps2ph is only implemented for vector lengths 4, 8, and 16 on x64. Not sure about aarch64 or other platforms.
https://www.felixcloutier.com/x86/vcvtps2ph

But in the example below, we still vectorize a 2-element vector with Float.floatToFloat16 / VectorCastF2HF / vcvtps2ph. It looks like C2 just generates a 4-element vcvtps2ph, which then stores 8 bytes instead of the desired 4 bytes. The lower 4 bytes have the correct values, but the upper 4 bytes are all zero. Because vcvtps2ph stores directly to memory, it overwrites the 4 bytes after the intended destination with zero, which produces the wrong results.
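The store-width problem described above can be modeled in plain Java. This is only an illustrative sketch, not the attached Test2b.java and not what C2 emits: the class name NarrowStoreModel, the helper vectorStore, and the sentinel value are hypothetical, and the half-precision bit patterns for 1.0f and 2.0f are hard-coded instead of calling Float.floatToFloat16. It assumes the extra lanes of the wide store land on the array elements directly after the intended two.

```java
import java.util.Arrays;

public class NarrowStoreModel {
    // Models a vector store that writes `storedLanes` 16-bit lanes starting at dst[off].
    // Only the first `validLanes` lanes carry converted values; the rest are zero,
    // matching the "upper 4 bytes are all zero" observation in the report.
    static void vectorStore(short[] dst, int off, short[] vals, int validLanes, int storedLanes) {
        for (int lane = 0; lane < storedLanes; lane++) {
            dst[off + lane] = lane < validLanes ? vals[lane] : 0;
        }
    }

    // Performs one 2-element conversion store into a pre-filled array and
    // counts how many elements differ from the expected result.
    static int countErrors(int storedLanes) {
        short sentinel = 0x7777;              // hypothetical marker for untouched elements
        short[] converted = {0x3C00, 0x4000}; // float16 bit patterns of 1.0f and 2.0f

        short[] expected = new short[8];
        Arrays.fill(expected, sentinel);
        expected[2] = converted[0];
        expected[3] = converted[1];

        short[] actual = new short[8];
        Arrays.fill(actual, sentinel);
        vectorStore(actual, 2, converted, 2, storedLanes);

        int errors = 0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] != actual[i]) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("2-lane store errors: " + countErrors(2));
        System.out.println("4-lane store errors: " + countErrors(4));
    }
}
```

In this model the 2-lane store leaves the neighbouring elements intact, while the 4-lane store zeroes the two elements after the intended destination, which is the kind of mismatch the test's error count would pick up.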

Reproduces the bug:
java -Xbatch Test2b.java

But not in the interpreter or without SuperWord:
java -Xint Test2b.java
java -XX:-UseSuperWord Test2b.java

More info in this run:
java -XX:CompileCommand=printcompilation,Test2b::test -XX:CompileCommand=compileonly,Test2b::test -Xbatch -XX:+TraceNewVectors Test2b.java

Result:
Exception in thread "main" java.lang.RuntimeException: errors: 480
	at Test2b.main(Test2b.java:30)


I can reproduce these wrong results in JDK 21 through JDK 24.

It looks like a regression of JDK-8289552, which was introduced in JDK 20.
https://github.com/openjdk/jdk/commit/07946aa49c97c93bd11675a9b0b90d07c83f2a94
https://git.openjdk.org/jdk/pull/9781


We have to assess whether this applies only to x64, or also to aarch64 or even RISC-V. They all implement VectorCastF2HF.

Changeset: 153ad911
Branch: master
Author: Sandhya Viswanathan <sviswanathan@openjdk.org>
Date: 2024-10-21 14:58:43 +0000
URL: https://git.openjdk.org/jdk/commit/153ad911f9fa3389ab92a1acab44526e3f4be4a2
21-10-2024
A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/21480
Date: 2024-10-11 23:27:35 +0000
11-10-2024
Great, thank you!
10-10-2024
[~thartmann] We'll have a PR ready by the end of this month. Thank you.
09-10-2024
[~svkamath] any update on this? Thanks!
09-10-2024
Thanks Smita. Assigning to you.
15-08-2024
I am able to reproduce this issue using the following command line:
java -XX:CompileCommand=printcompilation,Test2b::test -XX:CompileCommand=compileonly,Test2b::test -Xbatch -XX:+TraceNewVectors Test2b.java
Will work on fixing this issue. Thanks for reporting it.
14-08-2024
Originally reported among other bugs with JDK-8337817
12-08-2024
Ok, I added the "x64" CPU tag - it is probably limited to the x64 implementation of VectorCastF2HF / vcvtps2ph.
12-08-2024
ILW = Same as JDK-8337817 = P3
12-08-2024
Hi, I just did a quick run on an aarch64 (Neon only) machine. I cannot reproduce this issue. The loop does not seem to be vectorized, and it generates an ldr, fcvt, strh sequence.
09-08-2024

Duplicate :	JDK-8337817 - Incorrect result computation in vector tests with MaxVectorSize=8
Relates :	JDK-8289552 - Make intrinsic conversions between bit representations of half precision values and floats