JDK-8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 20,21,23,24
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86_64
  • Submitted: 2024-08-09
  • Updated: 2024-10-24
  • Resolved: 2024-10-21
JDK 24: 24 b21 (Fixed)
Related Reports
Duplicate :  
Relates :  
Description
It seems that vcvtps2ph is only implemented for vector lengths 4, 8, and 16 on x64. Not sure about aarch64 or other platforms.
https://www.felixcloutier.com/x86/vcvtps2ph

But in the example below, we see that we still vectorize a 2-element vector with Float.floatToFloat16 / VectorCastF2HF / vcvtps2ph. It looks like it just generates a 4-element vcvtps2ph, which then stores 8 bytes instead of the desired 4 bytes. The 4 lower bytes have the correct values, but the upper 4 bytes are all zero. The vcvtps2ph operation stores directly to memory, so it overwrites the 4 bytes that follow the intended destination with zeros, which produces the wrong results.
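The memory effect described above can be modelled in plain Java. This is only a scalar sketch of the suspected behaviour, not the code C2 emits: the class name `StoreWidthModel`, the helper names, and the sentinel value are hypothetical, and the constants are the float16 bit patterns of 1.0 and 2.0.

```java
import java.util.Arrays;

public class StoreWidthModel {
    // IEEE 754 binary16 bit patterns: 1.0 -> 0x3C00, 2.0 -> 0x4000.
    static final short ONE = (short) 0x3C00;
    static final short TWO = (short) 0x4000;
    // Hypothetical marker for memory that the store must not touch.
    static final short SENTINEL = (short) 0x7777;

    // Intended behaviour: a 2-lane store writes exactly 2 shorts (4 bytes).
    static short[] store2(short[] mem, int offset, short a, short b) {
        short[] out = mem.clone();
        out[offset] = a;
        out[offset + 1] = b;
        return out;
    }

    // Modelled faulty behaviour: a 4-lane store writes 8 bytes,
    // with the two upper lanes zeroed.
    static short[] store4(short[] mem, int offset, short a, short b) {
        short[] out = mem.clone();
        out[offset] = a;
        out[offset + 1] = b;
        out[offset + 2] = 0;
        out[offset + 3] = 0;
        return out;
    }

    // Counts positions at which the two arrays differ.
    static int countDiffs(short[] x, short[] y) {
        int diffs = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                diffs++;
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        short[] mem = new short[6];
        Arrays.fill(mem, SENTINEL);
        short[] expected = store2(mem, 0, ONE, TWO);
        short[] actual = store4(mem, 0, ONE, TWO);
        System.out.println("clobbered elements: " + countDiffs(expected, actual));
    }
}
```

In this model the two lower elements agree, while the two elements after them are zeroed instead of keeping their previous contents.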

Reproduces the bug:
java -XX:CompileCommand=compileonly,Test2b::test -Xbatch Test2b.java

But not in interpreter or without SuperWord:
java -Xint Test2b.java
java -XX:-UseSuperWord Test2b.java

More info in this run:
java -XX:CompileCommand=printcompilation,Test2b::test -XX:CompileCommand=compileonly,Test2b::test -Xbatch -XX:+TraceNewVectors Test2b.java

Result:
Exception in thread "main" java.lang.RuntimeException: errors: 480
	at Test2b.main(Test2b.java:30)
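For readers without access to the attached Test2b.java, a harness of this kind might look roughly like the sketch below. It is not the attached test: the class name `F2HFSketch`, the loop shape, the array size, and the sentinel value are all assumptions, and whether C2 actually forms a 2-element vector for this loop depends on the JDK build and CPU. It requires JDK 20 or later for Float.floatToFloat16.

```java
import java.util.Arrays;

public class F2HFSketch {
    static final int SIZE = 1024;
    // Hypothetical marker for output slots the kernel must leave untouched.
    static final short SENTINEL = (short) 0x7777;

    // Hypothetical kernel: converts two adjacent floats per iteration and
    // skips the next two output slots, so an over-wide store would be visible.
    static void test(float[] in, short[] out) {
        for (int i = 0; i < SIZE; i += 4) {
            out[i] = Float.floatToFloat16(in[i]);
            out[i + 1] = Float.floatToFloat16(in[i + 1]);
        }
    }

    static float[] input() {
        float[] in = new float[SIZE];
        for (int i = 0; i < SIZE; i++) {
            in[i] = i + 0.5f;
        }
        return in;
    }

    static short[] freshOutput() {
        short[] out = new short[SIZE];
        Arrays.fill(out, SENTINEL);
        return out;
    }

    // Counts positions at which the result differs from the reference.
    static int countErrors(short[] gold, short[] out) {
        int errors = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i] != out[i]) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        float[] in = input();
        // Reference result; the first call is expected to run in the interpreter.
        short[] gold = freshOutput();
        test(in, gold);
        int errors = 0;
        // Repeat often enough that the JIT is likely to compile test().
        for (int iter = 0; iter < 20_000; iter++) {
            short[] out = freshOutput();
            test(in, out);
            errors += countErrors(gold, out);
        }
        if (errors > 0) {
            throw new RuntimeException("errors: " + errors);
        }
        System.out.println("no errors");
    }
}
```

Such a sketch would be run with the same kind of flags as in the report, for example java -XX:CompileCommand=compileonly,F2HFSketch::test -Xbatch F2HFSketch.java, and compared against a run with -Xint.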


I can reproduce these wrong results in JDK 21 through JDK 24.

It looks like a regression caused by JDK-8289552, which was introduced in JDK 20.
https://github.com/openjdk/jdk/commit/07946aa49c97c93bd11675a9b0b90d07c83f2a94
https://git.openjdk.org/jdk/pull/9781


We have to assess whether this only applies to x64, or also to aarch64 or even RISC-V. They all implement VectorCastF2HF.
Comments
Changeset: 153ad911 Branch: master Author: Sandhya Viswanathan <sviswanathan@openjdk.org> Date: 2024-10-21 14:58:43 +0000 URL: https://git.openjdk.org/jdk/commit/153ad911f9fa3389ab92a1acab44526e3f4be4a2
21-10-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/21480 Date: 2024-10-11 23:27:35 +0000
11-10-2024

Great, thank you!
10-10-2024

[~thartmann] We'll have a PR ready by the end of this month. Thank you.
09-10-2024

[~svkamath] any update on this? Thanks!
09-10-2024

Thanks Smita. Assigning to you.
15-08-2024

I am able to reproduce this issue using the following command line: java -XX:CompileCommand=printcompilation,Test2b::test -XX:CompileCommand=compileonly,Test2b::test -Xbatch -XX:+TraceNewVectors Test2b.java. I will work on fixing this issue. Thanks for reporting it.
14-08-2024

Originally reported among other bugs with JDK-8337817
12-08-2024

Ok, I added the "x64" CPU tag - it is probably limited to the x64 implementation of VectorCastF2HF / vcvtps2ph.
12-08-2024

ILW = Same as JDK-8337817 = P3
12-08-2024

Hi, I just did a quick run on an aarch64 (Neon only) machine. I cannot reproduce this issue. The loop does not seem to be vectorized, and it generates an ldr, fcvt, strh sequence.
09-08-2024