Bug ID: JDK-8350835 C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 20,21,24,25

Priority: P2
Status: Resolved
Resolution: Fixed

Submitted: 2025-02-27
Updated: 2025-03-17
Resolved: 2025-03-17

JDK 25
25 | master | Fixed

See the attached Test.java
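
The attached Test.java is not reproduced in this report. As a rough illustration only, a reproducer of this shape would exercise the pattern named in the title. The class name `ReproSketch`, the array size, the input pattern, and the iteration count are all assumptions, not taken from the attachment. The first call to `test` is expected to run in the interpreter and is used as the reference result.

```java
// Hypothetical reproducer sketch; NOT the attached Test.java.
final class ReproSketch {
    static final int SIZE = 1024; // assumed array length

    // Loop of interest: each byte element is implicitly widened to short
    // (sign-extended) before it is passed to Float.float16ToFloat.
    static void test(byte[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Float.float16ToFloat(in[i]);
        }
    }

    // Arbitrary deterministic input pattern (an assumption).
    static byte[] makeInput(int n) {
        byte[] in = new byte[n];
        for (int i = 0; i < n; i++) {
            in[i] = (byte) (i * 31 + 7);
        }
        return in;
    }

    public static void main(String[] args) {
        byte[] in = makeInput(SIZE);
        float[] gold = new float[SIZE];
        test(in, gold); // reference result, expected to be computed before C2 kicks in

        // Repeat often enough for C2 to compile test(), then compare.
        for (int iter = 0; iter < 10_000; iter++) {
            float[] out = new float[SIZE];
            test(in, out);
            for (int i = 0; i < SIZE; i++) {
                // Float.compare treats all NaNs as equal and keeps 0.0 != -0.0.
                if (Float.compare(out[i], gold[i]) != 0) {
                    throw new RuntimeException("wrong value: " + out[i] + " " + gold[i]);
                }
            }
        }
    }
}
```

To use this sketch with the command lines in this report, the `compileonly` pattern would need to name `ReproSketch::test` instead of `Test::test`.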

Manifestation:
- assert failure in debug builds
- miscompilation / wrong result in product builds

The assert and the vectorization of Float.float16ToFloat were added in JDK-8294588.

Discovered by Template Framework (work in progress): JDK-8344942

----------------------------------- Debug:

java -Xbatch -XX:+TraceSuperWord -XX:CompileCommand=compileonly,Test::test Test.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork6/open/src/hotspot/share/opto/vectornode.cpp:1500), pid=2137199, tid=2137213
#  Error: assert(bt == T_SHORT) failed
#
# JRE version: Java(TM) SE Runtime Environment (25.0) (fastdebug build 25-internal-LTS-2025-02-24-0956096.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 25-internal-LTS-2025-02-24-0956096.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x192c4a0]  VectorCastNode::opcode(int, BasicType, bool)+0x170
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/core.2137199)
#
# An error report file with more information is saved as:
# /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/hs_err_pid2137199.log
#
# Compiler replay data is saved as:
# /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/replay_pid2137199.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)


---------------------------------------- Product:

java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printassembly,Test::test Test.java

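We see that we vectorize, but with the wrong instructions. The loads index the input with scale 1 (byte elements), while vcvtph2ps converts 16-bit lanes, and the listing shows no widening from byte to short in between: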

  0x00007c0ae8bafc36:   vmovdqu 0x10(%r10,%r8,1),%xmm0
  0x00007c0ae8bafc3d:   vmovdqu 0x80(%r10,%r8,1),%xmm7
  0x00007c0ae8bafc47:   vmovdqu 0x20(%r10,%r8,1),%xmm1
  0x00007c0ae8bafc4e:   vmovdqu 0x30(%r10,%r8,1),%xmm2
  0x00007c0ae8bafc55:   vmovdqu 0x40(%r10,%r8,1),%xmm3
  0x00007c0ae8bafc5c:   vmovdqu 0x50(%r10,%r8,1),%xmm4
  0x00007c0ae8bafc63:   vmovdqu 0x60(%r10,%r8,1),%xmm5
  0x00007c0ae8bafc6a:   vmovdqu 0x70(%r10,%r8,1),%xmm6
  0x00007c0ae8bafc71:   vcvtph2ps %ymm0,%zmm0
  0x00007c0ae8bafc77:   vmovdqu32 %zmm0,0x10(%r9,%r8,4)
  0x00007c0ae8bafc82:   vcvtph2ps %ymm6,%zmm0
  0x00007c0ae8bafc88:   vcvtph2ps %ymm5,%zmm5
  0x00007c0ae8bafc8e:   vcvtph2ps %ymm4,%zmm4
  0x00007c0ae8bafc94:   vcvtph2ps %ymm3,%zmm3
  0x00007c0ae8bafc9a:   vcvtph2ps %ymm2,%zmm2
  0x00007c0ae8bafca0:   vcvtph2ps %ymm1,%zmm1
  0x00007c0ae8bafca6:   vmovdqu32 %zmm1,0x50(%r9,%r8,4)
  0x00007c0ae8bafcb1:   vmovdqu32 %zmm2,0x90(%r9,%r8,4)
  0x00007c0ae8bafcbc:   vmovdqu32 %zmm3,0xd0(%r9,%r8,4)
  0x00007c0ae8bafcc7:   vmovdqu32 %zmm4,0x110(%r9,%r8,4)    ;   {no_reloc}
  0x00007c0ae8bafcd2:   vmovdqu32 %zmm5,0x150(%r9,%r8,4)
  0x00007c0ae8bafcdd:   vmovdqu32 %zmm0,0x190(%r9,%r8,4)
  0x00007c0ae8bafce8:   vcvtph2ps %ymm7,%zmm0
  0x00007c0ae8bafcee:   vmovdqu32 %zmm0,0x1d0(%r9,%r8,4)


Exception in thread "main" java.lang.RuntimeException: wrong value: -0.02746582 4.7683716E-7
	at Test.main(Test.java:31)
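
The two values in the exception message are consistent with a 16-bit lane being formed from two adjacent bytes instead of from one sign-extended byte. The sketch below works through that arithmetic. It assumes the bytes 0x08 and 0xA7 were neighbours in the input and that the lanes are little-endian, as on x86. The actual input data is in the attached Test.java and is not shown here, and the message does not say which value is the expected one. The class and method names are hypothetical.

```java
// Hypothetical illustration of the lane mix-up; names and inputs are assumptions.
final class LanePairingSketch {
    // Scalar semantics: the byte is sign-extended to short, then converted.
    static float scalar(byte b) {
        return Float.float16ToFloat(b); // implicit byte -> short widening
    }

    // What a 16-bit lane holds when two adjacent bytes are loaded unchanged
    // (little-endian): lo becomes the low half, hi the high half.
    static float pairedLane(byte lo, byte hi) {
        short lane = (short) ((lo & 0xFF) | ((hi & 0xFF) << 8));
        return Float.float16ToFloat(lane);
    }

    public static void main(String[] args) {
        byte lo = 0x08;        // assumed input element
        byte hi = (byte) 0xA7; // assumed neighbouring element
        System.out.println(scalar(lo));         // float16 bits 0x0008 = 8 * 2^-24
        System.out.println(pairedLane(lo, hi)); // float16 bits 0xA708 = -(1 + 776/1024) * 2^-6
    }
}
```

Under these assumptions, `scalar` yields 8 * 2^-24 (about 4.768e-7) and `pairedLane` yields about -0.0274658, which matches the pair of values in the message.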

Changeset: 3239919a Branch: master Author: Sandhya Viswanathan <sviswanathan@openjdk.org> Date: 2025-03-17 17:50:34 +0000 URL: https://git.openjdk.org/jdk/commit/3239919a5a5910922ea4cb6109f94a24c5f6b4f2
17-03-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/23939 Date: 2025-03-07 01:56:49 +0000
07-03-2025
[~sviswanathan] I would take approach (1) with the "tiny" patch. This is the easiest to backport. Then, as a follow-up RFE, we can consider either approach (2), or even doing it in the auto-vectorizer if that generalizes to more platforms (not sure about that yet). [~sviswanathan][~thartmann] Does that sound reasonable?
03-03-2025
Thanks for quickly looking into this, Sandhya. [~epeter], what do you think?
03-03-2025
I looked into this. There are two simple ways to fix this in the short term:
1) Do not auto-vectorize when an input type other than short is presented: see attached file float16_fix_tiny.patch.
2) Let the auto-vectorizer proceed and handle the byte (int) to short conversion during code generation: see attached file float16_fix_small.patch.
Does either of these directions look good to you? There could be many more ways to fix this. Fixing it in the auto-vectorizer itself looked somewhat complicated for a short-term solution.
01-03-2025
[~sviswanathan], [~jbhateja], could you please have a look at this? Thanks!
27-02-2025
ILW = Incorrect result of C2 compiled code, easy to reproduce but edge case, -XX:DisableIntrinsic=_float16ToFloat = HML = P2
27-02-2025

Causes :	JDK-8294588 - Auto vectorize half precision floating point conversion APIs
Relates :	JDK-8352093 - Vectorize ConvHF2F when using Float.float16ToFloat with byte or int array element