JDK-8350835 : C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 20,21,24,25
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2025-02-27
  • Updated: 2025-03-17
  • Resolved: 2025-03-17
JDK 25: 25 master, Fixed
Description
See the attached Test.java

Manifestation:
- assert failure in debug builds
- miscompilation / wrong result in product builds

The assert and the vectorization of Float.float16ToFloat were added in JDK-8294588

Discovered by Template Framework (work in progress): JDK-8344942
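
For readers without access to the attachment, the following is a minimal sketch of the kind of loop the title describes. It is a hypothetical reconstruction, not the attached Test.java: the class name, array size, iteration count, and input data are all assumptions. It needs JDK 20 or newer, where Float.float16ToFloat exists. The key point is that the argument is a byte, which Java silently widens to the short the method expects.

```java
// Hypothetical reproducer sketch; this is NOT the Test.java attached to the issue.
// Requires JDK 20 or newer for Float.float16ToFloat.
class Float16ByteInputSketch {
    static final int SIZE = 1024;          // assumed array length
    static final int ITERATIONS = 10_000;  // assumed count, meant to be enough to trigger C2

    // The loop under test: in[i] is a byte, which Java widens (sign-extends)
    // to the short that Float.float16ToFloat takes.
    static void test(byte[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Float.float16ToFloat(in[i]);
        }
    }

    // Returns the first index where the arrays differ, or -1 if they agree.
    // Comparing floatToIntBits treats all NaNs as equal and keeps -0.0 and 0.0 distinct.
    static int firstMismatch(float[] expected, float[] actual) {
        for (int i = 0; i < expected.length; i++) {
            if (Float.floatToIntBits(expected[i]) != Float.floatToIntBits(actual[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] in = new byte[SIZE];
        for (int i = 0; i < SIZE; i++) {
            in[i] = (byte) i;
        }

        float[] gold = new float[SIZE];
        test(in, gold); // first call, expected to still run in the interpreter

        float[] out = new float[SIZE];
        for (int iter = 0; iter < ITERATIONS; iter++) {
            test(in, out);
            int bad = firstMismatch(gold, out);
            if (bad >= 0) {
                System.out.println("wrong value at index " + bad + ": " + out[bad] + " vs " + gold[bad]);
                return;
            }
        }
        System.out.println("no mismatch observed");
    }
}
```

Whether this sketch actually triggers the bug depends on the JDK build, the CPU features available, and the flags used; the commands quoted in the Debug and Product sections apply to the attached test, not to this sketch.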

----------------------------------- Debug:

java -Xbatch -XX:+TraceSuperWord -XX:CompileCommand=compileonly,Test::test Test.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork6/open/src/hotspot/share/opto/vectornode.cpp:1500), pid=2137199, tid=2137213
#  Error: assert(bt == T_SHORT) failed
#
# JRE version: Java(TM) SE Runtime Environment (25.0) (fastdebug build 25-internal-LTS-2025-02-24-0956096.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 25-internal-LTS-2025-02-24-0956096.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x192c4a0]  VectorCastNode::opcode(int, BasicType, bool)+0x170
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/core.2137199)
#
# An error report file with more information is saved as:
# /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/hs_err_pid2137199.log
#
# Compiler replay data is saved as:
# /oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/replay_pid2137199.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)


---------------------------------------- Product:

java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printassembly,Test::test Test.java

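The loop is vectorized, but with the wrong instructions: the loaded bytes are fed directly to vcvtph2ps, which interprets its input as packed 16-bit half-precision values: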

  0x00007c0ae8bafc36:   vmovdqu 0x10(%r10,%r8,1),%xmm0
  0x00007c0ae8bafc3d:   vmovdqu 0x80(%r10,%r8,1),%xmm7
  0x00007c0ae8bafc47:   vmovdqu 0x20(%r10,%r8,1),%xmm1
  0x00007c0ae8bafc4e:   vmovdqu 0x30(%r10,%r8,1),%xmm2
  0x00007c0ae8bafc55:   vmovdqu 0x40(%r10,%r8,1),%xmm3
  0x00007c0ae8bafc5c:   vmovdqu 0x50(%r10,%r8,1),%xmm4
  0x00007c0ae8bafc63:   vmovdqu 0x60(%r10,%r8,1),%xmm5
  0x00007c0ae8bafc6a:   vmovdqu 0x70(%r10,%r8,1),%xmm6
  0x00007c0ae8bafc71:   vcvtph2ps %ymm0,%zmm0
  0x00007c0ae8bafc77:   vmovdqu32 %zmm0,0x10(%r9,%r8,4)
  0x00007c0ae8bafc82:   vcvtph2ps %ymm6,%zmm0
  0x00007c0ae8bafc88:   vcvtph2ps %ymm5,%zmm5
  0x00007c0ae8bafc8e:   vcvtph2ps %ymm4,%zmm4
  0x00007c0ae8bafc94:   vcvtph2ps %ymm3,%zmm3
  0x00007c0ae8bafc9a:   vcvtph2ps %ymm2,%zmm2
  0x00007c0ae8bafca0:   vcvtph2ps %ymm1,%zmm1
  0x00007c0ae8bafca6:   vmovdqu32 %zmm1,0x50(%r9,%r8,4)
  0x00007c0ae8bafcb1:   vmovdqu32 %zmm2,0x90(%r9,%r8,4)
  0x00007c0ae8bafcbc:   vmovdqu32 %zmm3,0xd0(%r9,%r8,4)
  0x00007c0ae8bafcc7:   vmovdqu32 %zmm4,0x110(%r9,%r8,4)    ;   {no_reloc}
  0x00007c0ae8bafcd2:   vmovdqu32 %zmm5,0x150(%r9,%r8,4)
  0x00007c0ae8bafcdd:   vmovdqu32 %zmm0,0x190(%r9,%r8,4)
  0x00007c0ae8bafce8:   vcvtph2ps %ymm7,%zmm0
  0x00007c0ae8bafcee:   vmovdqu32 %zmm0,0x1d0(%r9,%r8,4)


Exception in thread "main" java.lang.RuntimeException: wrong value: -0.02746582 4.7683716E-7
	at Test.main(Test.java:31)
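
The wrong value can be explained with a small model of the listing. This is an illustrative sketch, not HotSpot code, and it rests on one reading of the instructions above: each vmovdqu loads 16 bytes into an xmm register (leaving the upper part of the ymm register zero), and vcvtph2ps then treats the 32-byte ymm register as 16 half-precision lanes. Under that assumption, each pair of adjacent input bytes is fused into one 16-bit lane, and the upper 8 lanes are zero. The class name, method names, and input bytes are hypothetical; the half-precision decoder is written by hand so that the arithmetic is visible.

```java
// Illustrative model only; not HotSpot code. Names and inputs are hypothetical.
class HalfLaneMismatchSketch {
    // Decode an IEEE 754 binary16 bit pattern by hand.
    static float halfToFloat(short h) {
        int sign = (h >> 15) & 0x1;
        int exp = (h >> 10) & 0x1F;
        int man = h & 0x3FF;
        float value;
        if (exp == 0) {
            value = man * 0x1p-24f; // zero or subnormal
        } else if (exp == 0x1F) {
            value = (man == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = (1.0f + man * 0x1p-10f) * Math.scalb(1.0f, exp - 15);
        }
        return sign == 0 ? value : -value;
    }

    // Java semantics: every byte is sign-extended to a short, then converted.
    static float[] expected(byte[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = halfToFloat((short) in[i]);
        }
        return out;
    }

    // Assumed model of the listing: per 16-byte chunk, lanes 0..7 are built from
    // little-endian byte pairs, and lanes 8..15 come from the zeroed upper half.
    static float[] modelOfListing(byte[] in) {
        float[] out = new float[in.length];
        for (int base = 0; base + 16 <= in.length; base += 16) {
            for (int lane = 0; lane < 8; lane++) {
                int lo = in[base + 2 * lane] & 0xFF;
                int hi = in[base + 2 * lane + 1] & 0xFF;
                out[base + lane] = halfToFloat((short) ((hi << 8) | lo));
            }
            // Lanes 8..15 stay 0.0f in this model.
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] in = new byte[16];
        in[0] = 8;            // illustrative input, not taken from the attached test
        in[1] = (byte) 0xA7;  // illustrative input, not taken from the attached test
        System.out.println(modelOfListing(in)[0] + " " + expected(in)[0]);
    }
}
```

With these illustrative bytes, the correct result for element 0 is 8 * 2^-24 = 2^-21, while the model fuses 0x08 and 0xA7 into the half-precision pattern 0xA708, which decodes to -0.0274658203125. These are the same two magnitudes that appear in the exception message, which is consistent with the model, although the actual input data of the attached test is not shown in this report.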
Comments
Changeset: 3239919a Branch: master Author: Sandhya Viswanathan <sviswanathan@openjdk.org> Date: 2025-03-17 17:50:34 +0000 URL: https://git.openjdk.org/jdk/commit/3239919a5a5910922ea4cb6109f94a24c5f6b4f2
17-03-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/23939 Date: 2025-03-07 01:56:49 +0000
07-03-2025

[~sviswanathan] I would take approach (1) with the "tiny" patch. This is the easiest to backport. Then, as a follow-up RFE, we can consider either approach (2) or even doing it in the auto-vectorizer, if that generalizes to more platforms (not sure about that yet). [~sviswanathan][~thartmann] Does that sound reasonable?
03-03-2025

Thanks for quickly looking into this, Sandhya. [~epeter], what do you think?
03-03-2025

I looked into this. There are two simple ways to fix this in the short term: 1) Do not auto-vectorize when an input type other than short is presented: see attached file float16_fix_tiny.patch. 2) Let the auto-vectorizer proceed and handle the byte (int) to short conversion during code gen: see attached file float16_fix_small.patch. Does either of these directions look good to you? There could be many more ways to fix this. Fixing it in the auto-vectorizer itself looked somewhat complicated for a short-term solution.
01-03-2025

[~sviswanathan], [~jbhateja], could you please have a look at this? Thanks!
27-02-2025

ILW = Incorrect result of C2 compiled code, easy to reproduce but edge case, -XX:DisableIntrinsic=_float16ToFloat = HML = P2
27-02-2025