Bug ID: JDK-8370409 Incorrect computation in Float16 reduction loop

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 25

Priority: P3
Status: Open
Resolution: Unresolved
OS: generic
CPU: x86_64

Submitted: 2025-10-22
Updated: 2025-10-27

JDK 26: Unresolved

The following test throws an AssertionError with JDK 25, whereas it passed with JDK 24.

public class reduction_loop {

    public static short [] arr = { 15318, 15320, -1024, 15324, 15325, 15327, 15328, 15329 };

    public static int ITER = 10000;

    public static long GOLDEN = ADDReduceLong();

    static long ADDReduceLong() {
        short res = 0;
        for (int i = 0; i < 8; i++) {
            res = Float.floatToFloat16(Float.float16ToFloat(res) + Float.float16ToFloat(arr[i]));
        }
        return (long)res;
    }

    public static void main(String [] args) {
        long res =  0;
        for (int i = 0; i < ITER; i++) {
            res += ADDReduceLong();
        }

        if ((GOLDEN * ITER)  !=  res) {
            throw new AssertionError("Incorrect result, " + GOLDEN + " != " + res);
        }
        System.out.println("PASS");
    }
}


EMR>which java
/usr/lib/jvm/java-24-openjdk-amd64//bin/java
EMR>java -Xbatch -XX:-TieredCompilation --add-modules=jdk.incubator.vector -cp . reduction_loop
WARNING: Using incubator modules: jdk.incubator.vector
PASS
EMR>
EMR>export JAVA_HOME=/home/jatin_bhateja/softwares/jdk-25/
EMR>export PATH=$JAVA_HOME/bin:$PATH
EMR>which java
/home/jatin_bhateja/softwares/jdk-25//bin/java
EMR>
EMR>
EMR>java -Xbatch -XX:-TieredCompilation --add-modules=jdk.incubator.vector -cp . reduction_loop
WARNING: Using incubator modules: jdk.incubator.vector
Exception in thread "main" java.lang.AssertionError: Incorrect result, -1024 != 544587776
        at reduction_loop.main(reduction_loop.java:25)
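
The failing total above is not a multiple of a single per-call value, which suggests that some calls returned -1024 and the rest returned 64512 (the same 16 bits, zero-extended). The following is a minimal sketch of that arithmetic, not part of the original report; the class and method names are hypothetical, and the split between interpreted and compiled calls is an inference from the numbers, not something the log states.

```java
public class MixedResultSplit {
    /**
     * Solves total = signExt * x + zeroExt * (iterations - x) for x.
     * Returns x, or -1 if no whole number of calls in [0, iterations] fits.
     */
    static long callsReturningSignExtended(long total, long iterations,
                                           long signExt, long zeroExt) {
        long numerator = zeroExt * iterations - total;
        long denominator = zeroExt - signExt;
        if (denominator == 0 || numerator % denominator != 0) {
            return -1;
        }
        long x = numerator / denominator;
        return (x >= 0 && x <= iterations) ? x : -1;
    }

    public static void main(String[] args) {
        // Values taken from the reproducer: ITER = 10000, observed total 544587776.
        long x = callsReturningSignExtended(544587776L, 10000L, -1024L, 64512L);
        System.out.println("calls returning -1024: " + x);           // 1534
        System.out.println("calls returning 64512: " + (10000 - x)); // 8466
    }
}
```

Under this reading, the first 1534 calls would have produced the sign-extended value and the remaining 8466 the zero-extended one, which is consistent with the method being compiled partway through the loop.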

Thanks for the details [~jbhateja]!
27-10-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/27977 Date: 2025-10-24 14:36:21 +0000
24-10-2025
Hi [~thartmann], this looks like a side effect of Float16 scalar intrinsification.

The current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value, in anticipation of safe consumption by subsequent integral comparison operations. However, the safest way to compare two Float16 values is the Float16.compare/compareTo method, given that floating-point comparisons can also be unordered. Both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value:

jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFloat16((short)64512))
$3 ==> 0

In the scalar intrinsics for Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. The Float.floatToFloat16 intrinsic, in contrast, always returns a 'short' value. This is special in the sense that, even though the carrier type is 'short', it encodes an IEEE 754 half-precision value. Because the carrier is a short, if it gets exposed to integral operators then, as per the JVM specification, the short should be sign-extended. We are mixing two semantics here: integral comparisons look at raw bits, while those raw bits may be NaN bit patterns, and Fp16.NaN == Fp16.NaN should be false.

#include <stdio.h>
int main() {
    int value = 64512;
    long res = -1;
    asm volatile (
        "vmovd %1, %%xmm0 \n\t"
        "vfpclasssh $0x10, %%xmm0, %%k1 \n\t"
        "kmovq %%k1, %0 \n\t"
        : "=r"(res)
        : "r"(value)
        : "cc", "%xmm0", "%k1", "%rax"
    );
    return printf("res = %ld\n", res);
}
EMR>./a.out
res = 1
EMR>gcc -mavx512f test_fp16.c
EMR>cat test_fp16.c
#include <stdio.h>
int main() {
    int value = -1024;
    long res = -1;
    asm volatile (
        "vmovd %1, %%xmm0 \n\t"
        "vfpclasssh $0x10, %%xmm0, %%k1 \n\t"
        "kmovq %%k1, %0 \n\t"
        : "=r"(res)
        : "r"(value)
        : "cc", "%xmm0", "%k1", "%rax"
    );
    return printf("res = %ld\n", res);
}
EMR>./a.out
res = 1

We don't sign-extend the short value for the Float16.* intrinsics, to save the additional cycles that sign-extension costs, since every Float16 API returns a boxed value. We do sign-extend for the Float.floatToFloat16 intrinsic, since its return value is a 'short'. Given that our Float16 binary-operation inference is based on generic pattern matching and is agnostic to how the graph shape was created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, it is safe to sign-extend the result in all cases.
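
The point about 64512 and -1024 being the same Float16 encoding can also be checked without AVX512-FP16 hardware. The following is an illustrative sketch using only java.lang.Float (float16ToFloat is available since JDK 20); the class and helper names are hypothetical and it does not reproduce the compiler bug itself.

```java
public class Fp16CarrierBits {
    /** Widens the way the JVM does for a 'short' operand: sign extension. */
    static int signExtended(short bits) {
        return bits;
    }

    /** Widens by zero extension, keeping only the low 16 bits. */
    static int zeroExtended(short bits) {
        return bits & 0xFFFF;
    }

    /** True when two ints carry the same low 16 bits, hence the same Float16 encoding. */
    static boolean sameFloat16Bits(int a, int b) {
        return (short) a == (short) b;
    }

    public static void main(String[] args) {
        short negInf = (short) 0xFC00; // IEEE 754 binary16 encoding of -Infinity
        System.out.println(signExtended(negInf));           // -1024
        System.out.println(zeroExtended(negInf));           // 64512
        System.out.println(sameFloat16Bits(-1024, 64512));  // true
        System.out.println(Float.float16ToFloat(negInf));   // -Infinity

        // Raw-bit equality and floating-point equality disagree for NaN.
        short nanA = (short) 0x7E00;
        short nanB = (short) 0x7E00;
        System.out.println(nanA == nanB);                   // true
        System.out.println(Float.float16ToFloat(nanA) == Float.float16ToFloat(nanB)); // false
    }
}
```

The two widenings differ only in the upper bits, so an integral comparison on the widened values sees a difference that a Float16-aware comparison does not.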
24-10-2025
[~jbhateja] Do you know which change in JDK 25 introduced this?
ILW = Wrong result of compiled code, single test on architectures with FP16 support (AVX512-FP16), disable _float16ToFloat intrinsic or compilation of affected method = HLM = P3
23-10-2025