JDK-8366845 : C2 SuperWord: wrong VectorCast after VectorReinterpret with swapped src/dst type
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86_64
  • Submitted: 2025-09-04
  • Updated: 2025-09-12
  • Resolved: 2025-09-05
JDK 26
26 b15 Fixed
Description
I have seen 3 manifestations of this bug:

1.
#  Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
#  assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required

2.
# Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
# Error: assert(bt == T_FLOAT) failed

3. Wrong result
This happens when the required CPU feature is available (e.g. AVX512), so no assert fires, but we still use the wrong vector cast.

It seems that JDK-8346236 introduced reinterpret nodes to SuperWord:

  } else if (VectorNode::is_reinterpret_opcode(opc)) {
    assert(first->req() == 2 && req() == 2, "only one input expected");
    const TypeVect* vt = TypeVect::make(bt, vlen);
    vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());

Sadly, the src and dst types are swapped. In JDK 25, with only JDK-8346236, this had no bad effect yet, since we only reinterpreted between HF and short, which are both based on short.

But with JDK-8329077 we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, we would get a cast with the wrong src type. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
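To make the mechanism concrete, here is a small toy model in Java. It is not HotSpot code: the class `ReinterpretSwapModel`, its nested `Reinterpret` class and the `castAfter` helper are hypothetical names invented for illustration. The only things taken from this report are the idea that the reinterpret constructor takes (src type, dst type) and that the following cast reads its source type from the output type of its input.

```java
// Toy model, NOT HotSpot code: all class and method names are hypothetical.
final class ReinterpretSwapModel {
    /** Stand-in for a reinterpret node built from (srcType, dstType). */
    static final class Reinterpret {
        final char srcType;
        final char dstType;

        Reinterpret(char srcType, char dstType) {
            this.srcType = srcType;
            this.dstType = dstType;
        }

        /** The node's output type is its destination type. */
        char outputType() {
            return dstType;
        }
    }

    /** A following cast picks its source type from the output type of its input. */
    static String castAfter(Reinterpret in) {
        return "VectorCast" + in.outputType() + "2X";
    }

    public static void main(String[] args) {
        char inputType = 'L';  // the loaded vector holds longs
        char resultType = 'D'; // MoveL2D should produce doubles

        Reinterpret swapped = new Reinterpret(resultType, inputType); // swapped argument order
        Reinterpret fixed = new Reinterpret(inputType, resultType);   // intended argument order

        System.out.println(castAfter(swapped)); // VectorCastL2X
        System.out.println(castAfter(fixed));   // VectorCastD2X
    }
}
```

With the swapped order the model selects a cast from long, which matches the VectorCastL2X seen in the traces of this report; with the intended order it selects a cast from double.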

The fuzzer test was added only recently with JDK-8324751. It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then a reinterpret (e.g. MoveI2F). But I was able to find examples that just work with Float.intBitsToFloat etc.

-------------------------- ORIGINAL REPORT ----------------------------

Test: compiler/loopopts/superword/TestAliasingFuzzer.java#vanilla

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168313/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/2e223971-59eb-4ff2-b885-5f2f25e941b2/runs/a784d7c0-ede9-4ff5-af8c-35fe679d2c2b/workspace/open/src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
#  assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required
#
# JRE version: Java(TM) SE Runtime Environment (26.0+14) (fastdebug build 26-ea+14-1424)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+14-1424, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Core dump will be written. Default location: core.84140
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Djava.library.path=/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-26+14-1424/macosx-x64-debug.test/hotspot/jtreg/native -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:MaxRAMPercentage=4.16667 -Dtest.boot.jdk=/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk/24/36/bundles/macos-x64/jdk-24_macos-x64_bin.tar.gz/jdk-24.jdk/Contents/Home -Djava.io.tmpdir=/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168299/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/ffeecce3-f54d-43db-867b-fee88f68d77e/runs/699d015b-1320-4cf5-8184-c78c7476666f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_tier1_compiler_3/tmp -Dir.framework.server.port=58166 -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:CompilerDirectivesFile=test-vm-compile-commands-pid-84137.log -XX:CompilerDirectivesLimit=101 -XX:-OmitStackTraceInFastThrow -DShouldDoIRVerification=true -XX:-BackgroundCompilation -XX:CompileCommand=quiet compiler.lib.ir_framework.test.TestVM compiler.loopopts.superword.templated.AliasingFuzzer

Host: "Macmini8,1" x86_64 3200 MHz, 12 cores, 32G, Darwin 22.6.0, macOS 13.6.3 (22G436)
Time: Wed Sep  3 20:06:14 2025 GMT elapsed time: 26.874742 seconds (0d 0h 0m 26s)

---------------  T H R E A D  ---------------

Current thread (0x00007fafa481b210):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=28419, stack(0x000070000ee3e000,0x000070000ef3e000) (1024K)]


Current CompileTask:
C2:26874 1248    b  4       compiler.loopopts.superword.templated.AliasingFuzzer::test_989 (133 bytes)

Stack: [0x000070000ee3e000,0x000070000ef3e000],  sp=0x000070000ef3a870,  free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x1423474]  VMError::report(outputStream*, bool)+0x1f14  (x86.ad:7640)
V  [libjvm.dylib+0x142742b]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x60b
V  [libjvm.dylib+0x72d188]  report_vm_error(char const*, int, char const*, char const*, ...)+0xd8
V  [libjvm.dylib+0xbd038]  vcastLtoX_evexNode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const+0x7d8
V  [libjvm.dylib+0x10945e3]  PhaseOutput::scratch_emit_size(Node const*)+0x2a3
V  [libjvm.dylib+0x1086d25]  PhaseOutput::shorten_branches(unsigned int*)+0x445
V  [libjvm.dylib+0x10863ad]  PhaseOutput::Output()+0x7ad
V  [libjvm.dylib+0x677e38]  Compile::Code_Gen()+0x958
V  [libjvm.dylib+0x674b4a]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1d1a
V  [libjvm.dylib+0x52e5d0]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x200
V  [libjvm.dylib+0x6974b2]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xc42
V  [libjvm.dylib+0x696447]  CompileBroker::compiler_thread_loop()+0x3f7
V  [libjvm.dylib+0xab4c98]  JavaThread::thread_main_inner()+0x1b8
V  [libjvm.dylib+0x136150c]  Thread::call_run()+0xbc
V  [libjvm.dylib+0x1075c57]  thread_native_entry(Thread*)+0x137
C  [libsystem_pthread.dylib+0x61d3]  _pthread_start+0x7d
C  [libsystem_pthread.dylib+0x1bd3]  thread_start+0xf
Lock stack of current Java thread (top to bottom):

Comments
Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 when running on a system with a Cavium CPU (CPU implementer: 0x43).
12-09-2025

Changeset: e6fa8aae
Branch: master
Author: Emanuel Peter <epeter@openjdk.org>
Date: 2025-09-05 08:46:56 +0000
URL: https://git.openjdk.org/jdk/commit/e6fa8aae6168ea5a8579cd0a38209ca71c32e704
05-09-2025

A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/27100
Date: 2025-09-04 14:42:46 +0000
05-09-2025

I'm not 100% sure, but I really could not reproduce anything with just JDK-8346236. I think that is because it essentially only used S->S reinterpret, and that is really harmless if the input and output types get mixed up. It only becomes an issue now with JDK-8329077, where we mix up D/L and I/F, and that really does matter. So I think it makes sense to fix it as a regression of JDK-8329077, even if the code from JDK-8346236 had the incorrect logic (though harmless at the time).
04-09-2025

Here is an even simpler example that works directly with arrays (no MemorySegment):

./java -XX:UseAVX=2 -XX:CompileCommand=compileonly,Test3::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test3.java

  for (int i = 0; i < 2_000; i++) {
    long v0 = a[i];
    double v1 = Double.longBitsToDouble(v0);
    float v2 = (float)v1;
    int v3 = Float.floatToRawIntBits(v2);
    b[i] = v3;
  }

The issue happens between the MoveL2D (Double.longBitsToDouble) and the ConvD2F.
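For readers who want to try this, the loop above can be wrapped into a standalone file roughly as follows. This is a hedged reconstruction, not the actual Test3.java (which is not included in this report): the class name `Test3Sketch`, the input values, the repetition count and the `expected` helper are assumptions. Only the loop body is taken from the comment above.

```java
// Hypothetical reconstruction of a Test3-style reproducer.
// Only the loop body of test1 comes from the report; everything else is assumed.
final class Test3Sketch {
    static void test1(long[] a, int[] b) {
        for (int i = 0; i < 2_000; i++) {
            long v0 = a[i];
            double v1 = Double.longBitsToDouble(v0); // MoveL2D
            float v2 = (float) v1;                   // ConvD2F
            int v3 = Float.floatToRawIntBits(v2);    // MoveF2I
            b[i] = v3;
        }
    }

    /** Scalar reference for a single element, computed outside the loop under test. */
    static int expected(long bits) {
        return Float.floatToRawIntBits((float) Double.longBitsToDouble(bits));
    }

    public static void main(String[] args) {
        long[] a = new long[2_000];
        int[] b = new int[2_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = Double.doubleToRawLongBits(i + 0.5);
        }
        // Repeat so that test1 gets compiled by C2; the count is an arbitrary choice.
        for (int rep = 0; rep < 10_000; rep++) {
            test1(a, b);
            for (int i = 0; i < b.length; i++) {
                if (b[i] != expected(a[i])) {
                    throw new RuntimeException("wrong value at " + i + ": " + b[i] + " vs " + expected(a[i]));
                }
            }
        }
    }
}
```

It would be launched with the same flags as in the command above, with the CompileCommand adjusted to `compileonly,Test3Sketch::test1`. On a build without the bug the program should finish silently.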
04-09-2025

It seems that JDK-8346236 introduced reinterpret nodes to SuperWord:

  } else if (VectorNode::is_reinterpret_opcode(opc)) {
    assert(first->req() == 2 && req() == 2, "only one input expected");
    const TypeVect* vt = TypeVect::make(bt, vlen);
    vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());

Sadly, it takes the bottom type of the in1 as the dst_bt. Maybe that was right for the Float16 (only works on short/float16) changes, but it's not right any more for JDK-8329077, which works on D/L and I/F.

In our case, we have:

(rr) p vt->dump()
vectory<D,4>
(rr) p in1->bottom_type()->dump()
vectory<J,4>

We pass:
- vt into src_vt: double
- in1 into dst_vt: long

So the types are exactly inverted! And so that means we get:

(rr) p vn->dump()
3153 VectorReinterpret === _ 3152 [[ ]] #vectory<J,4>
(rr) p vn->bottom_type()->dump()
vectory<J,4>

But now the bottom type, i.e. the output is LONG. Yikes.

This might be a very simple patch:

diff --git a/src/hotspot/share/opto/vtransform.cpp b/src/hotspot/share/opto/vtransform.cpp
index af4cb345e14..2f77c1c2e37 100644
--- a/src/hotspot/share/opto/vtransform.cpp
+++ b/src/hotspot/share/opto/vtransform.cpp
@@ -813,7 +813,7 @@ VTransformApplyResult VTransformElementWiseVectorNode::apply(VTransformApplyStat
   } else if (VectorNode::is_reinterpret_opcode(opc)) {
     assert(first->req() == 2 && req() == 2, "only one input expected");
     const TypeVect* vt = TypeVect::make(bt, vlen);
-    vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
+    vn = new VectorReinterpretNode(in1, in1->bottom_type()->is_vect(), vt);
   } else if (VectorNode::can_use_RShiftI_instead_of_URShiftI(first, bt)) {
     opc = Op_RShiftI;
     vn = VectorNode::make(opc, in1, in2, vlen, bt);
04-09-2025

Turns out the VectorCastL2X is really not good here. It leads to WRONG RESULTS with AVX512, and to an ASSERT with AVX1/2:

./java -XX:UseAVX=3 -XX:CompileCommand=compileonly,Test2::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test2.java
...
TraceNewVectors [AutoVectorization]: 2835 LoadVector === 2718 7 2701 [[ ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=12; mismatched #vectorz<J,8> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 2836 VectorReinterpret === _ 2835 [[ ]] #vectorz<J,8>
TraceNewVectors [AutoVectorization]: 2837 VectorCastL2X === _ 2836 [[ ]] #vectory<F,8>
TraceNewVectors [AutoVectorization]: 2838 VectorReinterpret === _ 2837 [[ ]] #vectory<F,8>
TraceNewVectors [AutoVectorization]: 2839 StoreVector === 2718 2726 2710 2838 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=14; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=14;

SuperWord::transform_loop: success
Exception in thread "main" java.lang.RuntimeException: wrong value: 8.0 4.620693E18
at Test2.main(Test2.java:46)

./java -XX:UseAVX=2 -XX:CompileCommand=compileonly,Test2::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test2.java
TraceNewVectors [AutoVectorization]: 3152 LoadVector === 3041 7 3023 [[ ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=12; mismatched #vectory<J,4> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 3153 VectorReinterpret === _ 3152 [[ ]] #vectory<J,4>
TraceNewVectors [AutoVectorization]: 3154 VectorCastL2X === _ 3153 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3155 VectorReinterpret === _ 3154 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3156 StoreVector === 3041 3042 3027 3155 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=14; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=14;

SuperWord::transform_loop: success
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/empeter/Documents/oracle/jdk-fork2/open/src/hotspot/cpu/x86/x86.ad:7640), pid=3998903, tid=3998917
# assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required

My current theory is that the output basic type of the first reinterpret is wrong:

VectorReinterpret === _ 3152 [[ ]] #vectory<J,4>

It says it outputs a long, but really it should output a double. Then, the cast probably gets confused, and assumes that the input is long (and not double as it should), and then creates a vector cast from long, and not a vector cast from double.
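The wrong value in the AVX512 run above is consistent with this theory. The following Java sketch (the class and method names are hypothetical, not from Test2.java) converts the raw bits of the double 8.0 to float in two ways: once as a double, which is what the loop intends, and once as a long, which is what a cast from long does.

```java
// Illustration only: names are hypothetical and not taken from Test2.java.
final class WrongCastValue {
    /** Intended semantics: reinterpret the bits as double, then convert double -> float. */
    static float castAsDouble(long bits) {
        return (float) Double.longBitsToDouble(bits);
    }

    /** What a cast from long does: convert the raw bit pattern long -> float. */
    static float castAsLong(long bits) {
        return (float) bits;
    }

    public static void main(String[] args) {
        long bits = Double.doubleToRawLongBits(8.0); // 0x4020000000000000
        System.out.println(castAsDouble(bits));      // 8.0
        System.out.println(castAsLong(bits));        // 4.620693E18, as in the exception message
    }
}
```

The bit pattern of 8.0 is 2^62 + 2^53 = 4620693217682128896 when read as a long, which is exactly representable as a float and matches the reported 4.620693E18.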
04-09-2025

Reduced it to a super simple reproducer:

./java -XX:UseAVX=2 -XX:CompileCommand=compileonly,Test1::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test1.java
TraceNewVectors [AutoVectorization]: 3209 LoadVector === 3040 3041 3022 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=12; mismatched #vectory<J,4> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 3210 VectorReinterpret === _ 3209 [[ ]] #vectory<J,4>
TraceNewVectors [AutoVectorization]: 3211 VectorCastL2X === _ 3210 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3212 VectorReinterpret === _ 3211 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3213 StoreVector === 3040 3041 3026 3212 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=12; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=12;

SuperWord::transform_loop: success
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/empeter/Documents/oracle/jdk-fork2/open/src/hotspot/cpu/x86/x86.ad:7640), pid=3997016, tid=3997030
# assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required
04-09-2025

I think it was JDK-8329077 that "revealed" the bug. Since then, we allow auto vectorization of MoveF2I and MoveL2D.

In the attached test, we see that we have these packs:

PackSet::print: 5 packs
Pack: 0
0: 3033 LoadL === 3057 3062 3034 [[ 3032 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe #long (does not depend only on test, unknown control) !orig=2864,921 !jvms: Unsafe::getLongUnaligned @ bci:5 (line 3597) ScopedMemoryAccess::getLongUnalignedInternal @ bci:15 (line 3861) ScopedMemoryAccess::getLongUnaligned @ bci:6 (line 3849) VarHandleSegmentAsDoubles::get @ bci:42 (line 60) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
1: 3028 LoadL === 3057 3029 3044 [[ 3027 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe #long (does not depend only on test, unknown control) !orig=921 !jvms: Unsafe::getLongUnaligned @ bci:5 (line 3597) ScopedMemoryAccess::getLongUnalignedInternal @ bci:15 (line 3861) ScopedMemoryAccess::getLongUnaligned @ bci:6 (line 3849) VarHandleSegmentAsDoubles::get @ bci:42 (line 60) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
2: 2864 LoadL === 3057 3024 2865 [[ 2863 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe #long (does not depend only on test, unknown control) !orig=921 !jvms: Unsafe::getLongUnaligned @ bci:5 (line 3597) ScopedMemoryAccess::getLongUnalignedInternal @ bci:15 (line 3861) ScopedMemoryAccess::getLongUnaligned @ bci:6 (line 3849) VarHandleSegmentAsDoubles::get @ bci:42 (line 60) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
3: 921 LoadL === 3057 2860 920 [[ 953 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe #long (does not depend only on test, unknown control) !jvms: Unsafe::getLongUnaligned @ bci:5 (line 3597) ScopedMemoryAccess::getLongUnalignedInternal @ bci:15 (line 3861) ScopedMemoryAccess::getLongUnaligned @ bci:6 (line 3849) VarHandleSegmentAsDoubles::get @ bci:42 (line 60) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
Pack: 1
0: 3032 MoveL2D === _ 3033 [[ 3031 ]] !orig=2863,953 !jvms: VarHandleSegmentAsDoubles::get @ bci:49 (line 64) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
1: 3027 MoveL2D === _ 3028 [[ 3026 ]] !orig=953 !jvms: VarHandleSegmentAsDoubles::get @ bci:49 (line 64) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
2: 2863 MoveL2D === _ 2864 [[ 2862 ]] !orig=953 !jvms: VarHandleSegmentAsDoubles::get @ bci:49 (line 64) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
3: 953 MoveL2D === _ 921 [[ 955 ]] !jvms: VarHandleSegmentAsDoubles::get @ bci:49 (line 64) VarHandleSegmentAsDoubles::get @ bci:10 (line 53) VarHandleGuards::guard_LJ_D @ bci:49 (line 292) AbstractMemorySegmentImpl::get @ bci:8 (line 778) Test_8366845::test_989 @ bci:56 (line 246)
Pack: 2
0: 3031 ConvD2F === _ 3032 [[ 3030 ]] #float !orig=2862,955 !jvms: Test_8366845::test_989 @ bci:61 (line 246)
1: 3026 ConvD2F === _ 3027 [[ 3025 ]] #float !orig=955 !jvms: Test_8366845::test_989 @ bci:61 (line 246)
2: 2862 ConvD2F === _ 2863 [[ 2861 ]] #float !orig=955 !jvms: Test_8366845::test_989 @ bci:61 (line 246)
3: 955 ConvD2F === _ 953 [[ 1618 ]] #float !jvms: Test_8366845::test_989 @ bci:61 (line 246)
Pack: 3
0: 3030 MoveF2I === _ 3031 [[ 3029 ]] !orig=2861,1618 !jvms: VarHandleSegmentAsFloats::set @ bci:39 (line 79) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
1: 3025 MoveF2I === _ 3026 [[ 3024 ]] !orig=1618 !jvms: VarHandleSegmentAsFloats::set @ bci:39 (line 79) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
2: 2861 MoveF2I === _ 2862 [[ 2860 ]] !orig=1618 !jvms: VarHandleSegmentAsFloats::set @ bci:39 (line 79) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
3: 1618 MoveF2I === _ 955 [[ 1757 ]] !jvms: VarHandleSegmentAsFloats::set @ bci:39 (line 79) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
Pack: 4
0: 3029 StoreI === 3057 3062 3038 3030 [[ 3024 3028 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; !orig=2860,1757,2887 !jvms: Unsafe::putIntUnaligned @ bci:10 (line 3740) ScopedMemoryAccess::putIntUnalignedInternal @ bci:17 (line 3192) ScopedMemoryAccess::putIntUnaligned @ bci:8 (line 3180) VarHandleSegmentAsFloats::set @ bci:47 (line 76) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
1: 3024 StoreI === 3057 3029 3048 3025 [[ 2860 2864 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; !orig=1757,2887 !jvms: Unsafe::putIntUnaligned @ bci:10 (line 3740) ScopedMemoryAccess::putIntUnalignedInternal @ bci:17 (line 3192) ScopedMemoryAccess::putIntUnaligned @ bci:8 (line 3180) VarHandleSegmentAsFloats::set @ bci:47 (line 76) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
2: 2860 StoreI === 3057 3024 2869 2861 [[ 1757 921 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; !orig=1757,2887 !jvms: Unsafe::putIntUnaligned @ bci:10 (line 3740) ScopedMemoryAccess::putIntUnalignedInternal @ bci:17 (line 3192) ScopedMemoryAccess::putIntUnaligned @ bci:8 (line 3180) VarHandleSegmentAsFloats::set @ bci:47 (line 76) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)
3: 1757 StoreI === 3057 2860 1756 1618 [[ 3062 1784 2723 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=15; unaligned mismatched unsafe Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; !orig=2887 !jvms: Unsafe::putIntUnaligned @ bci:10 (line 3740) ScopedMemoryAccess::putIntUnalignedInternal @ bci:17 (line 3192) ScopedMemoryAccess::putIntUnaligned @ bci:8 (line 3180) VarHandleSegmentAsFloats::set @ bci:47 (line 76) VarHandleSegmentAsFloats::set @ bci:12 (line 69) VarHandleGuards::guard_LJF_V @ bci:51 (line 641) AbstractMemorySegmentImpl::set @ bci:10 (line 760) Test_8366845::test_989 @ bci:109 (line 247)

and produce these vectors:

TraceNewVectors [AutoVectorization]: 3263 LoadVector === 3057 3062 3034 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; mismatched #vectory<J,4> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 3264 VectorReinterpret === _ 3263 [[ ]] #vectory<J,4>
TraceNewVectors [AutoVectorization]: 3265 VectorCastL2X === _ 3264 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3266 VectorReinterpret === _ 3265 [[ ]] #vectorx<F,4>
TraceNewVectors [AutoVectorization]: 3267 StoreVector === 3057 3062 3038 3266 [[ ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=15;

But what is a little strange: we are using VectorCastL2X to vectorize ConvD2F. Looks wrong to me.
04-09-2025

Ok, I have a reproducer that runs. I think it did not reproduce before JDK-8329077, but now with it it does. I'm not saying JDK-8329077 is the culprit, but it introduced auto-vectorization of MoveL2D and MoveF2I, and that is probably needed in this example.

time /home/empeter/Documents/oracle/jtreg/bin/jtreg -va -s -jdk:/home/empeter/Documents/oracle/jdk-fork2/build/linux-x64-debug/jdk -javaoptions:"-XX:UseAVX=2" -J-Djavatest.maxOutputSize=10000000 /home/empeter/Documents/oracle/jdk-fork2/open/test/hotspot/jtreg/compiler/loopopts/superword/Test_8366845.java

# Internal Error (/home/empeter/Documents/oracle/jdk-fork2/open/src/hotspot/cpu/x86/x86.ad:7640), pid=3878625, tid=3878640
# assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required

How I extracted the test:
- I saw that the test compiler.loopopts.superword.templated.AliasingFuzzer::test_989 crashed in compilation.
- Downloaded the workspace, searched for AliasingFuzzer.java
- Took the whole file, slapped a jtreg launcher part at the top, removed the framework.addFlags line, and launched it with JTREG.
- And of course I made sure to run it with -XX:UseAVX=2 because I have an avx512 machine
04-09-2025

We've only been seeing the failures since September 3. It could be purely due to JDK-8324751, but that was already integrated August 27. It could also be due to JDK-8329077 which was integrated September 2.
04-09-2025

It seems the issue is that those machines only have AVX2, but they are trying to generate something that requires avx512 features.

# Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
# assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: required

It comes from here: vcastLtoX_evexNode::emit

We have this predicate:

7594 instruct vcastLtoX_evex(vec dst, vec src) %{
7595   predicate(UseAVX > 2 ||
7596             (Matcher::vector_element_basic_type(n) == T_INT ||
7597              Matcher::vector_element_basic_type(n) == T_FLOAT ||
7598              Matcher::vector_element_basic_type(n) == T_DOUBLE));
7599   match(Set dst (VectorCastL2X src));
7600   format %{ "vector_cast_l2x $dst,$src\t!" %}
7601   ins_encode %{

So that seems it would allow AVX2 for some types. The generated example has a double -> float cast, so the predicate would let that through.

7601   ins_encode %{
7602     BasicType to_elem_bt = Matcher::vector_element_basic_type(this);
7603     int vlen = Matcher::vector_length_in_bytes(this, $src);
7604     int vlen_enc = vector_length_encoding(this, $src);
7605     switch (to_elem_bt) {

We switch on to_elem_bt, which is float, so we end up here:

7639       case T_FLOAT:
7640         assert(UseAVX > 2 && VM_Version::supports_avx512dq(), "required");
7641         __ evcvtqq2ps($dst$$XMMRegister, $src$$XMMRegister, vlen_enc);
7642         break;

But here we suddenly require more than AVX2. That can't be good, right?

Another thing that is strange here: we seem to be matching a VectorCastL2X. But the source type is double.
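The mismatch between the predicate and the assert quoted above can be checked with a small boolean model. This is a sketch in Java, not HotSpot code: the class `CastL2XPredicateModel`, the `Elem` enum and both method names are hypothetical, and only the two conditions are copied from the quoted x86.ad lines.

```java
// Toy model of the two quoted conditions; NOT HotSpot code, names are hypothetical.
final class CastL2XPredicateModel {
    enum Elem { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

    /** Mirrors the quoted predicate of vcastLtoX_evex (lines 7595-7598). */
    static boolean predicate(int useAVX, Elem to) {
        return useAVX > 2 || to == Elem.INT || to == Elem.FLOAT || to == Elem.DOUBLE;
    }

    /** Mirrors the quoted assert in the T_FLOAT case (line 7640). */
    static boolean floatCaseRequirement(int useAVX, boolean avx512dq) {
        return useAVX > 2 && avx512dq;
    }

    public static void main(String[] args) {
        int useAVX = 2;           // as with -XX:UseAVX=2
        boolean avx512dq = false; // assumption: no AVX512DQ when limited to AVX2

        System.out.println(predicate(useAVX, Elem.FLOAT));          // true: the rule matches
        System.out.println(floatCaseRequirement(useAVX, avx512dq)); // false: the assert fires
    }
}
```

So for a long -> float vector cast with UseAVX=2 the rule is selected, and the stronger condition is only checked later when the instruction is emitted.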
In the fuzzer generated test, we do this:

  public static Object test_989(MemorySegment container_0, int invar0_0, MemorySegment container_1, int invar0_1, int ivLo, int ivHi) {
    for (int i = ivHi-1; i >= ivLo; i-=1) {
      float v = (float)container_0.get(ValueLayout.JAVA_DOUBLE_UNALIGNED, ...index...);
      container_1.set(ValueLayout.JAVA_FLOAT_UNALIGNED, ...index..., v);
    }
    return new Object[] { container_0, container_1 };
  }

BTW: the container is a float array:

  private static MemorySegment original_container0_989 = MemorySegment.ofArray(new float[36756]);

Now we should also look at these:

  bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
  ...
    case Op_VectorCastL2X:
      if (is_integral_type(bt) && size_in_bits == 256 && UseAVX < 2) {
        return false;
      } else if (!is_integral_type(bt) && !VM_Version::supports_avx512dq()) {
        return false;
      }
      break;

Matcher::match_rule_supported
  ...
    case Op_VectorCastL2X:
    case Op_VectorCastF2X:
    case Op_VectorCastD2X:
    case Op_VectorUCastB2X:
    case Op_VectorUCastS2X:
    case Op_VectorUCastI2X:
    case Op_VectorMaskCast:
      if (UseAVX < 1) { // enabled for AVX only
        return false;
      }
      break;

But honestly, it already looks plain wrong that we are using a VectorCastL2X in a double->float cast scenario. I wonder how we got here.
04-09-2025

Seems a bit tricky to easily extract a standalone test from the code generated by TestAliasingFuzzer.java. Emanuel, could you please have a look?
04-09-2025

The test was recently added by JDK-8324751 and is randomized.
04-09-2025

Crashing on all x86_64 platforms
04-09-2025

And yet another case, using Float16 and MoveF2I:

./java -XX:CompileCommand=compileonly,Test5::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test5.java
TraceNewVectors [AutoVectorization]: 1050 LoadVector === 672 7 923 [[ ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<S,8> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 1051 VectorCastHF2F === _ 1050 [[ ]] #vectory<F,8>
TraceNewVectors [AutoVectorization]: 1052 VectorReinterpret === _ 1051 [[ ]] #vectory<F,8>
TraceNewVectors [AutoVectorization]: 1053 VectorCastF2X === _ 1052 [[ ]] #vectorz<J,8>
TraceNewVectors [AutoVectorization]: 1054 StoreVector === 948 949 927 1053 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=10; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=10;

SuperWord::transform_loop: success
Exception in thread "main" java.lang.RuntimeException: wrong value: 2.0 2.8E-45
at Test5.main(Test5.java:45)

  for (int i = 0; i < 2_000; i++) {
    short v0 = a[i];
    Float16 v1 = Float16.shortBitsToFloat16(v0);
    float v2 = v1.floatValue();
    int v3 = Float.floatToRawIntBits(v2);
    long v4 = v3;
    b[i] = v4;
  }
04-09-2025

Thanks for looking into this, [~epeter]! [~galder] FYI
04-09-2025

ILW = Assert during C2 compilation, easy to reproduce but edge case, -XX:-UseSuperWord or disable compilation of affected method = HLM = P3
04-09-2025

I can get another assert this way, also caused by Float.intBitsToFloat from JDK-8329077.

./java -XX:CompileCommand=compileonly,Test4::test1 -Xbatch -XX:+TraceNewVectors -XX:+TraceSuperWord Test4.java
TraceNewVectors [AutoVectorization]: 1407 LoadVector === 524 7 1204 [[ ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz<I,16>
TraceNewVectors [AutoVectorization]: 1408 VectorReinterpret === _ 1407 [[ ]] #vectorz<I,16>
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/empeter/Documents/oracle/jdk-fork2/open/src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
# Error: assert(bt == T_FLOAT) failed

All I needed was:

  for (int i = 0; i < 2_000; i++) {
    int v0 = a[i];
    float v1 = Float.intBitsToFloat(v0);
    short v2 = Float.floatToFloat16(v1);
    b[i] = v2;
  }
04-09-2025