Bug ID: JDK-8361582 AArch64: Some ConH values cannot be replicated with SVE

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 25

Priority: P3
Status: Resolved
Resolution: Fixed
CPU: aarch64

Submitted: 2025-07-08
Updated: 2025-09-18
Resolved: 2025-09-01

Versions (Unresolved/Resolved/Fixed)

JDK 25	JDK 26
25.0.2 (Unresolved)	26 b14 (Fixed)

Related Reports

Causes :	JDK-8355585 - Aarch64: Add aarch64 backend for Float16 vector operations
Causes :	JDK-8352635 - Improve inferencing of Float16 operations with constant inputs
Duplicate :	JDK-8362594 - Aarch64: Fix JTREG test - hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java failure on 256-bit SVE machines
Duplicate :	JDK-8364391 - AArch64: TestFloat16VectorOperations.java crashed with invalid immediate

Description

Seeing this reliably on a Graviton 3 instance with current mainline. Bisection points to JDK-8352635.

$ CONF=linux-aarch64-server-fastdebug make images test TEST=compiler/vectorization/TestFloat16VectorOperations.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:3756), pid=6237, tid=6259
#  guarantee(false) failed: invalid immediate

Current CompileTask:
C2:1867  892 %  b  4       compiler.vectorization.TestFloat16VectorOperations::vectorDivConstantInputFloat16 @ 2 (40 bytes)

Stack: [0x0000ffff72c98000,0x0000ffff72e96000],  sp=0x0000ffff72e91390,  free space=2020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x47fe84]  Assembler::sve_dup(FloatRegister, Assembler::SIMD_RegVariant, int)+0x124  (assembler_aarch64.hpp:3756)
V  [libjvm.so+0x1532570]  PhaseOutput::scratch_emit_size(Node const*)+0x2b0  (output.cpp:3387)
V  [libjvm.so+0x152b1d8]  PhaseOutput::shorten_branches(unsigned int*)+0x288  (output.cpp:540)
V  [libjvm.so+0x153ba70]  PhaseOutput::Output()+0xa24  (output.cpp:340)
V  [libjvm.so+0x9bd698]  Compile::Code_Gen()+0x518  (compile.cpp:3123)
V  [libjvm.so+0x9bfdec]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x17a4  (compile.cpp:892)
V  [libjvm.so+0x801eb8]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x174  (c2compiler.cpp:141)
V  [libjvm.so+0x9cd754]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x9a8  (compileBroker.cpp:2327)
V  [libjvm.so+0x9ce27c]  CompileBroker::compiler_thread_loop()+0x570  (compileBroker.cpp:1971)
V  [libjvm.so+0xed1c8c]  JavaThread::thread_main_inner()+0xec  (javaThread.cpp:773)
V  [libjvm.so+0x19d61bc]  Thread::call_run()+0xb0  (thread.cpp:243)

Comments

Hi [~shade], I don't think this patch is easy to backport, because it depends on this commit: https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e. I was under the impression that the commit went into JDK 25, but it is targeted for JDK 26 instead. Without that commit, FP16 operations with one constant input are not recognised in the mid-end, so the backend replicateHF* match rules are never exercised or generated. I don't expect any failures in JDK 25 (in particular from the TestFloat16VectorOperations.java JTREG test), since I don't think there is a way to generate the replicateHF* nodes in the first place. The match rule implementation in JDK 25 is still wrong, though. As there is no easy way to test changes to the match rules, do you think it would be better to delete the replicateHF* match rules in JDK 25, or to keep them as is? Apologies for the confusion.
18-09-2025
[~shade]Thanks. I'll do that.
16-09-2025
No bugtail for 2 weeks since integration? I think it is time to bring it to 25u.
15-09-2025
Changeset: 7f0cd648 Branch: master Author: Bhavana Kilambi <bkilambi@openjdk.org> Committer: Aleksey Shipilev <shade@openjdk.org> Date: 2025-09-01 09:18:29 +0000 URL: https://git.openjdk.org/jdk/commit/7f0cd6488ba969d5cffe8ebe9b95e4ad70982188
01-09-2025
Yes, fixes pushed in JDK 25u will go into JDK 25.0.2 at this point.
20-08-2025
I believe 25.0.1 is already frozen. The current backports would go to 25.0.2. This is inconvenient, but not the end of the world. So, integrate the mainline patch, wait for it to accrue mainline testing, and then backport it somewhere in the beginning of September.
19-08-2025
[~thartmann] [~chagedorn] Hi, I would like to push this to JDK 25u as well (as [~thartmann] suggested previously in the other, duplicate bug I created for this issue). I can see that patches are already going into JDK 25u, with Oct 21st as the deadline for the earliest update, JDK 25.0.1. Could you please advise on the exact date until which I can push this patch? I could not find a hard deadline for accepting patches into JDK 25.0.1. Thanks!
19-08-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/26589 Date: 2025-08-01 09:31:40 +0000
01-08-2025
Hi [~bkilambi], As you already have a fix, it makes sense to assign the ticket to it. Thank you.
21-07-2025
Hi [~eastigeevich] Spoke to Tobias on my ticket on the same issue and he says he's ok to backport this fix to JDK 25u. Thanks!
21-07-2025
Hi [~eastigeevich], I was already looking into this and have a fix as well (bug opened here: https://bugs.openjdk.org/browse/JDK-8362594). Let me know if you'd like to continue looking into this; I can drop my ticket. Could you please target both JDK 25 and mainline for this fix?
21-07-2025
Evgeny graciously accepted this task :)
14-07-2025
Thanks for the background. I will target it to JDK 26 for now. But if you are able to fix it within the RDP 1 time frame and want to backport it, feel free to re-target.
ILW = Crash with guarantee due to invalid immediate, only seen on Graviton 3 but failing reliably there, disable compilation of affected method = HLM = P3
10-07-2025
I have not tried yet to write a direct test for it. But AFAICS, the problem is in replicateHF_imm rule that assumes any immH operand is encodeable, which does not look right from SVE specs. Therefore, I believe it is a problem in JDK-8355585, and JDK-8355235 just gives us a pathway to an inconvenient immediate. I'll try to build a local test...
09-07-2025
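To put a rough number on the "assumes any immH operand is encodeable" point: the sketch below models the range check from the `sve_dup` code quoted in the slowdebug comment in this thread, for element size H, and counts how many of the 65536 possible 16-bit patterns pass when zero-extended the way the quoted rule does it (`(int)con & 0xffff`). The function names are hypothetical and this is a standalone model, not HotSpot code.

```cpp
// Hypothetical model of the immediate check in the quoted Assembler::sve_dup
// for 16-bit elements (T == H). Not the HotSpot implementation itself.
constexpr bool sve_dup_h_accepts(int imm) {
  return (imm >= -128 && imm <= 127)                               // #imm8
      || (imm >= -32768 && imm <= 32512 && (imm & 0xff) == 0);     // #imm8, LSL #8
}

// Count the 16-bit patterns that pass when zero-extended, mirroring the
// `(int)con & 0xffff` masking in the quoted replicateHF_imm rule.
constexpr int count_accepted_zero_extended() {
  int n = 0;
  for (int bits = 0; bits <= 0xffff; ++bits) {
    n += sve_dup_h_accepts(bits) ? 1 : 0;
  }
  return n;
}

static_assert(sve_dup_h_accepts(0x3c00), "FP16 1.0 (0x3C00) is a multiple of 256");
static_assert(!sve_dup_h_accepts(25598), "the value from the guarantee message");
static_assert(!sve_dup_h_accepts(0xc000), "FP16 -2.0 zero-extends to 49152 > 32512");
```

Under this model only 255 of the 65536 patterns pass (0..127 plus the 127 non-zero multiples of 256 up to 32512), and every pattern with the sign bit set is rejected because it zero-extends to a value above 32512.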
> it just becomes exposed by JDK-8355235
Do you mean JDK-8352635 instead as stated in the description?
> I think this gap is present in original JDK-8355585
Were you able to also reproduce it with JDK 25 somehow?
09-07-2025
slowdebug fails here:

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x512184]  Assembler::sve_dup(FloatRegister, Assembler::SIMD_RegVariant, int)+0x114  (assembler_aarch64.hpp:3756)
V  [libjvm.so+0x4da5e4]  replicateHF_immNode::emit(C2_MacroAssembler, PhaseRegAlloc) const+0x174  (aarch64_vector.ad:4877)
V  [libjvm.so+0x13f500c]  PhaseOutput::scratch_emit_size(Node const*)+0x3ac  (output.cpp:3387)

That code calls with mode H:
https://github.com/openjdk/jdk/blob/1934bd8d2c02cdb1ba9caaef227ed073fb5e1a9d/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L3098-L3113

// Replicate a 16-bit half precision float value
instruct replicateHF_imm(vReg dst, immH con) %{
  match(Set dst (Replicate con));
  format %{ "replicateHF_imm $dst, $con\t# replicate immediate half-precision float" %}
  ins_encode %{
    uint length_in_bytes = Matcher::vector_length_in_bytes(this);
    int imm = (int)($con$$constant) & 0xffff;
    if (VM_Version::use_neon_for_vector(length_in_bytes)) {
      __ mov($dst$$FloatRegister, get_arrangement(this), imm);
    } else {
      // length_in_bytes must be > 16 and SVE should be enabled
      assert(UseSVE > 0, "must be sve");
      __ sve_dup($dst$$FloatRegister, __ H, imm);
    }
  %}
  ins_pipe(pipe_slow);
%}

Assert fails in imm:

# guarantee(false) failed: invalid immediate: 25598

...which I think does not satisfy the ((imm & 0xff) == 0) condition:
https://github.com/openjdk/jdk/blob/1934bd8d2c02cdb1ba9caaef227ed073fb5e1a9d/src/hotspot/cpu/aarch64/assembler_aarch64.hpp#L3745-L3760

// SVE broadcast signed immediate to vector elements (unpredicated)
void sve_dup(FloatRegister Zd, SIMD_RegVariant T, int imm8) {
  starti;
  assert(T != Q, "invalid size");
  int sh = 0;
  if (imm8 <= 127 && imm8 >= -128) {
    sh = 0;
  } else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
    sh = 1;
    imm8 = (imm8 >> 8);
  } else {
    guarantee(false, "invalid immediate");
  }
  f(0b00100101, 31, 24), f(T, 23, 22), f(0b11100011, 21, 14);
  f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
}
08-07-2025
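For reference, the failing immediate 25598 can be taken apart by hand. The sketch below assumes the value is a raw IEEE 754 binary16 bit pattern (which the `& 0xffff` in the rule suggests, but that is my reading) and uses hypothetical helper names. It only handles normal numbers, which is enough for this one value.

```cpp
#include <cstdint>

// IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
struct HalfFields {
  unsigned sign;
  unsigned exponent;
  unsigned fraction;
};

// Hypothetical helper: split a raw 16-bit pattern into its binary16 fields.
constexpr HalfFields split_half(std::uint16_t bits) {
  return HalfFields{(bits >> 15) & 0x1u, (bits >> 10) & 0x1fu, bits & 0x3ffu};
}

// Hypothetical helper: value of a normal binary16 number (0 < exponent < 31).
// Zero, subnormals, infinities and NaNs are deliberately not handled.
constexpr double normal_half_value(std::uint16_t bits) {
  const HalfFields f = split_half(bits);
  double v = 1.0 + f.fraction / 1024.0;
  for (int e = static_cast<int>(f.exponent) - 15; e > 0; --e) v *= 2.0;
  for (int e = static_cast<int>(f.exponent) - 15; e < 0; ++e) v /= 2.0;
  return f.sign ? -v : v;
}

constexpr std::uint16_t kFailingImm = 25598;  // value from the guarantee message

static_assert(kFailingImm == 0x63fe, "25598 is 0x63FE");
static_assert((kFailingImm & 0xff) == 0xfe, "low byte is non-zero, so LSL #8 cannot apply");
static_assert(normal_half_value(kFailingImm) == 1023.0, "reads as FP16 1023.0");
```

So the pattern is 0x63FE, i.e. 1023.0 when read as a half-precision float: too large for the plain 8-bit form and, with a low byte of 0xFE, not a multiple of 256 either.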
I think this gap is present in original JDK-8355585, it just becomes exposed by JDK-8355235 that started producing more interesting -- now unencodeable -- ConH-s.
08-07-2025
`sve_dup` looks like it matches the SVE spec pretty directly:

"Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is unpredicated. The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0). The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8"."

https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/DUP--immediate---Broadcast-signed-immediate-to-vector-elements--unpredicated--

So I think we should somehow disallow generating/matching code for 16-bit immediates that cannot be encoded, i.e. those outside -128..127 that are not a multiple of 256.
08-07-2025
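One way to express the "disallow matching" idea is a predicate over the 16-bit pattern that a match rule or operand could consult. The sketch below follows the spec text quoted above. `fits_sve_dup_h` is a hypothetical name, and sign-extending the pattern before the range check is my assumption (it differs from the `& 0xffff` zero-extension in the rule quoted elsewhere in this thread). It is not necessarily what the integrated fix does.

```cpp
#include <cstdint>

// Hypothetical guard, not HotSpot code: true if the 16-bit pattern `bits`
// can be broadcast by SVE DUP (immediate) with 16-bit elements.
constexpr bool fits_sve_dup_h(std::uint16_t bits) {
  // Read the pattern as a signed 16-bit value, since DUP takes a signed immediate.
  const int v = (bits & 0x8000) ? static_cast<int>(bits) - 0x10000
                                : static_cast<int>(bits);
  const bool plain = v >= -128 && v <= 127;                        // #imm8
  const bool shifted = v >= -32768 && v <= 32512 && v % 256 == 0;  // #imm8, LSL #8
  return plain || shifted;
}

static_assert(fits_sve_dup_h(0x3c00), "FP16 1.0 (0x3C00) is a multiple of 256");
static_assert(fits_sve_dup_h(0xc000), "FP16 -2.0 (0xC000) fits once sign-extended");
static_assert(!fits_sve_dup_h(25598), "0x63FE has a non-zero low byte");
```

Patterns that fail such a predicate would have to be handled by some other rule; which one is outside the scope of this sketch.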