JDK-8313345 : SuperWord fails due to CMove without matching Bool pack
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 21,22
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • CPU: generic
  • Submitted: 2023-07-28
  • Updated: 2023-08-24
  • Resolved: 2023-08-09
Fix Versions:
  • JDK 21: 21 (Fixed)
  • JDK 22: 22 b10 (Fixed)
Description
After JDK-8306302, I'm encountering crashes on certain aarch64 systems:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (../../src/hotspot/share/opto/superword.cpp:2788), pid=2695887, tid=2695905
#  assert(p_bol != nullptr) failed: CMove must have matching Bool pack
#
# JRE version: OpenJDK Runtime Environment (21.0.1) (slowdebug build 21.0.1-testing-builds.shipilev.net-openjdk-jdk21-b1-20230723)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 21.0.1-testing-builds.shipilev.net-openjdk-jdk21-b1-20230723, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x14dc6ac]  SuperWord::output()+0xea8
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/mc/debug_server/core.2695887)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xmx8G -Xms4G -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=superword_crash.xml paper.jar --nogui

Host: sirywell-vps, AArch64, 4 cores, 23G, Ubuntu 20.04.6 LTS
Time: Wed Jul 26 19:30:34 2023 UTC elapsed time: 379.393956 seconds (0d 0h 6m 19s)

---------------  T H R E A D  ---------------

Current thread (0x0000ffff1849a7b0):  JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=2695905, stack(0x0000ffff49d92000,0x0000ffff49f90000) (2040K)]


Current CompileTask:
C2: 379394 12918       4       net.minecraft.world.level.levelgen.DensityFunctions$PureTransformer::fillArray (40 bytes)

Stack: [0x0000ffff49d92000,0x0000ffff49f90000],  sp=0x0000ffff49f8a530,  free space=2017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x14dc6ac]  SuperWord::output()+0xea8  (superword.cpp:2788)
V  [libjvm.so+0x14d427c]  SuperWord::SLP_extract()+0x358  (superword.cpp:667)
V  [libjvm.so+0x14d2934]  SuperWord::transform_loop(IdealLoopTree*, bool)+0x51c  (superword.cpp:178)
V  [libjvm.so+0x10ceae0]  PhaseIdealLoop::build_and_optimize()+0x1724  (loopnode.cpp:4661)
V  [libjvm.so+0x8eba44]  PhaseIdealLoop::PhaseIdealLoop(PhaseIterGVN&, LoopOptsMode)+0x148  (loopnode.hpp:1124)
V  [libjvm.so+0x8ebc40]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x28  (loopnode.hpp:1203)
V  [libjvm.so+0x8dc140]  Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x8c  (compile.cpp:2156)
V  [libjvm.so+0x8dce94]  Compile::Optimize()+0xbb4  (compile.cpp:2386)
V  [libjvm.so+0x8d6348]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1138  (compile.cpp:842)
V  [libjvm.so+0x798278]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x14c  (c2compiler.cpp:118)
V  [libjvm.so+0x8f8388]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x768  (compileBroker.cpp:2265)
V  [libjvm.so+0x8f71c8]  CompileBroker::compiler_thread_loop()+0x3c4  (compileBroker.cpp:1944)
V  [libjvm.so+0x91c968]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0xa4  (compilerThread.cpp:58)
V  [libjvm.so+0xd7bea8]  JavaThread::thread_main_inner()+0x174  (javaThread.cpp:719)
V  [libjvm.so+0xd7bd28]  JavaThread::run()+0x1e4  (javaThread.cpp:704)
V  [libjvm.so+0x1548cb4]  Thread::call_run()+0x1c4  (thread.cpp:217)
V  [libjvm.so+0x1258b84]  thread_native_entry(Thread*)+0x194  (os_linux.cpp:778)
C  [libpthread.so.0+0x7624]  start_thread+0x184

I was able to reproduce this on OCI Ampere A1 Compute instances as well as on a Raspberry Pi 4 model B. It does not crash on Apple M1.

While the crash above comes from a slowdebug build, normal builds (manually built from master at that point) fail with a SIGSEGV:



#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffff9847664c, pid=674132, tid=674405
#
# JRE version: OpenJDK Runtime Environment (22.0) (build 22-internal-adhoc.ubuntu.jdk)
# Java VM: OpenJDK 64-Bit Server VM (22-internal-adhoc.ubuntu.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0xd5664c]  SuperWord::vector_opd(Node_List*, int)+0x24
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/mc/debug_server/core.674132)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xmx8G -Xms4G -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=40 -XX:G1HeapRegionSize=8M -XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 -Daikars.new.flags=true -Dusing.aikars.flags=https://mcflags.emc.gs -Djava.net.preferIPv4Stack=true -XX:NativeMemoryTracking=summary paper.jar --nogui

Host: AArch64, 4 cores, 23G, Ubuntu 20.04.6 LTS
Time: Sun Jul 23 07:26:29 2023 UTC elapsed time: 23.546695 seconds (0d 0h 0m 23s)

---------------  T H R E A D  ---------------

Current thread (0x0000ffff207811d0):  JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=674405, stack(0x0000ffff601bf000,0x0000ffff603bd000) (2040K)]


Current CompileTask:
C2:  23546 11581 %     4       net.minecraft.world.level.levelgen.DensityFunctions$p::a @ 15 (40 bytes)

Stack: [0x0000ffff601bf000,0x0000ffff603bd000],  sp=0x0000ffff603b7fc0,  free space=2019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd5664c]  SuperWord::vector_opd(Node_List*, int)+0x24  (superword.cpp:2873)
V  [libjvm.so+0xd5f528]  SuperWord::output()+0xd88  (superword.cpp:2652)
V  [libjvm.so+0xa87f70]  PhaseIdealLoop::build_and_optimize()+0xe40  (loopnode.cpp:4656)
V  [libjvm.so+0x5be40c]  Compile::Optimize()+0x994  (loopnode.hpp:1114)
V  [libjvm.so+0x5bf90c]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xbcc  (compile.cpp:850)
V  [libjvm.so+0x4f5a48]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xe0  (c2compiler.cpp:119)
V  [libjvm.so+0x5c505c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x9b4  (compileBroker.cpp:2265)
V  [libjvm.so+0x5c7e0c]  CompileBroker::compiler_thread_loop()+0x57c  (compileBroker.cpp:1944)
V  [libjvm.so+0x829234]  JavaThread::thread_main_inner() [clone .part.0]+0xa4  (javaThread.cpp:720)
V  [libjvm.so+0xda6a60]  Thread::call_run()+0xa8  (thread.cpp:217)
V  [libjvm.so+0xb97384]  thread_native_entry(Thread*)+0xdc  (os_linux.cpp:783)
C  [libpthread.so.0+0x7624]  start_thread+0x184

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000010


Unfortunately, I don't have a proper reproducer other than starting up a fresh Minecraft server; my previous attempts to write one weren't successful. I'll attach the IGV graphs of the method that failed to compile (fillArray.xml).

Let me know if there is more information I can provide.



Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk21/pull/168 Date: 2023-08-09 05:28:42 +0000
09-08-2023

Changeset: d3b578f1 Author: Tobias Hartmann <thartmann@openjdk.org> Date: 2023-08-09 05:16:02 +0000 URL: https://git.openjdk.org/jdk/commit/d3b578f1c9d296ce8f99c70069df886e9f2dbef9
09-08-2023

Fix Request for JDK 21 approved.
08-08-2023

Fix Request (JDK 21): This patch fixes a regression in C2's superword analysis in JDK 21 that leads to a VM crash. Since it occurs with Minecraft, it's likely to also affect other Java applications. The fix is low risk because it's a simple bailout from the superword optimization. Tested up to tier6 + stress jobs (still running).
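To make the failing invariant concrete, here is a toy Java model, not HotSpot code. It assumes, as the assert message "CMove must have matching Bool pack" suggests, that SuperWord tracks which pack each node belongs to and that a CMove pack can only be vectorized if its Bool condition was packed as well. Per the analysis in the comments, the Bool in this case is computed outside the loop, so it never joins a pack and the lookup comes back empty. The class and method names (Node, addPack, myPack, outputCMovePack) are hypothetical, and returning false only stands in for the "simple bailout" mentioned above; it is not the actual patch.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT HotSpot code: a CMove pack can only be emitted as a vector
// CMove if its Bool (the condition input) was packed too.
public class CMovePackModel {

    /** Hypothetical IR node with identity-based equality. */
    public static final class Node {
        final String name;
        final Node condition; // the Bool input of a CMove, or null

        public Node(String name, Node condition) {
            this.name = name;
            this.condition = condition;
        }
    }

    // Maps each packed node to its pack (a stand-in for SuperWord's my_pack()).
    private final Map<Node, List<Node>> packOf = new IdentityHashMap<>();

    public void addPack(List<Node> pack) {
        for (Node n : pack) {
            packOf.put(n, pack);
        }
    }

    /** Returns the node's pack, or null if the node was never packed. */
    public List<Node> myPack(Node n) {
        return packOf.get(n);
    }

    /**
     * Returns true if the CMove pack can be vectorized. A missing Bool pack
     * makes the model bail out (keep the scalar loop) instead of asserting
     * or going on to use the null pack.
     */
    public boolean outputCMovePack(List<Node> cmovePack) {
        Node bool = cmovePack.get(0).condition;
        List<Node> boolPack = myPack(bool);
        if (boolPack == null) {
            return false; // bail out of vectorization for this loop
        }
        return true;
    }

    public static void main(String[] args) {
        CMovePackModel sw = new CMovePackModel();

        // Loop-invariant Bool (CmpP on the receiver class) computed outside the loop: never packed.
        Node invariantBool = new Node("Bool(CmpP)", null);
        List<Node> cmoves = List.of(new Node("CMoveD#0", invariantBool), new Node("CMoveD#1", invariantBool));
        sw.addPack(cmoves);
        System.out.println(sw.outputCMovePack(cmoves)); // false

        // Per-element Bools that were packed alongside the CMoves.
        Node b0 = new Node("Bool#0", null), b1 = new Node("Bool#1", null);
        sw.addPack(List.of(b0, b1));
        List<Node> packedCase = List.of(new Node("CMoveD#2", b0), new Node("CMoveD#3", b1));
        sw.addPack(packedCase);
        System.out.println(sw.outputCMovePack(packedCase)); // true
    }
}
```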
08-08-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/15189 Date: 2023-08-08 10:50:19 +0000
08-08-2023

Workaround: -XX:ConditionalMoveLimit=0
08-08-2023

I can reproduce the issue on AArch64 without any flags. On x64, we bail out in PhaseIdealLoop::conditional_move() because the cost computed via Matcher::float_cmove_cost() is always ConditionalMoveLimit (https://github.com/openjdk/jdk/blob/055b4b426cbc56d97e82219f3dd3aba1ebf977e4/src/hotspot/cpu/x86/matcher_x86.hpp#L81). On AArch64, however, it is 0 (https://github.com/openjdk/jdk/blob/055b4b426cbc56d97e82219f3dd3aba1ebf977e4/src/hotspot/cpu/aarch64/matcher_aarch64.hpp#L70). We can avoid the bailout on x64 by using -XX:+UseCMoveUnconditionally. I'll send out a fix for review shortly; we need to get this in before RC on Thursday.
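The cost argument can be illustrated with a deliberately simplified Java model. It is not the real PhaseIdealLoop::conditional_move() heuristic, which has further cost terms; here everything except the float cmove cost is folded into a hypothetical otherCost parameter, and the limit value 3 is purely illustrative. Under those assumptions it shows why a single double Phi already reaches the limit on x64, why AArch64 goes ahead with the CMove, and why -XX:+UseCMoveUnconditionally and the -XX:ConditionalMoveLimit=0 workaround each flip the outcome.

```java
// Simplified model, NOT the real PhaseIdealLoop::conditional_move() heuristic:
// only the float cmove cost term, the flag, and the final limit check are modelled.
public class CMoveCostModel {

    /**
     * @param floatCmoveCost          what Matcher::float_cmove_cost() returns on the platform
     * @param conditionalMoveLimit    value of -XX:ConditionalMoveLimit
     * @param useCMoveUnconditionally value of -XX:+UseCMoveUnconditionally
     * @param floatPhis               number of float/double Phis merged by the diamond
     * @param otherCost               all remaining cost terms, taken as a given input
     * @return true if this model would turn the diamond into CMoves
     */
    public static boolean convertsToCMove(int floatCmoveCost, int conditionalMoveLimit,
                                          boolean useCMoveUnconditionally,
                                          int floatPhis, int otherCost) {
        int cost = otherCost;
        if (!useCMoveUnconditionally) {
            // Each float/double Phi is charged the platform's float cmove cost.
            cost += floatPhis * floatCmoveCost;
        }
        // Reaching the limit means "too expensive": keep the branch.
        return cost < conditionalMoveLimit;
    }

    public static void main(String[] args) {
        int limit = 3; // illustrative value only
        int other = 1; // illustrative value only
        // x64-like: float_cmove_cost() == ConditionalMoveLimit.
        System.out.println(convertsToCMove(limit, limit, false, 1, other)); // false
        // AArch64-like: float_cmove_cost() == 0.
        System.out.println(convertsToCMove(0, limit, false, 1, other));     // true
        // x64-like with -XX:+UseCMoveUnconditionally.
        System.out.println(convertsToCMove(limit, limit, true, 1, other));  // true
        // AArch64-like with the workaround -XX:ConditionalMoveLimit=0.
        System.out.println(convertsToCMove(0, 0, false, 1, other));         // false
    }
}
```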
08-08-2023

ILW = Crash in superword, observed with Minecraft server, use -XX:-UseSuperWord = HMM = P2
07-08-2023

> The reason it only ran on Aarch64 machines for you is that there the flags "-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally" are set by default. [~epeter] Both flags are off by default on all platforms, right?
07-08-2023

[~hgreule] great work to extract a reproducer, thanks!
07-08-2023

I simplified the reproducer a bit more: ./java -XX:CompileCommand=PrintCompilation,Reproducer2*::* -XX:CompileCommand=CompileOnly,Reproducer2*::fill -XX:-TieredCompilation -Xbatch -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+PrintInlining Reproducer2.java
06-08-2023

./java -XX:CompileCommand=PrintCompilation,Reproducer*::* -XX:CompileCommand=CompileOnly,Reproducer*::fill -XX:-TieredCompilation -Xbatch -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally -XX:+TraceSuperWord Reproducer.java
It looks like the CMove merges the results of the mapSingle methods. We have two records that implement the Fill interface. The two implementations are inlined bimorphically with an If/Region, and the resulting diamond is then converted into a CMove. A CmpP and a Bool outside the loop determine all CMoveD nodes inside the loop. The CmpP compares the class of the record and decides which inlined implementation of mapSingle to choose.
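A hypothetical Java sketch of the code shape described above may help: a Fill interface with a mapSingle method, two record implementations, and a loop over a double[] whose call site sees both receiver classes. Only Fill, mapSingle and fill come from the comment; the record names and bodies are invented, this is not the attached Reproducer.java, and it is not known to reproduce the crash.

```java
import java.util.Arrays;

// Hypothetical sketch of the shape described above; NOT the attached
// Reproducer.java, and not known to reproduce the crash.
public class BimorphicFillSketch {

    public interface Fill {
        double mapSingle(double d);
    }

    // Two record implementations, so the mapSingle call site is bimorphic.
    public record Negate() implements Fill {
        public double mapSingle(double d) { return -d; }
    }

    public record Scale(double factor) implements Fill {
        public double mapSingle(double d) { return d * factor; }
    }

    // The receiver f is loop-invariant. Per the analysis above, C2 inlines both
    // mapSingle bodies behind a class check (CmpP/Bool) and may turn the
    // resulting diamond into a CMoveD for each element.
    public static void fill(Fill f, double[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i] = f.mapSingle(values[i]);
        }
    }

    public static void main(String[] args) {
        Fill[] impls = { new Negate(), new Scale(2.0) };
        double[] values = new double[1024];
        // Alternate receivers so the call-site profile sees both classes.
        for (int iter = 0; iter < 20_000; iter++) {
            Arrays.fill(values, 1.0);
            fill(impls[iter & 1], values);
        }
        System.out.println(values[0]); // 2.0 (the last iteration uses Scale(2.0))
    }
}
```

It can be run with flags from the command above (for example -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally); whether C2 actually produces CMoveD nodes for it, or vectorizes the loop, depends on the platform and the JDK build.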
04-08-2023

[~hgreule] Thanks very much for the reproducer. I get the same failure on my x64 machine: ./java -XX:CompileCommand=PrintCompilation,*::* -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally Reproducer.java The reason it only ran on Aarch64 machines for you is that there the flags "-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally" are set by default. Yes, I don't need the flag output for -XX:+TraceSuperWord, I can debug it myself. Thanks again for the work it took for the reproducer, that is a big help!
04-08-2023

[~epeter] I assume the -XX:+TraceSuperWord output from the original isn't needed anymore? Please let me know if you need further information.
04-08-2023

Good news! I spent another few hours and actually got a reproducer! I attached the Reproducer.java file. This file reliably causes a JVM crash on the mentioned versions/platforms for me.
04-08-2023

[~hgreule] Could you run this with "-XX:+TraceSuperWord"? The log may get very long, so I'm not sure that is feasible. In "fillArray.xml" I found this partial jasm code:
   0 fast_aload_0
   1 invokeinterface
   6 aload_1
   7 aload_2
   8 invokeinterface
  13 iconst_0
  14 istore_3
  15 iload_3
  16 aload_1
  17 arraylength
  18 if_icmpge
  21 aload_1
  22 iload_3
  23 fast_aload_0
  24 aload_1
  25 iload_3
  26 daload
  27 invokeinterface
  32 dastore
  33 iinc
  36 goto
  39 return
Is there a chance we could extract more details? For example, it would be interesting to see what "27 invokeinterface" calls.
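For readers less familiar with bytecode, the 40-byte listing is consistent with a Java method of roughly the following shape. The method itself is fillArray per the crash log, but the listing does not show the targets of the invokeinterface instructions, so every other type and method name below (Source, Transformer, input, transform, the inner fillArray, and the example records) is an assumption for illustration. The only loop in the method, offsets 15 to 36, applies an interface call to each element of a double[].

```java
import java.util.Arrays;

// Hypothetical source consistent with the 40-byte listing above. The targets of
// the invokeinterface instructions are not shown there, so all names except the
// outer fillArray are assumptions.
public class FillArraySketch {

    public interface Source {
        void fillArray(double[] array, Object context);
    }

    public interface Transformer {
        Source input();
        double transform(double value);

        default void fillArray(double[] array, Object context) {
            input().fillArray(array, context);       // 0..8: aload_0, invokeinterface, aload_1, aload_2, invokeinterface
            for (int i = 0; i < array.length; i++) { // 13..18: i = 0 and loop condition; 33..36: iinc, goto
                array[i] = transform(array[i]);      // 21..32: aload_1, iload_3, aload_0, aload_1, iload_3, daload, invokeinterface, dastore
            }
        }                                            // 39: return
    }

    // Example implementations, also hypothetical.
    public record Constant(double value) implements Source {
        public void fillArray(double[] array, Object context) {
            Arrays.fill(array, value);
        }
    }

    public record Squared(Source input) implements Transformer {
        public double transform(double value) {
            return value * value;
        }
    }

    public static void main(String[] args) {
        double[] a = new double[4];
        new Squared(new Constant(3.0)).fillArray(a, null);
        System.out.println(Arrays.toString(a)); // [9.0, 9.0, 9.0, 9.0]
    }
}
```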
03-08-2023

I'm trying to guess a reproducer. So far no result. But I found this instead: JDK-8313720
03-08-2023

I'm not experienced with replay compilation, so I can't be of any help there. If you want to run a Minecraft server to reproduce the issue, you need to:
1. Create a directory you want to work in.
2. Visit https://www.minecraft.net/en-us/download/server and download the server jar into the directory created in step 1.
3. Create a file "eula.txt" in the same directory with the content "eula=true" to agree to the EULA (https://www.minecraft.net/en-us/eula).
4. Run the server jar with "java -jar server.jar".
5. The server should now start and crash after a while (with a slowdebug build, it may take several minutes to reach the point where it crashes).
Note 1: If you want to rerun the server, you first have to delete the "world" directory created by the previous run. Otherwise, the relevant code won't be called during startup.
Note 2: The crash in that case might differ somewhat from mine due to obfuscation (dhe$p::a instead of net.minecraft.world.level.levelgen.DensityFunctions$p::a).
Note 3: As this seems to be heavily CPU-specific, here is the /proc/cpuinfo of the OCI instance (other identical cores omitted):
  processor       : 0
  BogoMIPS        : 50.00
  Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant     : 0x3
  CPU part        : 0xd0c
  CPU revision    : 1
as well as of the Raspberry Pi:
  processor       : 0
  BogoMIPS        : 108.00
  Features        : fp asimd evtstrm crc32 cpuid
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant     : 0x0
  CPU part        : 0xd08
  CPU revision    : 3
If there is any other way I can help, please let me know.
03-08-2023

I was not able to reproduce this with replay compilation due to other failures during replay compilation. [~hgreule] could you share the steps you tried to reliably reproduce this?
03-08-2023

Officially, the Minecraft server software is only provided obfuscated. If you want to dig through (decompiled, remapped) code, there are several tools, e.g. https://github.com/PaperMC/mache (this seems to work for me) or https://github.com/SpongePowered/VanillaGradle (I didn't get that running). That's all far from perfect, and I still haven't managed to reduce the code to something smaller that can be shared easily.
02-08-2023

Thanks. I would also need the net/minecraft/world/level/levelgen/DensityFunctions class. Is that publicly available somewhere?
02-08-2023

[~hgreule] could you please share the hs_err* and replay_pid* files?
02-08-2023