JDK-8342498 : Add test for Allocation elimination after use as alignment reference by SuperWord
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,21,22,23,24
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-10-17
  • Updated: 2024-11-11
  • Resolved: 2024-11-05
JDK 24
24 b23 Fixed
Description
We should add the TestA4.java to regression testing, and probably backport it at least to JDK11.

It turns out that the bug is already fixed in all old versions with JDK-8328544 and its backport of this snippet:

// We did not find the int_index. Just to be safe, reject this VPointer.
if (!_has_int_index_after_convI2L) {
  return false;
}

But if somebody were to remove it, we would re-reveal the bug with the CastP2X and Allocation removal, described below.

------------------------------------------------------------------ Original Description ------------------------------------------------------------------

During the investigation of JDK-8339349, we found a replay file that has a different failure mode.

unzip jars.zip -> place in jars directory.

[~thartmann] narrowed it down:
> I narrowed it down. The issue is introduced/triggered by JDK-8308606 in JDK 22 b03 (see hs_err_pid1979173.log) and fixed/hidden by JDK-8310190 in JDK 23 b05.
> Emanuel, please have a look and verify that the issue was indeed introduced by JDK-8308606 and fixed by JDK-8310190. If so, we need to re-triage those bugs.

DEBUG:

/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid3400217.log -cp "jars/*:jars/" -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintIdeal

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork1/open/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:730), pid=980994, tid=981014
#  Error: assert(this_region != nullptr) failed
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-10-14-1158199.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-10-14-1158199.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc38091]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x411
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/JDK-8339349/core.980994)
#
# An error report file with more information is saved as:
# /oracle-work/JDK-8339349/hs_err_pid980994.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

Current CompileTask:
C2:56926  186    b  4       org.apache.coyote.http11.Http11OutputBuffer::write (93 bytes)

Stack: [0x00007facbd1bc000,0x00007facbd2bd000],  sp=0x00007facbd2b7fa0,  free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc38091]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x411  (g1BarrierSetC2.cpp:730)
V  [libjvm.so+0x12d3064]  PhaseMacroExpand::process_users_of_allocation(CallNode*)+0x694  (macro.cpp:159)
V  [libjvm.so+0x12decae]  PhaseMacroExpand::eliminate_allocate_node(AllocateNode*)+0x35e  (macro.cpp:1100)
V  [libjvm.so+0x12df2b2]  PhaseMacroExpand::eliminate_macro_nodes()+0x3b2  (macro.cpp:2386)
V  [libjvm.so+0x12df569]  PhaseMacroExpand::expand_macro_nodes()+0x19  (macro.cpp:2434)
V  [libjvm.so+0x9ece06]  Compile::Optimize()+0xef6  (compile.cpp:2446)
V  [libjvm.so+0x9f04cb]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1beb  (compile.cpp:857)
V  [libjvm.so+0x83dc17]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1e7  (c2compiler.cpp:134)
V  [libjvm.so+0x9fba8c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c  (compileBroker.cpp:2299)
V  [libjvm.so+0x9fc718]  CompileBroker::compiler_thread_loop()+0x468  (compileBroker.cpp:1958)
V  [libjvm.so+0xeb99ac]  JavaThread::thread_main_inner()+0xcc  (javaThread.cpp:721)
V  [libjvm.so+0x179e956]  Thread::call_run()+0xb6  (thread.cpp:220)
V  [libjvm.so+0x14a92a7]  thread_native_entry(Thread*)+0x127  (os_linux.cpp:789)



PRODUCT:

/oracle-work/jdk-fork1/build/linux-x64/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid3400217.log -cp "jars/*:jars/" -XX:+UseSuperWord -Xbatch -XX:+UseG1GC

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fc3455e00fa, pid=892245, tid=892653
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (build 23-internal-2024-10-14-1157432.xyz...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23-internal-2024-10-14-1157432.xyz..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7910fa]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x22a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/JDK-8339349/core.892245)
#
# An error report file with more information is saved as:
# /oracle-work/JDK-8339349/hs_err_pid892245.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

Current CompileTask:
C2:42341  186    b  4       org.apache.coyote.http11.Http11OutputBuffer::write (93 bytes)

Stack: [0x00007fc30efaf000,0x00007fc30f0b0000],  sp=0x00007fc30f0ab760,  free space=1009k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7910fa]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x22a  (node.hpp:406)
V  [libjvm.so+0xbd119f]  PhaseMacroExpand::process_users_of_allocation(CallNode*)+0x6bf  (macro.cpp:159)
V  [libjvm.so+0xbd6d0e]  PhaseMacroExpand::eliminate_allocate_node(AllocateNode*)+0x1ee  (macro.cpp:1100)
V  [libjvm.so+0xbd6e92]  PhaseMacroExpand::eliminate_macro_nodes()+0x122  (macro.cpp:2386)
V  [libjvm.so+0xbd6f39]  PhaseMacroExpand::expand_macro_nodes()+0x19  (macro.cpp:2434)
V  [libjvm.so+0x641bee]  Compile::Optimize()+0x89e  (compile.cpp:2446)
V  [libjvm.so+0x6432ad]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xedd  (compile.cpp:857)
V  [libjvm.so+0x56b091]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1f1  (c2compiler.cpp:134)
V  [libjvm.so+0x648c71]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xae1  (compileBroker.cpp:2299)
V  [libjvm.so+0x64bd58]  CompileBroker::compiler_thread_loop()+0x498  (compileBroker.cpp:1958)
V  [libjvm.so+0x909d38]  JavaThread::thread_main_inner() [clone .part.0]+0xb8  (javaThread.cpp:721)
V  [libjvm.so+0xebcf7f]  Thread::call_run()+0x9f  (thread.cpp:220)
V  [libjvm.so+0xce0485]  thread_native_entry(Thread*)+0xd5  (os_linux.cpp:789)
Comments
Changeset: f62fc484 Branch: master Author: Emanuel Peter <epeter@openjdk.org> Date: 2024-11-05 11:47:42 +0000 URL: https://git.openjdk.org/jdk/commit/f62fc4844125cc20a91dc2be39ba05a2d3aca8cf
05-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/21875 Date: 2024-11-04 13:13:58 +0000
04-11-2024

JDK11 would be equally affected without the "just to be safe" patch. With JDK8 I did not have luck with that trick... maybe the Allocation just does not get eliminated. But it seems we do indeed vectorize and add the CastP2X for the field-store. I don't want to investigate more for now.
04-11-2024

For JDK17, we have the same as for JDK21: If I return "true" instead, then we crash. I slightly adjusted the test again.

/oracle-work/fork-cpu-jdk17/build/linux-x64-debug/jdk/bin/java --add-opens java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintEliminateAllocations -XX:-PrintIdeal -XX:+PrintEscapeAnalysis -Xbatch -XX:CompileCommand=compileonly,TestA4::test -XX:CompileCommand=printcompilation,TestA4::test -XX:+TraceLoopOpts -XX:-SplitIfBlocks -XX:LoopMaxUnroll=8 -XX:DominatorSearchLimit=45 TestA4.java

This test also reproduces with the modified JDK21 and JDK23.
04-11-2024

It is clear that this bug was introduced before JDK-8308606. Though it seems that the patch in JDK-8328544 fixed this issue. And this patch is backported now. I'm just a little worried that because of the "just to be safe" comment, this might be understood as overly safe... and reverted because it might be a performance regression. Such a "regression fix" would then only re-reveal this bug on older JDK, not on the newest JDK because there we would never use a non-vectorized mem_ref for alignment.

// We did not find the int_index. Just to be safe, reject this VPointer.
if (!_has_int_index_after_convI2L) {
  return false;
}
04-11-2024

I investigated 21.0.6+3 ... and we also get blocked because of this patch, which was backported to JDK21 and lower:

// We did not find the int_index. Just to be safe, reject this VPointer.
if (!_has_int_index_after_convI2L) {
  return false;
}

If I return true instead, then the StoreL VPointer (aka SWPointer) is valid. I had to modify the example a little, but it then triggers:

/oracle-work/fork-cpu-jdk21/build/linux-x64-debug/jdk/bin/java --add-opens java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintEliminateAllocations -XX:-PrintIdeal -XX:+PrintEscapeAnalysis -Xbatch -XX:CompileCommand=compileonly,TestA3::test -XX:CompileCommand=printcompilation,TestA3::test -XX:+TraceLoopOpts -XX:-SplitIfBlocks -XX:LoopMaxUnroll=8 -XX:DominatorSearchLimit=45 TestA3.java

So it would be worth adding this test and backporting it, even if currently the JDK21 does not allow taking field-stores as mem_ref.
04-11-2024

I'm changing this to an RFE. The bug seems fixed in all old versions - sort of "on accident" - by JDK-8328544. But we should add the regression test, and probably backport it too.
04-11-2024

Side note: In the replay file, the Node::dominates check failed because of a dead path somewhere. This should now be fixed with JDK-8333334.
01-11-2024

Now digging a little more into affected versions. I'm taking the newest versions of every JDK that I can get today.

############ 21.0.6+3
I cannot make it work... would have to research a bit more why though. I suspect that it is also the VPointer::is_safe_to_use_as_simple_form change that was backported. See more info below, for later JDK versions.

############ jdk-22.0.2 -> asserts
Not surprising, because as far as I know JDK22 is already not getting updates any more.

/oracle-work/jdk-22.0.2/fastdebug/bin/java -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintEliminateAllocations -XX:-PrintIdeal -XX:+PrintEscapeAnalysis -Xbatch -XX:CompileCommand=compileonly,TestA2::test -XX:CompileCommand=printcompilation,TestA2::test -XX:+TraceLoopOpts -XX:-SplitIfBlocks -XX:LoopMaxUnroll=8 -XX:DominatorSearchLimit=45 TestA2.java

############ 23.0.2+3 - no assert / crash
This is because we now cannot get the alignment reference from the B::offset.
https://github.com/openjdk/jdk/blob/jdk23/src/hotspot/share/opto/superword.cpp
We now only take the alignment references from MemNode packs, see SuperWord::determine_mem_ref_and_aw_for_main_loop_alignment. To make that work, we would have to create a pack for B::offset.... not sure that can be done .. but I shall try later.

Well... we can have 2 fields that are adjacent. Then, SuperWord might want to pack them. But it looks like the field accesses are judged to be invalid. More investigation needed.
########## JDK24
Ok, I also get invalid VPointers:

VLoopVPointers::print:
VPointer[mem: 2221 StoreB, base: 1243, adr: 1243, base[1243] + offset( 16) + invar( 0) + scale( 1) * iv]
VPointer[mem: 2215 StoreB, base: 1243, adr: 1243, base[1243] + offset( 17) + invar( 0) + scale( 1) * iv]
VPointer[mem: 2197 StoreB, base: 1243, adr: 1243, base[1243] + offset( 18) + invar( 0) + scale( 1) * iv]
VPointer[mem: 2196 StoreB, base: 1243, adr: 1243, base[1243] + offset( 19) + invar( 0) + scale( 1) * iv]
VPointer[mem: 2088 StoreB, base: 1243, adr: 1243, base[1243] + offset( 20) + invar( 0) + scale( 1) * iv]
VPointer[mem: 2082 StoreB, base: 1243, adr: 1243, base[1243] + offset( 21) + invar( 0) + scale( 1) * iv]
VPointer[mem: 1343 StoreL, invalid]
VPointer[mem: 1345 StoreL, invalid]
VPointer[mem: 1347 StoreL, invalid]
VPointer[mem: 1353 StoreL, invalid]
VPointer[mem: 1355 StoreL, invalid]
VPointer[mem: 1357 StoreL, invalid]
VPointer[mem: 1351 StoreL, invalid]
VPointer[mem: 1349 StoreL, invalid]
VPointer[mem: 1361 StoreL, invalid]
VPointer[mem: 1970 StoreB, base: 1243, adr: 1243, base[1243] + offset( 22) + invar( 0) + scale( 1) * iv]
VPointer[mem: 1341 StoreB, base: 1243, adr: 1243, base[1243] + offset( 23) + invar( 0) + scale( 1) * iv]
VPointer[mem: 1359 StoreL, invalid]

Why does that happen? Ah, due to a recent change, in VPointer::is_safe_to_use_as_simple_form:

// We did not find the int_index. Just to be safe, reject this VPointer.
if (!_has_int_index_after_convI2L) {
  return false;
}

What would happen if we removed that (possibly unnecessary) restriction, i.e. if we return true instead? ... well we cannot really get the field stores to be found adjacent because they are put in different slices. I tried to split the field store with Unsafe... but then it tags the Allocate as NSR. So that seems safe...? Still, this makes me a little nervous, that also JDK24 could be somehow affected.. just in a less obvious way.
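
For readers unfamiliar with the printed form "base + offset + invar + scale * iv", here is a small Java sketch of how such address forms could be compared. It is only a toy model: the record VPointer, its fields, and the method isAdjacentTo are hypothetical names chosen for illustration and do not mirror the HotSpot C++ class. The values are taken from two of the StoreB lines and one StoreL line above.

```java
public class VPointerSketch {
    // Hypothetical, simplified model of one printed line:
    // address = base + offset + invar + scale * iv
    record VPointer(int mem, int base, int offset, int invar, int scale, boolean valid) {
        // In this toy model, two valid references are adjacent if they share
        // base, invar and scale, and their offsets differ by one element size.
        boolean isAdjacentTo(VPointer other, int elementSizeInBytes) {
            return valid && other.valid
                && base == other.base
                && invar == other.invar
                && scale == other.scale
                && other.offset - offset == elementSizeInBytes;
        }
    }

    public static void main(String[] args) {
        VPointer storeB16 = new VPointer(2221, 1243, 16, 0, 1, true);
        VPointer storeB17 = new VPointer(2215, 1243, 17, 0, 1, true);
        // A reference printed as "invalid" carries no usable address form.
        VPointer storeL = new VPointer(1343, 0, 0, 0, 0, false);

        System.out.println(storeB16.isAdjacentTo(storeB17, 1)); // true
        System.out.println(storeB16.isAdjacentTo(storeL, 1));   // false
    }
}
```

An invalid reference can never be adjacent to anything in this model, which matches the observation above that the rejected StoreL references do not take part in packing.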
31-10-2024

Wow, I was able to find a TestA2.java, which requires no VM changes. Just some odd VM flags. Maybe those could also be removed with more code changes... but not sure that is worth the effort.

// Summary:
// - Some B allocations are detected as NoEscape, but cannot be removed because of a field load.
// - The field loads cannot be LoadNode::split_through_phi because DominatorSearchLimit is too low
//   for the dominates query to look through some IfNode / IfProj path.
// - We go into loop-opts.
// - In theory, the Stores of B::offset would be moved out of the loop. But we disable
//   PhaseIdealLoop::try_move_store_after_loop by setting -XX:-SplitIfBlocks.
// - The field loads are folded away because of some MaxUnroll trick, where the val constant folds to 1.
// - SuperWord eventually kicks in, and vectorizes the array stores.
// - Since some vectorization has happened, SuperWord wants to align the main loop with a memory reference
//   in the loop. The code here is not very smart, and just picks the memory reference that occurs the
//   most often. But the B::offset stores occur more often than the array stores, and so we align to
//   one of the B::offset stores. This inserts a CastP2X under the CheckCastPP of the B allocation.
// - Once loop opts is over, we eventually go into macro expansion.
// - During macro expansion, we now discover that the Allocations were marked NoEscape, and that by now
//   there are no field loads any more: yay, we can remove the allocation!
// - ... except that there is the CastP2X from SuperWord alignment ...
// - The Allocation removal code wants to pattern match the CastP2X as part of a GC barrier, but then
//   the pattern does not conform to the expectation - it is after all from SuperWord. This leads to
//   an assert, and SIGSEGV in product, at least with G1GC.
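
The "picks the memory reference that occurs the most often" step in the summary can be illustrated with a toy Java model. This is a sketch only: the class AlignRefSketch, the method pickAlignmentReference, and the string labels are hypothetical and are not the SuperWord code. It just shows how a frequency-based choice ends up on the field stores when they outnumber the array stores.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlignRefSketch {
    // Hypothetical model: each memory reference in the loop is labelled by what
    // it accesses, and the label that occurs most often is chosen for alignment.
    static String pickAlignmentReference(List<String> memRefs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String ref : memRefs) {
            counts.merge(ref, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed loop contents: two array stores and four unfolded field stores.
        List<String> memRefs = List.of(
            "array store", "B.offset store", "B.offset store",
            "array store", "B.offset store", "B.offset store");
        System.out.println(pickAlignmentReference(memRefs)); // B.offset store
    }
}
```

With the field stores in the majority, the chosen reference has the allocation as its base, which is how the CastP2X ends up under the CheckCastPP.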
JDK: jdk-23+4

With all my debug flags:
/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintEliminateAllocations -XX:-PrintIdeal -XX:+PrintEscapeAnalysis -Xbatch -XX:CompileCommand=compileonly,TestA2::test -XX:CompileCommand=printcompilation,TestA2::test -XX:+TraceLoopOpts -XX:-SplitIfBlocks -XX:LoopMaxUnroll=8 -XX:DominatorSearchLimit=45 TestA2.java

Or just with these flags:
/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -Xbatch -XX:-SplitIfBlocks -XX:LoopMaxUnroll=8 -XX:DominatorSearchLimit=45 TestA2.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork1/open/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:730), pid=1299433, tid=1299447
#  Error: assert(this_region != nullptr) failed
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-10-14-1158199.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-10-14-1158199.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc38091]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x411
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/JDK-8339349/core.1299433)
#
# An error report file with more information is saved as:
# /oracle-work/JDK-8339349/hs_err_pid1299433.log
#
# Compiler replay data is saved as:
# /oracle-work/JDK-8339349/replay_pid1299433.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
31-10-2024

Progress update: I can of course disable PhaseIdealLoop::try_move_store_after_loop with -XX:-SplitIfBlocks, and so I don't need to make a VM change for that. Nice.
31-10-2024

Current Best Explanation
----------------------------------

public static int test(char[] a) {
    B b = new B();
    for (int i = 0; i < a.length; i++) {
        a[i] = (char)i;
        b.offset++;
    }
    return b.offset;
}

The Allocation from "B b = new B();" has these uses:
- LoadI + StoreI in loop
- LoadI for return

And during SuperWord: we align to some memory-reference in the loop. Accidentally we pick the StoreI for b.offset. This means we insert a CastP2X for the base pointer of that reference, which is the CheckCastPP of the Allocation:

46 CheckCastPP === 43 41 [[ 257 89 135 89 ]] #Test$B:NotNull:exact *,iid=29 Oop:Test$B:NotNull:exact *,iid=29 !jvms: Test::test @ bci:0 (line 19)

Later (sometime after SuperWord), all the loads of the "b" object disappear (in part because I delayed some of the relevant optimizations). Now, during macro expansion, the Allocation can be removed, because it is scalar. But during the removal, we now find the CastP2X, and assume it belongs to the barrier code. We hit an assert/SIGSEGV because the pattern looks different than expected - after all it is not barrier code but SuperWord alignment code!
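
The test method above omits its surrounding class. A minimal self-contained sketch could look as follows. The class name AllocAlignSketch, the definition of B (a single int field named offset), and the warm-up loop in main are assumptions made for illustration; the attached TestA.java may differ. According to this comment, the crash also needed the VM changes listed further down, so this sketch alone is not expected to reproduce it.

```java
public class AllocAlignSketch {
    // Assumed shape of B: the graph dump only shows a field named "offset".
    static class B {
        int offset = 0;
    }

    public static int test(char[] a) {
        B b = new B(); // candidate for allocation elimination (NoEscape)
        for (int i = 0; i < a.length; i++) {
            a[i] = (char) i; // array store: vectorizable by SuperWord
            b.offset++;      // field load + store on the non-escaping object
        }
        return b.offset;
    }

    public static void main(String[] args) {
        char[] a = new char[1000];
        int result = 0;
        // Repeat the call so that the JIT compiler gets a chance to compile test().
        for (int i = 0; i < 10_000; i++) {
            result = test(a);
        }
        System.out.println(result); // 1000
    }
}
```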
This is the relevant part of the graph, at the time of the assert:

(rr) p find_node(46)->dump_bfs(15,0,"#dMC-")
dist dump
---------------------------------------------
0 46 CheckCastPP === 43 41 [[ 89 89 956 ]] #Test$B:NotNull:exact *,iid=29 Oop:Test$B:NotNull:exact *,iid=29 !jvms: Test::test @ bci:0 (line 19)
1 89 AddP === _ 46 46 88 [[ 209 975 310 343 ]] Oop:Test$B:NotNull:exact+12 *,iid=29 !jvms: Test$B::<init> @ bci:6 (line 52) Test::test @ bci:4 (line 19)
1 956 CastP2X === _ 46 [[ 957 ]]
2 209 StoreI === 1124 1132 89 207 [[ 1132 1021 1031 212 ]] @Test$B+12 *, name=offset, idx=6; Memory: @Test$B:NotNull:exact+12 *,iid=29, name=offset, idx=12; !orig=489 !jvms: Test::test @ bci:34 (line 43)
2 975 StoreI === 1013 1021 89 1006 [[ 1021 1031 ]] @Test$B+12 *, name=offset, idx=6; Memory: @Test$B:NotNull:exact+12 *,iid=29, name=offset, idx=12; !orig=209,489 !jvms: Test::test @ bci:34 (line 43)
2 310 StoreI === 320 322 89 311 [[ 322 331 ]] @Test$B+12 *, name=offset, idx=6; Memory: @Test$B:NotNull:exact+12 *,iid=29, name=offset, idx=12; !orig=209,489 !jvms: Test::test @ bci:34 (line 43)
2 343 StoreI === 353 355 89 344 [[ 355 1132 365 ]] @Test$B+12 *, name=offset, idx=6; Memory: @Test$B:NotNull:exact+12 *,iid=29, name=offset, idx=12; !orig=209,489 !jvms: Test::test @ bci:34 (line 43)
2 957 ConvL2I === _ 956 [[ 968 ]] #int
3 968 URShiftI === _ 957 441 [[ 969 ]]
4 969 AndI === _ 968 900 [[ 962 ]] !orig=[960]
5 962 AddI === _ 969 522 [[ 963 ]]
6 963 AndI === _ 962 900 [[ 964 ]]
7 964 AddI === _ 963 129 [[ 965 ]]
8 965 MinI === _ 964 115 [[ 347 ]] !orig=[440]
9 347 CmpI === _ 350 965 [[ 346 ]] !orig=249,[219] !jvms: Test::test @ bci:16 (line 41)
10 346 Bool === _ 347 [[ 356 ]] [lt] !orig=250,[220] !jvms: Test::test @ bci:16 (line 41)
11 356 CountedLoopEnd === 353 346 [[ 357 371 ]] [lt] P=0.500000, C=112637.000000 !orig=251,[221] !jvms: Test::test @ bci:16 (line 41)

------------------------

I made these VM changes:
- PhaseIdealLoop::try_move_store_after_loop -> disable moving stores out of the loop until after loop opts. This means we have some of the StoreI from b.offset in the loop still.
- LoadNode::split_through_phi -> delay splitting the LoadI until after loop opts. This ensures that the LoadI at the return stays after the loop, and so does not get eliminated during escape analysis -> otherwise the Allocation already gets removed then as it has no LoadI use any more. But we need the LoadI to stay until after SuperWord.
- StoreNode::Ideal -> delay folding stores to the same address until after loop opts. This ensures we have lots of StoreI from b.offset left in the loop, and the unrolled copies do not fold away. -> Ah, I just played with it: I can also disable the merging completely and it also still works. So maybe this can be played with a little more...?

The consequence is that during SuperWord, we have lots of StoreI from b.offset in the loop. That way, SuperWord picks one of those StoreI as the alignment reference, instead of the array references. But after loopopts, we want the LoadI of the b.offset to disappear, so that the Allocation does not have a LoadI use any more -> and the Allocation can be removed... but of course we still need it for alignment in the loop ... and the GC code finds the CastP2X and mistakes it for barrier code, rather than alignment code -> assert/SIGSEGV.
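
To show why alignment code needs the base pointer as an integer at all (the CastP2X feeding ConvL2I, URShiftI, AndI and AddI into the pre-loop limit in the graph above), here is a simplified Java sketch of the kind of arithmetic involved. It is not the formula HotSpot uses: the class PreLoopAlignSketch, the method iterationsToAlign, and the example values are all assumptions for illustration. It computes how many scalar iterations are needed before an address reaches a vector-width boundary.

```java
public class PreLoopAlignSketch {
    // Number of scalar iterations n such that (address + n * elementSize) is a
    // multiple of vectorWidth. Assumes vectorWidth is a power of two and that
    // address is already a multiple of elementSize.
    static int iterationsToAlign(long address, int elementSize, int vectorWidth) {
        int misalignment = (int) (address & (vectorWidth - 1));
        int bytesToBoundary = (vectorWidth - misalignment) & (vectorWidth - 1);
        return bytesToBoundary / elementSize;
    }

    public static void main(String[] args) {
        // Assumed example: 2-byte elements, 32-byte vectors.
        System.out.println(iterationsToAlign(0x1010L, 2, 32)); // 8
        System.out.println(iterationsToAlign(0x1000L, 2, 32)); // 0
    }
}
```

Because the result depends on the numeric value of the address, the alignment reference keeps its base object in use as an integer input, which is the extra use that the allocation removal later trips over.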
30-10-2024

I have been playing a little with this. I'm still working on a reproducer that works directly with JDK-23+4. But what I got so far is a TestA.java with a JVM patch, patch-for-TestA.diff. With that, I can reproduce the assert - by disabling / delaying some optimizations.

/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -XX:+TraceSuperWord -XX:+UseSuperWord -XX:+TraceNewVectors -Xbatch -XX:+UseG1GC -XX:+PrintEliminateAllocations -XX:-PrintIdeal -XX:+PrintEscapeAnalysis -Xbatch -XX:CompileCommand=compileonly,TestA::test -XX:CompileCommand=printcompilation,TestA::test -XX:+TraceLoopOpts TestA.java

/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -Xbatch -XX:+UseG1GC -Xbatch -XX:CompileCommand=compileonly,TestA::test -XX:CompileCommand=printcompilation,TestA::test TestA.java

# Internal Error (/oracle-work/jdk-fork1/open/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:730), pid=856223, tid=856237
# Error: assert(this_region != nullptr) failed

/oracle-work/jdk-fork1/build/linux-x64/jdk/bin/java -Xbatch -XX:+UseG1GC -Xbatch -XX:CompileCommand=compileonly,TestA::test -XX:CompileCommand=printcompilation,TestA::test TestA.java

# SIGSEGV (0xb) at pc=0x00007f59a4d670fa, pid=857685, tid=857699
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (build 23-internal-2024-10-14-1157432.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23-internal-2024-10-14-1157432.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7910fa]  G1BarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x22a

-----------------------------

SerialGC:

/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin/java -Xbatch -XX:+UseSerialGC -Xbatch -XX:CompileCommand=compileonly,TestA::test -XX:CompileCommand=printcompilation,TestA::test TestA.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork1/open/src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp:174), pid=865963, tid=865972
#  assert(mem->is_Store()) failed: store required
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-10-14-1158199.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-10-14-1158199.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x868688]  CardTableBarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand*, Node*) const+0x2e8
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /oracle-work/JDK-8339349/core.865963)
#
# An error report file with more information is saved as:
# /oracle-work/JDK-8339349/hs_err_pid865963.log
#
# Compiler replay data is saved as:
# /oracle-work/JDK-8339349/replay_pid865963.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

Product seems to run through... but not sure how that is sane! -> we do some replacement of nodes... it is not very meaningful, but probably also not horribly incorrect...?
30-10-2024

ILW = Same as JDK-8339349 = P2
18-10-2024

Some initial analysis can be found in JDK-8339349. My first goal now is to find a reproducer.
17-10-2024