JDK-8310190 : C2 SuperWord: AlignVector is broken, generates misaligned packs
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,19,21,22,23
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-06-16
  • Updated: 2024-11-18
  • Resolved: 2024-01-08
JDK 23
23 b05 Fixed
Description
I found some examples that indicate to me that AlignVector is fundamentally broken. I have not been able to verify this on a machine that would actually SIGBUS, but I definitely see that non-aligned vector loads/stores are generated on an x64 machine.

For example:
vmovd  %xmm9,0x13(%rdx,%r13,1)

I have 5 examples.

I would be interested to have confirmation from a machine that actually requires AlignVector, if this ever leads to true failures.

-------------------------- Test::test0 --------------------------

./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

This and test1 are some control tests, just to see that we get vectorization in the safe cases.

Here, I get packs that are 0-aligned or 8-aligned, with length of 4 bytes per vector.

-------------------------- Test::test1 --------------------------

./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

This is also a control test for vectorization in a safe case.
I get 16-packs, all 0-aligned. Good.

-------------------------- Test::test2 --------------------------

./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test2 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

This case is somewhat surprising: we do not vectorize, even though we technically would be allowed to.
find_adjacent_refs seems to find no memref to align to. I think the reason is that we have no memref that is 0-aligned with the vector-width (no "i+0" case).

-------------------------- Test::test3 --------------------------

./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test3 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

This case is the same as test2, but we have an additional access at "i+0".
Now find_adjacent_refs finds a memref to align to. But it turns out later that it actually is not part of a vector!

But we do create some 4-packs, and they are 3- or 11-aligned!
And these are some assembly instructions I can find with -XX:CompileCommand=print,Test::test3:

vmovd  %xmm9,0x13(%rdx,%r13,1)

It is possible that this still gets aligned, as everything is at a "3-offset", but given that we align to the best-memref found in find_adjacent_refs, this is implausible: that one has a 0-alignment, while the vectors have a 3/11 alignment.

Pack: 0
 align: 3 	 2063  StoreB  === 2274 2066 2087 2064  [[ 2060 2062 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1634,1362,1153,215 !jvms: Test::test3 @ bci:34 (line 56)
 align: 4 	 2060  StoreB  === 2274 2063 2089 2061  [[ 2055 2059 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1631,1349,1150,263 !jvms: Test::test3 @ bci:47 (line 57)
 align: 5 	 2055  StoreB  === 2274 2060 2056 2058  [[ 2052 2054 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1626,1346,1147,311 !jvms: Test::test3 @ bci:60 (line 58)
 align: 6 	 2052  StoreB  === 2274 2055 2098 2053  [[ 2049 2051 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1623,1343,1143,359,1207 !jvms: Test::test3 @ bci:75 (line 59)

Pack: 24
 align: 11 	 2046  StoreB  === 2274 2049 2095 2047  [[ 2043 2045 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1617,1337,215 !jvms: Test::test3 @ bci:34 (line 56)
 align: 12 	 2043  StoreB  === 2274 2046 2096 2044  [[ 2040 2042 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1614,1334,263 !jvms: Test::test3 @ bci:47 (line 57)
 align: 13 	 2040  StoreB  === 2274 2043 2092 2041  [[ 2037 2039 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1611,1331,311 !jvms: Test::test3 @ bci:60 (line 58)
 align: 14 	 2037  StoreB  === 2274 2040 2088 2038  [[ 2002 2004 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1608,1327,359,1207 !jvms: Test::test3 @ bci:75 (line 59)

-------------------------- Test::test4 --------------------------

./java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=compileonly,Test::test4 -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:MaxVectorSize=16 -XX:+AlignVector -XX:+Verbose -XX:LoopUnrollLimit=10000 Test.java

Run with - to see the assembly generated, for example I see:

vmovd  0x20(%rsi,%r11,1),%xmm14
vmovq  0x25(%rsi,%r11,1),%xmm26

These cannot both be aligned: the offset of the second one (0x25) is odd!

This is from -XX:+TraceSuperWord. We see a 4-pack (0-aligned) and an 8-pack (5-aligned), which correspond to the two assembly instructions above:

Pack: 0
 align: 0 	 3139  StoreB  === 3358 3362 3155 3140  [[ 3136 3138 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2517,2124,1801,187 !jvms: Test::test4 @ bci:30 (line 53)
 align: 1 	 3136  StoreB  === 3358 3139 3166 3137  [[ 3133 3135 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2514,2121,1798,238 !jvms: Test::test4 @ bci:49 (line 54)
 align: 2 	 3133  StoreB  === 3358 3136 3163 3134  [[ 3130 3132 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2511,2118,1795,290 !jvms: Test::test4 @ bci:68 (line 55)
 align: 3 	 3130  StoreB  === 3358 3133 3165 3131  [[ 3127 3129 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2508,2115,1792,342 !jvms: Test::test4 @ bci:87 (line 56)

Pack: 48
 align: 5 	 3127  StoreB  === 3358 3130 3160 3128  [[ 3124 3126 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2505,2112,1789,394 !jvms: Test::test4 @ bci:106 (line 58)
 align: 6 	 3124  StoreB  === 3358 3127 3158 3125  [[ 3121 3123 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2502,2109,1786,446 !jvms: Test::test4 @ bci:127 (line 59)
 align: 7 	 3121  StoreB  === 3358 3124 3156 3122  [[ 3118 3120 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2499,2106,1783,498 !jvms: Test::test4 @ bci:148 (line 60)
 align: 8 	 3118  StoreB  === 3358 3121 3164 3119  [[ 3115 3117 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2496,2103,1780,550 !jvms: Test::test4 @ bci:169 (line 61)
 align: 9 	 3115  StoreB  === 3358 3118 3159 3116  [[ 3112 3114 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2493,2100,1777,602 !jvms: Test::test4 @ bci:190 (line 62)
 align: 10 	 3112  StoreB  === 3358 3115 3162 3113  [[ 3109 3111 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2490,2097,1774,654 !jvms: Test::test4 @ bci:211 (line 63)
 align: 11 	 3109  StoreB  === 3358 3112 3157 3110  [[ 3106 3108 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2487,2094,1771,706 !jvms: Test::test4 @ bci:232 (line 64)
 align: 12 	 3106  StoreB  === 3358 3109 3161 3107  [[ 3103 3105 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6;  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=2484,2091,1768,758,1841 !jvms: Test::test4 @ bci:253 (line 65)
Comments
Changeset: 827c71da Author: Emanuel Peter <epeter@openjdk.org> Date: 2024-01-08 16:10:21 +0000 URL: https://git.openjdk.org/jdk/commit/827c71dac9a5732f70bc7341743bce314cad302f
08-01-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14785 Date: 2023-07-06 14:13:01 +0000
06-10-2023

I found an additional example:

./java -XX:CompileCommand=compileonly,Test21::test21 -XX:CompileCommand=printcompilation,Test21::test* -XX:LoopUnrollLimit=250 --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:UseAVX=0 -XX:UseSSE=3 -Xbatch -XX:+TraceSuperWord Test21.java

It is already included in JDK-8316594, but guarded with avx2, so that it does not trigger on other platforms. This is the assert:

# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/oracle-work/jdk-fork3/open/src/hotspot/share/opto/superword.cpp:1031), pid=917810, tid=917824
# assert((ABS(iv_adjustment_in_bytes) % elt_size) == 0 || !vectors_should_be_aligned()) failed: (6) should be divisible by (4)
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-04-1211462.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-04-1211462.emanuel..., mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x1707abb] SuperWord::get_iv_adjustment(MemNode*)+0x1ab

I have already removed this in the current PR, so this will be fixed.
04-10-2023

I found an issue with unaligned Unsafe load/store (we assume base address is aligned, which is wrong -> unaligned load/store):

./java -XX:CompileCommand=compileonly,Test24::test24 -XX:CompileCommand=printcompilation,Test24::test* -XX:LoopUnrollLimit=250 --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -Xbatch -XX:+TraceSuperWord -XX:+AlignVector Test24.java

I will not address this here, but probably in a future fix.
21-09-2023

Thanks [~fgao] for confirming this!
05-07-2023

Hi [~epeter], I tried the following testcase and it did crash on linux-arm:

static void test3(short[] a, short[] b, short mask) {
    for (int i = 2; i < RANGE-6; i+=8) {
        // Problematic for AlignVector
        b[i+0] = (short)(a[i+0] + mask); // best_memref, align 0
        b[i+3] = (short)(a[i+3] + mask); // pack at offset 3 shorts
        b[i+4] = (short)(a[i+4] + mask);
        b[i+5] = (short)(a[i+5] + mask);
        b[i+6] = (short)(a[i+6] + mask);
    }
}

Here is part of log:

After filter_packs
packset
Pack: 0
 align: 6 1796 StoreC === 1884 1799 1844 1797 [[ 1793 1795 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1493,1278,1113,219 !jvms: Test::test3 @ bci:37 (line 56)
 align: 8 1793 StoreC === 1884 1796 1846 1794 [[ 1790 1792 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1490,1275,1110,270 !jvms: Test::test3 @ bci:50 (line 57)
 align: 10 1790 StoreC === 1884 1793 1845 1791 [[ 1787 1789 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1487,1272,1107,322 !jvms: Test::test3 @ bci:63 (line 58)
 align: 12 1787 StoreC === 1884 1790 1843 1788 [[ 1784 1786 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1484,1269,1104,372,1140 !jvms: Test::test3 @ bci:78 (line 59)

Pack: 16
 align: 6 1798 LoadS === 753 1799 1804 [[ 1797 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:short !orig=1495,1280,1115,194 !jvms: Test::test3 @ bci:33 (line 56)
 align: 8 1795 LoadS === 753 1796 1806 [[ 1794 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:short !orig=1492,1277,1112,245 !jvms: Test::test3 @ bci:46 (line 57)
 align: 10 1792 LoadS === 753 1793 1805 [[ 1791 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:short !orig=1489,1274,1109,297 !jvms: Test::test3 @ bci:59 (line 58)
 align: 12 1789 LoadS === 753 1790 1803 [[ 1788 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:short !orig=1486,1271,1106,347 !jvms: Test::test3 @ bci:74 (line 59)

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0xf45bef28, pid=94256, tid=94257
#
# JRE version: OpenJDK Runtime Environment (22.0) (slowdebug build 22-internal-git-7b3c2dc5f)
# Java VM: OpenJDK Server VM (slowdebug 22-internal-git-7b3c2dc5f, compiled mode, sharing, g1 gc, linux-arm)
# Problematic frame:
# J 1 c2 Test.test3([S[SS)V (86 bytes) @ 0xf45bef28 [0xf45beb90+0x00000398]
#
# Core dump will be written. Default location: /tmp/core.94256
#
# An error report file with more information is saved as:
05-07-2023
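
The `align:` values in the log of the comment above can be reproduced with simple arithmetic. The following is a minimal sketch, not HotSpot code: the class and method names are hypothetical, and it assumes that the `align:` values are byte offsets relative to the memref chosen for alignment, taken modulo a 16-byte vector width.

```java
// Hypothetical helper for illustration only; not part of HotSpot.
final class ShortPackAlignment {
    // Byte alignment of an access at the given element offset, relative to the
    // reference memref (assumed to sit at alignment 0), modulo the vector width.
    static int byteAlignment(int elementOffset, int elementSize, int vectorWidth) {
        return Math.floorMod(elementOffset * elementSize, vectorWidth);
    }

    // A pack can only be aligned if the alignment of its first element
    // is a multiple of the pack size in bytes.
    static boolean isPackAligned(int firstByteAlignment, int packElements, int elementSize) {
        return firstByteAlignment % (packElements * elementSize) == 0;
    }

    public static void main(String[] args) {
        int elementSize = 2;  // sizeof(short)
        int vectorWidth = 16; // assumption: alignment is tracked modulo 16 bytes
        for (int offset = 3; offset <= 6; offset++) {
            System.out.println("b[i+" + offset + "] -> align "
                    + byteAlignment(offset, elementSize, vectorWidth));
        }
        // The pack starts at b[i+3] and holds 4 shorts (8 bytes).
        int first = byteAlignment(3, elementSize, vectorWidth);
        System.out.println("pack aligned: " + isPackAligned(first, 4, elementSize));
    }
}
```

Under these assumptions the sketch yields alignments 6, 8, 10 and 12 for the four stores, and reports that the 8-byte pack starting at alignment 6 is not aligned.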

A few more comments: I think that AlignVector is broken. But I also think that the current implementation in SuperWord::find_adjacent_refs is quite convoluted. I would like to remove it from there, and re-implement it as a "filter" pass. The idea is that we can first generate all the packs, split them etc, and only at the end check if all of them are actually alignable. If not, we filter out those that are not alignable. We should also have some debug runtime checks if all packs are actually adequately aligned. I imagine adding a verification node between the pointer/address and the VectorLoad/VectorStore. In the backend, we take the address modulo the alignment-size (size of vector, or AlignVectorSize if we also implement JDK-8303827). If the modulo is not zero, we SIGSEGV or Halt. Anyway: getting AlignVector fully outside SuperWord::find_adjacent_refs would be a big win, it would make the logic there so much simpler. More maintainable and extensible.
03-07-2023
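
The debug runtime check proposed in the comment above boils down to taking the address modulo the required alignment. A minimal sketch of that check in plain Java follows. It is only an illustration of the idea: the class name, the method name and the use of an exception in place of a SIGSEGV or Halt are assumptions, and the real check would be emitted by the C2 backend.

```java
// Hypothetical illustration of the proposed verification; not HotSpot code.
final class AlignmentVerifier {
    // Throws if the address is not a multiple of the alignment.
    // The alignment is assumed to be a positive power of two
    // (the vector size, or AlignVectorSize if that were implemented).
    static void verifyAligned(long address, int alignmentInBytes) {
        if (alignmentInBytes <= 0 || Integer.bitCount(alignmentInBytes) != 1) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        // For a power of two, (address % alignment) equals (address & (alignment - 1)).
        if ((address & (alignmentInBytes - 1)) != 0) {
            throw new IllegalStateException(
                    "misaligned vector access: address 0x" + Long.toHexString(address)
                    + " is not " + alignmentInBytes + "-byte aligned");
        }
    }

    public static void main(String[] args) {
        verifyAligned(0x1000L, 16); // passes silently
        try {
            verifyAligned(0x1003L, 4); // 3 bytes past a 4-byte boundary
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```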

The priority on this bug is not extremely high, it is probably quite unlikely to trigger in the wild. So enjoy the Festival! But we should eventually fix this. It would be nice to untangle the code in find_adjacent_refs a bit. It is quite messy and resistant to extensions currently. The modulo calculus for alignment for example would make it very hard to make extensions for strided memory access or gather/scatter.
20-06-2023

[~fgao] Yes, that is strange. For test3 it does generate packs, but not for test4. I guess the reason is maybe in find_align_to_ref, where we find the first memref to align to? If we pick a different memref, then the alignment shifts around.

When I look at which one it picks for test3 on my avx512 machine:

Vector align to node: 2115 LoadB === 791 2292 2116 [[ 2114 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:byte !orig=1662,1367,1158,129 !jvms: Test::test3 @ bci:17 (line 54)
SuperWord::get_iv_adjustment: n = 2115, noffset = 16 iv_adjust = 16 elt_size = 1 scale = 1 iv_stride = 128 vect_size 16: 2115 LoadB === 791 2292 2116 [[ 2114 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Type:byte !orig=1662,1367,1158,129 !jvms: Test::test3 @ bci:17 (line 54)

Versus your arm32 machine:

Vector align to node: 1854 StoreB === 1940 1857 1873 1855 [[ 1851 1853 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1511,1246,1086,257 !jvms: Test::test3 @ bci:47 (line 57)
SuperWord::get_iv_adjustment: n = 1854, noffset = 16 iv_adjust = 16 elt_size = 1 scale = 1 iv_stride = 128 vect_size 16: 1854 StoreB === 1940 1857 1873 1855 [[ 1851 1853 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact+any *, idx=6; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=1511,1246,1086,257 !jvms: Test::test3 @ bci:47 (line 57)

We see that a different memref is picked: avx512 gets the "i+0", arm32 the "i+4". That of course rotates all the alignments around.

Where I get pairs:
(3,4), (4,5), (5,6) and (11,12), (12,13), (13,14)

Your arm32 gets:
(0,1), (1,2) and (7,8), (8,9), (9,10)

There would also be a (-1,0) or (15,0) pair, but that one wraps around the 16 byte alignment boundary, so it is not created.
20-06-2023
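
The rotation of alignments described in the comment above can be replayed with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the SuperWord algorithm: the class and method names are hypothetical, the accesses are assumed to be byte stores at i+3 .. i+6 with a loop stride of 8, and alignments are taken modulo a 16-byte vector width relative to the chosen reference memref.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of HotSpot.
final class AlignmentRotation {
    // Alignment of every access over `unroll` iterations, relative to the
    // reference memref at element offset `refOffset`, modulo the vector width.
    static List<Integer> alignments(int[] offsets, int stride, int unroll,
                                    int refOffset, int vectorWidth) {
        List<Integer> result = new ArrayList<>();
        for (int u = 0; u < unroll; u++) {
            for (int off : offsets) {
                result.add(Math.floorMod(u * stride + off - refOffset, vectorWidth));
            }
        }
        return result;
    }

    // Adjacent pairs (a, a+1). A pair that would wrap around the vector-width
    // boundary, such as (15, 0), fails the a+1 test and is not created.
    static List<String> pairs(List<Integer> aligns) {
        List<String> result = new ArrayList<>();
        for (int k = 0; k + 1 < aligns.size(); k++) {
            int a = aligns.get(k);
            int b = aligns.get(k + 1);
            if (b == a + 1) {
                result.add("(" + a + "," + b + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] offsets = {3, 4, 5, 6}; // byte stores at i+3 .. i+6
        // Reference memref at "i+0":
        System.out.println(pairs(alignments(offsets, 8, 2, 0, 16)));
        // prints [(3,4), (4,5), (5,6), (11,12), (12,13), (13,14)]
        // Reference memref at "i+4":
        System.out.println(pairs(alignments(offsets, 8, 2, 4, 16)));
        // prints [(0,1), (1,2), (7,8), (8,9), (9,10)]
    }
}
```

Two unrolled iterations cover one full 16-byte period here, because the stride of 8 bytes times 2 equals the vector width.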

Hi [~epeter], you may notice that the alignment of test3 is [7, 8, 9, 10] on arm32 rather than [3, 4, 5, 6], and I'm quite confused about this point. I changed the testcase to see if it would generate unaligned packs for arm32, but didn't succeed. So I wonder whether there is an implicit pattern that superword follows to pick nodes and generate packs for this particular loop, perhaps something related to vector width. I can look into it next week, after the Chinese Dragon Boat Festival.
20-06-2023

[~fgao] Thanks for running the experiments. Ok, these examples do not vectorize on arm32. I'm not sure why, as I only have your logs. It seems that no pairs are actually generated in find_adjacent_refs. But are you also concerned, or do you think there is no bug here that could actually manifest in a wrong result / SIGBUS etc?
19-06-2023

ILW = Misaligned packs with AlignVector possibly leading to crashes, rare, use -XX:-AlignVector = MLM = P4
19-06-2023

Hi [~epeter], the attached java case passed on a linux-arm machine, since arm32 only supports vector operations of at least 8 bytes. See https://github.com/openjdk/jdk/blob/492d25c8df0f818d6f6e3a18a82bfad8fa95c282/src/hotspot/cpu/arm/arm.ad#L1078.
19-06-2023

[~fgao] Do you still have access to a linux-arm machine where unaligned vector ops would lead to a SIGBUS? I remember that you could test that before: https://github.com/openjdk/jdk/pull/12350#issuecomment-1437978867
16-06-2023

Results with older JDK:
test3: same issue since at least JDK-11.
test4: JDK-17 does not vectorize. JDK-19 and onward seem to have same issue as reported.

I'd assume that this was wrong since day 1. The issue is that we just assume that checking alignment of a single memref ("best" memref) is sufficient. But of course with partial vectorization it is possible that the "best" memref is not part of some vector, and that that vector has a different alignment than "best".

Suggestion: we should check alignment only once we have the final packset. Then we can check if all of the packs are alignable. If not all are alignable, we can either completely refuse to vectorize, or just filter out those that are not alignable.
16-06-2023
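
The suggestion in the comment above, checking alignment only on the final packset, could look roughly like the following. This is a minimal sketch and not the actual fix: the `Pack` class, its fields and both methods are hypothetical, and a pack is treated as alignable when the alignment of its first element is a multiple of the pack size in bytes. The sample data are the two packs reported for test4 (a 0-aligned 4-pack and a 5-aligned 8-pack).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a post-hoc alignment filter; not HotSpot code.
final class PackFilter {
    static final class Pack {
        final int firstAlign;   // "align:" value of the first node in the pack
        final int sizeInBytes;  // number of elements times element size

        Pack(int firstAlign, int sizeInBytes) {
            this.firstAlign = firstAlign;
            this.sizeInBytes = sizeInBytes;
        }

        boolean isAlignable() {
            return firstAlign % sizeInBytes == 0;
        }
    }

    // Option 1: refuse to vectorize unless every pack is alignable.
    static boolean allAlignable(List<Pack> packs) {
        for (Pack p : packs) {
            if (!p.isAlignable()) {
                return false;
            }
        }
        return true;
    }

    // Option 2: keep only the alignable packs.
    static List<Pack> filterAlignable(List<Pack> packs) {
        List<Pack> kept = new ArrayList<>();
        for (Pack p : packs) {
            if (p.isAlignable()) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Pack> packs = List.of(new Pack(0, 4), new Pack(5, 8));
        System.out.println("all alignable: " + allAlignable(packs));
        System.out.println("packs kept: " + filterAlignable(packs).size());
    }
}
```

With the test4 data, the first option rejects vectorization entirely, while the second keeps only the 0-aligned 4-pack.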