JDK-8348659 : AArch64: IR rule failure with compiler/loopopts/superword/TestSplitPacks.java
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 24,25
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2025-01-27
  • Updated: 2025-02-10
  • Resolved: 2025-02-05
  • Fixed in: JDK 25 (25 b09)
Description
This reliably fails on my Graviton 3 instance. I believe only the test5a subtest fails. I have not investigated this any further.

$ CONF=linux-aarch64-server-fastdebug make test TEST=compiler/loopopts/superword/TestSplitPacks.java 

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "static java.lang.Object[] compiler.loopopts.superword.TestSplitPacks.test5a(short[],short[],short)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_S#_", "_@2", "> 0", "_#V#LOAD_VECTOR_S#_", "_@4", "> 0", "_#V#LOAD_VECTOR_S#_", "_@8", "> 0", "_#V#ADD_VS#_", "_@2", "> 0", "_#V#ADD_VS#_", "_@8", "> 0", "_#V#ADD_VS#_", "_@4", "> 0", "_#STORE_VECTOR#_", "> 0"}, failOn={}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={"MaxVectorSize", ">=32", "AlignVector", "false"}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z]<S,2>)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
         * Constraint 4: "(\d+(\s){2}(AddVS.*)+(\s){2}===.*vector[A-Za-z]<S,2>)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
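
The failed constraint can be reproduced outside the IR framework. The following is a minimal sketch, assuming the framework searches the PrintIdeal output for the regex shown in Constraint 1. The class name and both sample lines are illustrative: the first is an abbreviated 4-lane node, and the second is a hypothetical 2-lane node that does not appear in any output from this report.

```java
import java.util.regex.Pattern;

final class IrConstraintDemo {
    // Constraint 1 from the failure output, escaped as a Java string literal.
    static final Pattern LOAD_VECTOR_S2 = Pattern.compile(
        "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<S,2>)");

    // True if the line contains a LoadVector node with 2 short lanes.
    static boolean matches(String line) {
        return LOAD_VECTOR_S2.matcher(line).find();
    }

    public static void main(String[] args) {
        // Abbreviated 4-lane node (two spaces around the node name, as the regex requires).
        String fourLanes = "6741  LoadVector  === 6325 6564 273 [[ 6742 ]] #vectord<S,4>";
        // Hypothetical 2-lane node, made up for illustration only.
        String twoLanes = "9999  LoadVector  === 1 2 3 [[ 4 ]] #vectors<S,2>";

        System.out.println(matches(fourLanes)); // false
        System.out.println(matches(twoLanes));  // true
    }
}
```

A graph that only contains 4-lane and 8-lane short vectors therefore yields zero matches for this constraint, which is what "[found] 0 > 0 [given]" reports.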

Comments
Changeset: 66a38984 Branch: master Author: Bhavana Kilambi <bkilambi@openjdk.org> Committer: Aleksey Shipilev <shade@openjdk.org> Date: 2025-02-05 08:37:21 +0000 URL: https://git.openjdk.org/jdk/commit/66a3898448023f1f22da7d7cbcf4c79a0eb59963
05-02-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/23385 Date: 2025-01-31 11:19:27 +0000
31-01-2025

[~epeter] Yes, I could see that in the Trace output as well. Kudos to you for printing results of SuperWord at every stage :) I have been meaning to split the IR rules into two as well, one for sse4.1 and another for asimd/sve. I'll upload a patch soon. Thanks!
31-01-2025

[~bkilambi] Thanks for the trace! Yes, you are right, the trace tells me it is about "implemented" as well:
  WARNING: Removed pack: not implemented at any smaller size:
    0: 6523 LoadS === 6397 6564 6524 [[ 6522 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #short (does not depend only on test, unknown control) !orig=5649,1127,[118] !jvms: TestSplitPacks::test5a @ bci:17 (line 19)
    1: 6520 LoadS === 6373 6521 6526 [[ 6519 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #short (does not depend only on test, unknown control) !orig=5646,1205,[179] !jvms: TestSplitPacks::test5a @ bci:30 (line 20)
    2: 6483 LoadS === 6349 6518 6484 [[ 6482 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #short (does not depend only on test, unknown control) !orig=5641,1286,[227] !jvms: TestSplitPacks::test5a @ bci:43 (line 21)
  WARNING: Removed pack: not implemented at any smaller size:
    0: 6522 AddI === _ 6523 12 [[ 6521 ]] !orig=5648,119 !jvms: TestSplitPacks::test5a @ bci:19 (line 19)
    1: 6519 AddI === _ 6520 12 [[ 6518 ]] !orig=5645,180 !jvms: TestSplitPacks::test5a @ bci:32 (line 20)
    2: 6482 AddI === _ 6483 12 [[ 6480 ]] !orig=5640,228 !jvms: TestSplitPacks::test5a @ bci:45 (line 21)
  WARNING: Removed pack: not implemented at any smaller size:
    0: 6521 StoreC === 6558 6564 6544 6522 [[ 6518 6520 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=5647,156 !jvms: TestSplitPacks::test5a @ bci:21 (line 19)
    1: 6518 StoreC === 6558 6521 6545 6519 [[ 6480 6483 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=5644,203 !jvms: TestSplitPacks::test5a @ bci:34 (line 20)
    2: 6480 StoreC === 6558 6518 6481 6482 [[ 6477 6479 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=5639,251 !jvms: TestSplitPacks::test5a @ bci:47 (line 21)
I guess you can just adjust the IR rule. Maybe just create 2 IR rules, one for sse4.1 and one for sve. Because asimd never has MaxVectorSize >= 32 anyway!
31-01-2025

I have attached the output of the "TraceAutoVectorization" option as well. Thanks for the suggestion and quick response :) I can fix this upstream if that's OK.
30-01-2025

[~epeter] Sure, I can do that. I have just been doing a bit of debugging in gdb and realized (how could I miss this!!) that the min vector size on aarch64 is 8B. So the call to SuperWord::implemented() for type T_SHORT and vlen = 2 returns false, which prevents generation of short vectors with 2 lanes.
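
The size arithmetic behind that observation can be written out as a small sketch. This is not HotSpot code: the class, the constant, and the method are hypothetical names, and the real SuperWord::implemented() performs further checks. Only the 8-byte minimum stated in the comment above is modeled.

```java
final class MinVectorSizeSketch {
    // Assumption taken from the comment above: minimum vector size on aarch64 is 8 bytes.
    static final int MIN_VECTOR_BYTES = 8;

    // Models only the size part of the check: a vector is too small
    // when element size times lane count is below the minimum.
    static boolean meetsMinVectorSize(int elementBytes, int lanes) {
        return elementBytes * lanes >= MIN_VECTOR_BYTES;
    }

    public static void main(String[] args) {
        for (int lanes : new int[] {2, 4, 8}) {
            System.out.println("short x " + lanes + " -> "
                + meetsMinVectorSize(Short.BYTES, lanes));
        }
    }
}
```

With 2-byte shorts, 2 lanes give 4 bytes, which is below the 8-byte minimum, while 4 and 8 lanes give 8 and 16 bytes and pass the check.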
30-01-2025

[~bkilambi] Would you mind attaching the log with TraceAutoVectorization tag ALL for that method, so that I can give an educated guess where things might be going wrong? Right, it looks like we are creating all other vectors, just not the one with size 2. I'd like to find out why :)
30-01-2025

[~epeter] I can reproduce this on a Graviton3 instance. The IR rule is applied on machines with max vector size >= 32B, so this may not be reproducible on machines with a smaller vector length. From a quick glance at the generated IR, it seems to be generating AddVS for vector sizes 4 and 8 but not for vector size 2. The IR looks something like this:
  6741 LoadVector === 6325 6564 273 |1091 [[ 6742 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectord<S,4> (does not depend only on test, unknown control) !orig=[1367],[275] !jvms: TestSplitPacks::test5a @ bci:56 (line 22)
  6742 AddVS === _ 6741 6737 [[ 6746 ]] #vectord<S,4> !orig=[276] !jvms: TestSplitPacks::test5a @ bci:58 (line 22)
  6743 LoadVector === 6229 6564 465 |1091 [[ 6744 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<S,8> (does not depend only on test, unknown control) !orig=[1691],[467] !jvms: TestSplitPacks::test5a @ bci:112 (line 26)
  6744 AddVS === _ 6743 6733 [[ 6745 ]] #vectorx<S,8> !orig=[468] !jvms: TestSplitPacks::test5a @ bci:114 (line 26)
  6745 StoreVector === 6558 6564 489 6744 |1129 [[ 251 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; !orig=[491] !jvms: TestSplitPacks::test5a @ bci:116 (line 26)
  6746 StoreVector === 6558 203 297 6742 |1129 [[ 6443 6441 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; !orig=[299] !jvms: TestSplitPacks::test5a @ bci:60 (line 22)
  6747 LoadVector === 6325 6441 6491 |1091 [[ 6748 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectord<S,4> (does not depend only on test, unknown control) !orig=[6434],[1367],[275] !jvms: TestSplitPacks::test5a @ bci:56 (line 22)
  6748 AddVS === _ 6747 6737 [[ 6749 ]] #vectord<S,4> !orig=[6433],[276] !jvms: TestSplitPacks::test5a @ bci:58 (line 22)
  6749 StoreVector === 6558 6441 6510 6748 |1129 [[ 6437 6435 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; !orig=[6432],[299] !jvms: TestSplitPacks::test5a @ bci:60 (line 22)
  6750 LoadVector === 6229 6435 6497 |1091 [[ 6751 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<S,8> (does not depend only on test, unknown control) !orig=[6422],[1691],[467] !jvms: TestSplitPacks::test5a @ bci:112 (line 26)
  6751 AddVS === _ 6750 6733 [[ 6752 ]] #vectorx<S,8> !orig=[6421],[468] !jvms: TestSplitPacks::test5a @ bci:114 (line 26)
  6752 StoreVector === 6558 6435 6502 6751 |1129 [[ 6440 6438 ]] @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @short[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; !orig=[6420],[491] !jvms: TestSplitPacks::test5a @ bci:116 (line 26)
It looks like the pack of 15 is split correctly into 8 and 7, and the pack of 7 is also split into 4 and 3, but the pack of 3 is not split into packs of 2 and 1.
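
The splitting described in this comment can be illustrated with a simplified model. This is a sketch only, not the SuperWord implementation: it repeatedly peels off the largest power-of-two chunk and keeps a chunk only if it has at least `minLanes` lanes. The class name and the `minLanes` parameter are hypothetical and stand in for whatever the platform reports as implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class PackSplitSketch {
    // Splits a pack greedily into power-of-two chunks and returns the lane
    // counts of the chunks that survive. Chunks below minLanes are dropped,
    // and a single leftover element is never vectorized.
    static List<Integer> split(int packSize, int minLanes) {
        List<Integer> kept = new ArrayList<>();
        int remaining = packSize;
        while (remaining >= 2) {
            int lanes = Integer.highestOneBit(remaining); // largest power of two <= remaining
            if (lanes >= minLanes) {
                kept.add(lanes);
            }
            remaining -= lanes;
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(split(15, 2)); // [8, 4, 2]
        System.out.println(split(15, 4)); // [8, 4]
    }
}
```

Under this model, a platform that accepts 2-lane short vectors ends up with chunks of 8, 4, and 2, which is what the IR rule counts on, while a minimum of 4 lanes leaves only 8 and 4, matching the nodes listed above.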
30-01-2025

This is a pretty standard m7g.16xlarge from EC2, I think any *7g would reproduce the same.
28-01-2025

Yes, looks like -XX:UseSVE=0 makes the test pass.
28-01-2025

[~shade] Thanks for the report. I'd love to investigate, but I have no Graviton 3 machine, and in our internal testing this does not seem to reproduce. Can you check if this is an SVE only issue, by disabling SVE?
28-01-2025

I'm marking / triaging this as a test bug for now, please adjust if it turns out to be a product issue. ILW = Test fails because vectorization fails, single (sub-)test on Graviton 3, no workaround = MLH = P4
28-01-2025