JDK-8370673 : C2 SuperWord [x86]: implement long mul reduction
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86
  • Submitted: 2025-10-27
  • Updated: 2025-10-27
Other: tbd (Unresolved)
Description
I noticed that element-wise vector multiplication for long (MulVL) is implemented, but the corresponding multiply reduction is not. I think it should be possible to support the reduction as well.

It already works for AVX512dq, but it should also work for AVX2, AVX1, and maybe even SSE4.1.

I found this while working on JDK-8340093.
See tests in:
test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java

I attached a Reduction2.java for demonstration.
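
The attachment itself is not reproduced in this report. As a rough guide to reading the log, the three methods could look like the sketch below. This is a hypothetical reconstruction inferred from the trace output (a MulL reduction in test1, an element-wise multiply by a loop-invariant value in test2, an add reduction in test3). The class name Reduction2Sketch, the array size, and the iteration count are assumptions, not taken from the attached file.

```java
// Hypothetical reconstruction: NOT the attached Reduction2.java.
// Method shapes are guessed from the trace output in this report.
class Reduction2Sketch {
    // test1: long multiply reduction (the case that is rejected with UseAVX=1).
    static long test1(long[] a) {
        long acc = 1;
        for (int i = 0; i < a.length; i++) {
            acc *= a[i];
        }
        return acc;
    }

    // test2: element-wise long multiply by a loop-invariant factor.
    static void test2(long[] a, long factor) {
        for (int i = 0; i < a.length; i++) {
            a[i] *= factor;
        }
    }

    // test3: long add reduction.
    static long test3(long[] a) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = new long[10_000];          // assumed size
        java.util.Arrays.fill(a, 3L);
        long sink = 0;
        // Repeat often enough that the JIT compiles the loops (assumed count).
        for (int rep = 0; rep < 10_000; rep++) {
            sink += test1(a);
            test2(a, 1L);
            sink += test3(a);
        }
        System.out.println(sink);
    }
}
```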

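In the log below (run with -XX:UseAVX=1), one can see that the element-wise MulL in test2 is vectorized (MulVL), but the MulL reduction in test1 is rejected with "not implemented at any smaller size". The add reduction in test3 is vectorized (AddReductionVL), so all the shuffling should be available. That indicates to me that we should be able to do a MulL reduction.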
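
To illustrate why a multiply reduction can reuse the same scheme as the add reduction, here is a minimal scalar sketch that simulates a 2-lane accumulator (matching the vectorx<J,2> shape in the log). Java long multiplication wraps modulo 2^64 and is associative and commutative, so reordering the products into per-lane partial results gives the same value as the sequential loop. The class and method names are made up for illustration, and this is plain Java, not what C2 emits.

```java
// Illustrative only: simulates a 2-lane multiply reduction in scalar code.
class MulReductionLanes {
    // Sequential reference: acc = a[0] * a[1] * ... * a[n-1] (wrapping).
    static long scalarMul(long[] a) {
        long acc = 1;
        for (long v : a) {
            acc *= v;
        }
        return acc;
    }

    // Lane-wise version: two independent partial products in the loop,
    // then one cross-lane multiply at the end, plus a scalar tail.
    static long twoLaneMul(long[] a) {
        long lane0 = 1;
        long lane1 = 1;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            lane0 *= a[i];       // element-wise multiply, lane 0
            lane1 *= a[i + 1];   // element-wise multiply, lane 1
        }
        long acc = lane0 * lane1; // final reduction across lanes
        for (; i < a.length; i++) {
            acc *= a[i];          // leftover element if length is odd
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {3L, -7L, Long.MAX_VALUE, 11L, 0x123456789L, 5L, -2L};
        System.out.println(scalarMul(a) == twoLaneMul(a));
    }
}
```

The only step that differs from the element-wise case is the final cross-lane multiply, which is the part that needs the shuffling the add reduction already relies on.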

We should also investigate whether the Vector API has the same issue.

[empeter@emanuel bin]$ ./java -Xbatch -XX:CompileCommand=compileonly,Reduction2::test* -XX:CompileCommand=printcompilation,Reduction2::test* -XX:+TraceNewVectors -XX:UseAVX=1 -XX:CompileCommand=TraceAutoVectorization,Reduction2::test*,SW_REJECTIONS Reduction2.java
CompileCommand: compileonly Reduction2.test* bool compileonly = true
CompileCommand: PrintCompilation Reduction2.test* bool PrintCompilation = true
CompileCommand: TraceAutoVectorization Reduction2.test* const char* TraceAutoVectorization = 'SW_REJECTIONS'
4088   98 %  b  3       Reduction2::test1 @ 4 (24 bytes)
4090   99    b  3       Reduction2::test1 (24 bytes)
4091  100 %  b  4       Reduction2::test1 @ 4 (24 bytes)

SuperWord::transform_loop:
    Loop: N551/N162  limit_check counted [int,int),+4 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 551  CountedLoop  === 551 264 162  [[ 546 550 551 260 554 555 465 225 ]] inner stride: 4 main of N551 strip mined multiversion_fast !orig=[462],[265],[234],[212] !jvms: Reduction2::test1 @ bci:13 (line 18)

WARNING: Removed pack: not implemented at any smaller size:
    0:  536  MulL  === _ 554 537  [[ 533 ]]  !orig=452,214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
    1:  533  MulL  === _ 536 534  [[ 452 ]]  !orig=214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
    2:  452  MulL  === _ 533 453  [[ 214 ]]  !orig=214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
    3:  214  MulL  === _ 452 215  [[ 266 554 372 ]]  !orig=189 !jvms: Reduction2::test1 @ bci:14 (line 18)

WARNING: Removed pack: not profitable:
    0:  453  LoadL  === 383 7 454  [[ 452 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)
    1:  215  LoadL  === 383 7 216  [[ 214 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=188 !jvms: Reduction2::test1 @ bci:13 (line 18)

WARNING: Removed pack: not profitable:
    0:  537  LoadL  === 383 7 538  [[ 536 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=453,215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)
    1:  534  LoadL  === 383 7 535  [[ 533 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)

SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
4102  101    b  4       Reduction2::test1 (24 bytes)
4111  102 %  b  3       Reduction2::test2 @ 2 (25 bytes)
4113  103    b  3       Reduction2::test2 (25 bytes)
4114  104 %  b  4       Reduction2::test2 @ 2 (25 bytes)

SuperWord::transform_loop:
    Loop: N582/N152  limit_check counted [int,int),+4 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 582  CountedLoop  === 582 285 152  [[ 562 566 576 581 582 281 585 586 480 494 245 232 ]] inner stride: 4 main of N582 strip mined multiversion_fast !orig=[491],[286],[253],[229] !jvms: Reduction2::test2 @ bci:12 (line 25)
TraceNewVectors [AutoVectorization]:  630  Replicate  === _ 180  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  631  Replicate  === _ 180  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  632  LoadVector  === 411 586 569  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  633  MulVL  === _ 632 631  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  634  StoreVector  === 582 586 569 633  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;
TraceNewVectors [AutoVectorization]:  635  LoadVector  === 411 586 488  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  636  MulVL  === _ 635 630  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  637  StoreVector  === 582 634 488 636  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;

SuperWord::transform_loop: success
4132  105    b  4       Reduction2::test2 (25 bytes)

SuperWord::transform_loop:
    Loop: N495/N178  limit_check counted [int,int),+4 (10243 iters)  main has_sfpt strip_mined
 495  CountedLoop  === 495 203 178  [[ 482 485 495 416 500 501 199 165 ]] inner stride: 4 main of N495 strip mined !orig=[422],[204],[195],[113] !jvms: Reduction2::test2 @ bci:8 (line 25)
TraceNewVectors [AutoVectorization]:  564  Replicate  === _ 143  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  565  Replicate  === _ 143  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  566  LoadVector  === 373 501 419  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  567  MulVL  === _ 566 564  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  568  StoreVector  === 495 501 419 567  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;
TraceNewVectors [AutoVectorization]:  569  LoadVector  === 373 501 488  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  570  MulVL  === _ 569 565  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  571  StoreVector  === 495 568 488 570  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;

SuperWord::transform_loop: success
4242  106 %  b  3       Reduction2::test3 @ 4 (24 bytes)
4243  107    b  3       Reduction2::test3 (24 bytes)
4244  108 %  b  4       Reduction2::test3 @ 4 (24 bytes)

SuperWord::transform_loop:
    Loop: N625/N162  limit_check counted [int,int),+8 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 625  CountedLoop  === 625 264 162  [[ 614 617 620 624 625 626 627 546 550 225 260 465 ]] inner stride: 8 main of N625 strip mined multiversion_fast !orig=[551],[462],[265],[234],[212] !jvms: Reduction2::test3 @ bci:13 (line 32)
TraceNewVectors [AutoVectorization]:  675  LoadVector  === 383 7 606  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  676  LoadVector  === 383 7 454  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  677  LoadVector  === 383 7 596  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  678  LoadVector  === 383 7 538  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  679  Replicate  === _ 387  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  680  AddVL  === _ 627 675  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  681  AddVL  === _ 680 677  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  682  AddVL  === _ 681 678  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  683  AddVL  === _ 682 676  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  684  AddReductionVL  === _ 345 683  [[ ]] no_strict_order

SuperWord::transform_loop: success
4258  109    b  4       Reduction2::test3 (24 bytes)

SuperWord::transform_loop:
    Loop: N666/N160  limit_check counted [int,int),+16 (10243 iters)  main has_sfpt strip_mined
 666  CountedLoop  === 666 176 160  [[ 666 172 669 678 ]] inner stride: 16 main of N666 strip mined !orig=[552],[461],[390],[177],[168],[116] !jvms: Reduction2::test3 @ bci:10 (line 32)
TraceNewVectors [AutoVectorization]:  779  LoadVector  === 342 7 453  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  780  LoadVector  === 342 7 642  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  781  LoadVector  === 342 7 533  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  782  LoadVector  === 342 7 636  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  783  LoadVector  === 342 7 537  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  784  LoadVector  === 342 7 656  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  785  LoadVector  === 342 7 387  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  786  LoadVector  === 342 7 650  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  787  Replicate  === _ 22  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  788  AddVL  === _ 669 782  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  789  AddVL  === _ 788 784  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  790  AddVL  === _ 789 786  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  791  AddVL  === _ 790 780  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  792  AddVL  === _ 791 781  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  793  AddVL  === _ 792 783  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  794  AddVL  === _ 793 779  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  795  AddVL  === _ 794 785  [[ ]]  #vectorx<J,2>
TraceNewVectors [AutoVectorization]:  796  AddReductionVL  === _ 301 795  [[ ]] no_strict_order

SuperWord::transform_loop: success

Comments
Let me know if you would be interested in working on this!
27-10-2025