JDK-8370671 : C2 SuperWord [x86]: implement Long.max/min reduction for AVX2
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86
  • Submitted: 2025-10-27
  • Updated: 2025-10-27
Description
I noticed that on x86 with AVX2, MaxV and MinV are implemented for long, but the corresponding reductions are not. I think it should be possible to support the reductions as well.

It already works for AVX512.

I found this while working on JDK-8340093, with which vectorizing the long min/max reduction would now be considered profitable.
See tests in:
test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java

I attached a Reduction.java for demonstration.
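
The attachment is not reproduced in this report. For readers without access to it, here is a hypothetical reconstruction inferred only from the trace further down: test1 as a long max reduction, test2 as an element-wise max against a loop-invariant value, and test3 as a long add reduction. The array size, input values, warm-up count and exact method bodies are assumptions, so line numbers and node ids will not match the log, and whether this sketch produces the same trace has not been verified.

```java
// Hypothetical reconstruction; NOT the attached Reduction.java.
class Reduction {
    static final int SIZE = 10_000; // assumed array length

    // Long max reduction: corresponds to the test1 case (MaxL pack rejected with UseAVX=2).
    static long test1(long[] a) {
        long x = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            x = Long.max(x, a[i]);
        }
        return x;
    }

    // Element-wise max against a loop-invariant value: corresponds to the test2 case.
    static void test2(long[] a, long v) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Long.max(a[i], v);
        }
    }

    // Long add reduction: corresponds to the test3 case.
    static long test3(long[] a) {
        long x = 0;
        for (int i = 0; i < a.length; i++) {
            x += a[i];
        }
        return x;
    }

    public static void main(String[] args) {
        long[] a = new long[SIZE];
        for (int i = 0; i < a.length; i++) {
            a[i] = (i * 31L) % 1009 - 500; // deterministic filler values
        }
        long sink = 0; // keeps the results alive
        for (int k = 0; k < 10_000; k++) { // enough invocations to trigger compilation
            sink += test1(a);
            test2(a, 0L);
            sink += test3(a);
        }
        System.out.println(sink);
    }
}
```

Each test method holds a single loop, so the compileonly pattern Reduction::test* in the command below would restrict compilation to exactly these three methods.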

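In the output below, the element-wise MaxV (test2) is vectorized, but the max reduction (test1) is not: its MaxL pack is removed as "not implemented at any smaller size". The add reduction (test3) is vectorized as AddReductionVL, so all the required shuffling should already be available. That indicates to me that we should be able to do a MaxV reduction as well.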
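
To illustrate why the shuffling used for an add reduction should carry over to max, here is a scalar model of a 4-lane long max reduction. It is only a sketch of the general technique, not HotSpot code: the class and method names are made up, and the lane pairing is an assumption about how a 256-bit vector of four longs could be folded in two steps.

```java
// Illustrative scalar model only; names and lane pairing are assumptions.
class LaneMaxSketch {
    // Fold 4 long lanes into one value in log2(4) = 2 steps.
    static long reduceMax4(long[] v) {
        // Step 1: fold the upper two lanes onto the lower two lanes.
        long m0 = Math.max(v[0], v[2]);
        long m1 = Math.max(v[1], v[3]);
        // Step 2: fold the two remaining lanes together.
        return Math.max(m0, m1);
    }

    // Max over an array using a 4-lane accumulator plus a scalar tail loop.
    static long maxReduction(long[] a) {
        long[] acc = {Long.MIN_VALUE, Long.MIN_VALUE, Long.MIN_VALUE, Long.MIN_VALUE};
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            // Models one element-wise MaxV of the accumulator with 4 loaded values.
            for (int lane = 0; lane < 4; lane++) {
                acc[lane] = Math.max(acc[lane], a[i + lane]);
            }
        }
        long result = reduceMax4(acc);
        for (; i < a.length; i++) { // leftover elements
            result = Math.max(result, a[i]);
        }
        return result;
    }
}
```

The fold has the same shape as for addition, with max in place of add. Since max is associative and commutative, the order in which lanes are combined does not change the result.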

We should also investigate whether the Vector API has the same issue.

To reproduce, run:

java -Xbatch -XX:CompileCommand=compileonly,Reduction::test* -XX:CompileCommand=printcompilation,Reduction::test* -XX:+TraceNewVectors -XX:UseAVX=2 -XX:CompileCommand=TraceAutoVectorization,Reduction::test*,SW_REJECTIONS Reduction.java

[empeter@emanuel bin]$ ./java -Xbatch -XX:CompileCommand=compileonly,Reduction::test* -XX:CompileCommand=printcompilation,Reduction::test* -XX:+TraceNewVectors -XX:UseAVX=2 -XX:CompileCommand=TraceAutoVectorization,Reduction::test*,SW_REJECTIONS Reduction.java
CompileCommand: compileonly Reduction.test* bool compileonly = true
CompileCommand: PrintCompilation Reduction.test* bool PrintCompilation = true
CompileCommand: TraceAutoVectorization Reduction.test* const char* TraceAutoVectorization = 'SW_REJECTIONS'
4018   97 %  b  3       Reduction::test1 @ 4 (26 bytes)
4020   98    b  3       Reduction::test1 (26 bytes)
4021   99 %  b  4       Reduction::test1 @ 4 (26 bytes)

SuperWord::transform_loop:
    Loop: N562/N162  limit_check counted [int,int),+4 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 562  CountedLoop  === 562 275 162  [[ 557 561 562 271 565 566 476 236 ]] inner stride: 4 main of N562 strip mined multiversion_fast !orig=[473],[276],[245],[223] !jvms: Reduction::test1 @ bci:13 (line 18)

WARNING: Removed pack: not implemented at any smaller size:
    0:  547  MaxL  === _ 565 548  [[ 544 ]]  !orig=463,225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    1:  544  MaxL  === _ 547 545  [[ 463 ]]  !orig=225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    2:  463  MaxL  === _ 544 464  [[ 225 ]]  !orig=225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    3:  225  MaxL  === _ 463 226  [[ 277 565 383 ]]  !orig=199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)

WARNING: Removed pack: not profitable:
    0:  548  LoadL  === 394 7 549  [[ 547 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=464,226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
    1:  545  LoadL  === 394 7 546  [[ 544 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
    2:  464  LoadL  === 394 7 465  [[ 463 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
    3:  226  LoadL  === 394 7 227  [[ 225 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=188 !jvms: Reduction::test1 @ bci:13 (line 18)

SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
4034  100    b  4       Reduction::test1 (26 bytes)

SuperWord::transform_loop:
    Loop: N471/N170  limit_check counted [int,int),+4 (10243 iters)  main has_sfpt strip_mined
 471  CountedLoop  === 471 186 170  [[ 471 182 474 477 ]] inner stride: 4 main of N471 strip mined !orig=[400],[187],[178],[116] !jvms: Reduction::test1 @ bci:10 (line 18)

WARNING: Removed pack: not implemented at any smaller size:
    0:  461  MaxL  === _ 474 462  [[ 460 ]]  !orig=395,157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    1:  460  MaxL  === _ 461 464  [[ 395 ]]  !orig=157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    2:  395  MaxL  === _ 460 396  [[ 157 ]]  !orig=157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
    3:  157  MaxL  === _ 395 225  [[ 474 188 332 ]]  !orig=411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)

WARNING: Removed pack: not profitable:
    0:  462  LoadL  === 352 7 463  [[ 461 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=396,225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
    1:  464  LoadL  === 352 7 465  [[ 460 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
    2:  396  LoadL  === 352 7 397  [[ 395 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
    3:  225  LoadL  === 352 7 144  [[ 157 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=[146] !jvms: Reduction::test1 @ bci:13 (line 18)

SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
4153  101 %  b  3       Reduction::test2 @ 2 (25 bytes)
4154  102    b  3       Reduction::test2 (25 bytes)
4155  103 %  b  4       Reduction::test2 @ 2 (25 bytes)

SuperWord::transform_loop:
    Loop: N591/N152  limit_check counted [int,int),+4 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 591  CountedLoop  === 591 295 152  [[ 571 574 585 590 591 291 594 595 489 503 255 242 ]] inner stride: 4 main of N591 strip mined multiversion_fast !orig=[500],[296],[263],[239] !jvms: Reduction::test2 @ bci:12 (line 25)
TraceNewVectors [AutoVectorization]:  639  Replicate  === _ 23  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  640  LoadVector  === 421 595 577  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectory<J,4>
TraceNewVectors [AutoVectorization]:  641  MaxV  === _ 640 639  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  642  StoreVector  === 591 595 577 641  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;

SuperWord::transform_loop: success
4173  104    b  4       Reduction::test2 (25 bytes)

SuperWord::transform_loop:
    Loop: N505/N188  limit_check counted [int,int),+4 (10243 iters)  main has_sfpt strip_mined
 505  CountedLoop  === 505 213 188  [[ 492 495 505 508 509 426 209 175 ]] inner stride: 4 main of N505 strip mined !orig=[432],[214],[205],[113] !jvms: Reduction::test2 @ bci:8 (line 25)
TraceNewVectors [AutoVectorization]:  574  Replicate  === _ 143  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  575  LoadVector  === 383 509 498  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory<J,4> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  576  MaxV  === _ 575 574  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  577  StoreVector  === 505 509 498 576  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;

SuperWord::transform_loop: success
4234  105 %  b  3       Reduction::test3 @ 4 (24 bytes)
4235  106    b  3       Reduction::test3 (24 bytes)
4237  107 %  b  4       Reduction::test3 @ 4 (24 bytes)

SuperWord::transform_loop:
    Loop: N551/N162  limit_check counted [int,int),+4 (10243 iters)  main multiversion_fast has_sfpt strip_mined
 551  CountedLoop  === 551 264 162  [[ 546 550 551 260 554 555 465 225 ]] inner stride: 4 main of N551 strip mined multiversion_fast !orig=[462],[265],[234],[212] !jvms: Reduction::test3 @ bci:13 (line 32)
TraceNewVectors [AutoVectorization]:  599  LoadVector  === 383 7 538  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectory<J,4>
TraceNewVectors [AutoVectorization]:  600  Replicate  === _ 387  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  601  AddVL  === _ 554 599  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  602  AddReductionVL  === _ 345 601  [[ ]] no_strict_order

SuperWord::transform_loop: success
4260  108    b  4       Reduction::test3 (24 bytes)

SuperWord::transform_loop:
    Loop: N461/N160  limit_check counted [int,int),+4 (10243 iters)  main has_sfpt strip_mined
 461  CountedLoop  === 461 176 160  [[ 461 172 464 467 ]] inner stride: 4 main of N461 strip mined !orig=[390],[177],[168],[116] !jvms: Reduction::test3 @ bci:10 (line 32)
TraceNewVectors [AutoVectorization]:  531  LoadVector  === 342 7 453  [[ ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory<J,4> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  532  Replicate  === _ 22  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  533  AddVL  === _ 464 531  [[ ]]  #vectory<J,4>
TraceNewVectors [AutoVectorization]:  534  AddReductionVL  === _ 301 533  [[ ]] no_strict_order

SuperWord::transform_loop: success
Comments
Feel free to contact me if you are interested in taking this task on.
27-10-2025