JDK-8318703 : C2 SuperWord: take reduction nodes into account in early unrolling analysis
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 22
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2023-10-24
  • Updated: 2024-05-07
  • Resolved: 2024-05-07
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
Early unrolling analysis (SuperWord::unrolling_analysis()) informs the loop unrolling factor computed by the unrolling policy (IdealLoopTree::policy_unroll()) via IdealLoopTree::policy_unroll_slp_analysis(). This analysis currently ignores reduction nodes [1]. For pure reduction loops (loops whose body consists only of reduction nodes), that leads the analysis to always request maximum unrolling (unroll factor of Matcher::superword_max_vector_size(T_BYTE) [2]) independently of the type of the reduction nodes.

The attached program reproduces this behavior (on JDK 22 b20), leading to a non-vectorized main loop with 512 floating point additions. Run with:

$ java -Xcomp -XX:CompileOnly=Test::test Test.java
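
The attached Test.java is not reproduced in this report. Purely as an illustration, a pure reduction loop of the kind described above might look like the sketch below. The class name, method signature, and constants are assumptions and do not come from the attachment; the -XX:CompileOnly pattern above would have to be adjusted to match.

```java
// Hypothetical sketch, NOT the attached Test.java.
public class PureReductionSketch {
    // The loop body contains only a float add-reduction:
    // no array loads and no stores, just acc = acc + inc.
    static float test(float start, float inc, int n) {
        float acc = start;
        for (int i = 0; i < n; i++) {
            acc += inc; // each addition depends on the previous value of acc
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(test(0.0f, 1.0f, 1000));
    }
}
```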

[1] https://github.com/openjdk/jdk/blob/fd332da1c8a689e91b7124fc342f02b6e0d3dff5/src/hotspot/share/opto/superword.cpp#L192
[2] https://github.com/openjdk/jdk/blob/fd332da1c8a689e91b7124fc342f02b6e0d3dff5/src/hotspot/share/opto/superword.cpp#L185
Comments
Actually, the exact description of this RFE is accomplished with JDK-8324794. I'm closing this here as a duplicate. But of course further work on reductions has to be done, and for that we have JDK-8307516.
07-05-2024

The reason is that your Test.java does not have any load or store. And even if it did, we would currently not deem this loop profitable, see also JDK-8307516. Basically: the vector-float-addition-reduction is actually quite expensive inside a loop. It still needs to perform all additions in the same order. But since the values are now in a vector, there are additional shuffle operations on the critical path. Once I refactor SuperWord a little more, it may be possible to get a cost model, and then we can determine the cost of a vector-int-reduction (not ordered) and of a vector-float-reduction (ordered). Maybe, if there are lots of other vectorizable operations feeding into the reduction, the total loop turns out to be profitable for vectorization.
07-05-2024
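
The ordering constraint mentioned in the comment above comes from float addition not being associative. A minimal sketch, with hypothetical class and method names and hand-picked input values, showing that summing the same floats in source order and in a two-lane (reassociated) order can give different results:

```java
// Illustration only; names and values are assumptions, not taken from HotSpot.
public class FloatOrderSketch {
    // Strictly ordered sum, as the scalar loop would compute it.
    static float sumInOrder(float[] v) {
        float acc = 0.0f;
        for (float x : v) {
            acc += x;
        }
        return acc;
    }

    // Reassociated sum: even and odd elements are accumulated in two
    // separate "lanes" and combined at the end, as an unordered
    // vector reduction would be free to do.
    static float sumTwoLanes(float[] v) {
        float lane0 = 0.0f;
        float lane1 = 0.0f;
        for (int i = 0; i + 1 < v.length; i += 2) {
            lane0 += v[i];
            lane1 += v[i + 1];
        }
        return lane0 + lane1;
    }

    public static void main(String[] args) {
        float[] v = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sumInOrder(v));  // the first 1.0f is absorbed by 1e8f
        System.out.println(sumTwoLanes(v)); // 1e8f and -1e8f cancel before any rounding loss
    }
}
```

Integer addition wraps on overflow and is associative, so the same reordering would not change an int reduction.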

[~rcastanedalo] I have had this item on my list for a while. I did a quick analysis with the current JDK 23 code. I ran this:

$ ./java -Xcomp -XX:CompileOnly=Test::test -XX:+TraceLoopOpts -XX:+TraceSuperWordLoopUnrollAnalysis -XX:UseAVX=2 Test.java

And got:

slp analysis: set max unroll to 8
slp analysis unroll=208, default limit=60

And then I ran the same with AVX512 and got:

slp analysis: set max unroll to 16
slp analysis unroll=416, default limit=60

Since one of my recent refactoring changes, I now do NOT ignore the reductions in SuperWord::unrolling_analysis(). And you can see that the reduction has an effect on the unrolling, i.e. for AVX2 (MaxVectorSize=32) we get 32 / 4 = 8 unrolling, and for AVX512 (MaxVectorSize=64) we get 64 / 4 = 16 unrolling.

Still, this code will currently not be vectorized. Then I ran this:

$ ./java -Xcomp -XX:CompileOnly=Test::test -XX:+TraceLoopOpts -XX:+TraceSuperWordLoopUnrollAnalysis -XX:CompileCommand=TraceAutoVectorization,Test::test,ALL Test.java

And I got:

SuperWord::transform_loop:
Loop: N663/N75 counted [int,113),+16 (2147483648 iters) main has_sfpt strip_mined
663 CountedLoop === 663 371 75 [[ 663 367 794 795 ]] inner stride: 16 main of N663 strip mined !orig=[574],[517],[471],[372],[333],[300] !jvms: Test::test @ bci:21 (line 7)
find_adjacent_refs found 0 memops
After Superword::find_adjacent_refs
PairSet::print: 0 pairs
No pair packs generated, abort SuperWord.
07-05-2024
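
The unroll limits quoted in the comment above follow from dividing the maximum vector size in bytes by the element size. A small sketch of that arithmetic; the class and helper names are hypothetical and this is not the HotSpot implementation:

```java
// Arithmetic illustration only; maxUnroll is a made-up helper.
public class UnrollSketch {
    // Number of elements of the given size that fit into one vector.
    static int maxUnroll(int maxVectorSizeBytes, int elementSizeBytes) {
        return maxVectorSizeBytes / elementSizeBytes;
    }

    public static void main(String[] args) {
        // float reduction, AVX2: MaxVectorSize=32
        System.out.println(maxUnroll(32, Float.BYTES));
        // float reduction, AVX512: MaxVectorSize=64
        System.out.println(maxUnroll(64, Float.BYTES));
        // byte-sized elements, the case the description says was
        // requested when reductions were ignored
        System.out.println(maxUnroll(64, Byte.BYTES));
    }
}
```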