Bug ID: JDK-8307516 C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction

Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: 21

Priority: P4
Status: Open
Resolution: Unresolved

Submitted: 2023-05-05
Updated: 2025-05-23

Other
tbdUnresolved

As discussed in the recent reduction speedup work:
https://github.com/openjdk/jdk/pull/13056 JDK-8302652

Can probably be fixed with JDK-8340093 (Cost-Model)
07-01-2025
Hi [~jcao]! Thanks for the interest in this subject. Yes, IF the reduction can be moved out of the loop, then vectorization would always seem to be profitable. Have you done the same benchmark with float / integer? There are a few reasons for a cost-model: 1) Not all reductions can be moved out of the loop. 1a) float/doulbe reductions require to be reduced in the same order as specified in the java program, otherwise rounding would produce different results (VectorAPI simply defines reductions as unordered and can ignore this issue). 1b) There are some examples I have where the int reductions cannot be moved out of the loop either - though these are unlikely to ever be in any real java code - the examples are very contrived edge cases. 2) Currently, we do not make good use of pack / extract / shuffle operations. If we want to introduce them, then we need some way of judging if the overhead of these additional instructions is worth it. 3) We may want to have multiple vectorization strategies, and then pick the most profitable of all of them. 4) If I get the chance to introduce If-conversion, then profitability is even more difficult to decide without a cost-model. Feel free to ask more, I'm planning to much more work around the C2 auto-vectorizer in the foreseeable future.
21-11-2023
Hi Emanual. I think simple reductions should be profitable. My understanding is that after https://bugs.openjdk.org/browse/JDK-8302652, reductions in loops are always profitable if we can hoist them to after the loop. The only case I'm aware where we cannot legally move out the reduction is when we use the accumulator inside the loop. It should be possible to check this inside `Superword::profitable`. Would we still want a cost model if this is the case? After this change: ``` diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp index dd3f1d10dee..781458f4fd9 100644 --- a/src/hotspot/share/opto/superword.cpp +++ b/src/hotspot/share/opto/superword.cpp @@ -1987,7 +1987,7 @@ bool SuperWord::profitable(Node_List* p) { if (is_marked_reduction(p0)) { Node* second_in = p0->in(2); Node_List* second_pk = my_pack(second_in); - if ((second_pk == nullptr) \|\| (_num_work_vecs == _num_reductions)) { + if (second_pk == nullptr) { // Unmark reduction if no parent pack or if not enough work // to cover reduction expansion overhead _loop_reductions.remove(p0->_idx); ``` And running this simple case ``` class Reduction { final static int SIZE = 1000000; int reduction(int[] a) { int s = 0; for (int i = 0; i < SIZE; ++i) { s += a[i]; } return s; } public static void main(String[] args) { Reduction reduction = new Reduction(); int[] arr = new int[SIZE]; for (int i = 0; i < 20000; ++i) { reduction.reduction(arr); } } } ``` The runtime improves from 8.2 seconds to 3.3 seconds on my x86 Linux machine.
21-11-2023
With the re-analysis of JDK-8260943, it is becoming clear that we need a cost model. With this task here, we can introduce a cost model, build the packset optimistically, and then check with the cost-model if we want to apply the packset. Later, we can also use this to unify the CompileCommand Option Vectorize (_do_vector_loop), such that we try both approaches for the "seeding". More info in the PR of 8260943: https://github.com/openjdk/jdk/pull/13930
17-05-2023
This should be related to JDK-8188313 (~5.5 years ago, someone had proposed to consider re-enabling C2's vectorization of simple reductions).
08-05-2023

Duplicate :	JDK-8323609 - C2: Odd vectorization breakage with DBB.getLong loop
Relates :	JDK-8188313 - C2: Consider enabling auto-vectorization for simple reductions (disabled by JDK-8078563)
Relates :	JDK-8340093 - C2 SuperWord: implement cost model
Relates :	JDK-8318703 - C2 SuperWord: take reduction nodes into account in early unrolling analysis
Relates :	JDK-8357530 - C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability
Relates :	JDK-8302652 - [SuperWord] Reduction should happen after loop, when possible
Relates :	JDK-8336000 - C2 SuperWord: report that 2-element reductions do not vectorize