JDK 11 | JDK 17 | JDK 19 |
---|---|---|
11.0.17-oracle (Fixed) | 17.0.5-oracle (Fixed) | 19 b21 (Fixed) |
SLP wrongly vectorizes a loop as a reduction instead of a simple map pattern. SLP believes the loop forms a reduction pattern because its operations are earlier marked as reduction nodes (by PhaseIdealLoop::mark_reductions). However, they are marked as such within a *different* loop that is removed by in-between loop transformations.

HOW TO REPRODUCE

    $ java -ea Fail.java

(using JDK 17, 18, or 19 up to b11)

FAILURE ANALYSIS

Using Fail.java as an example (run with -XX:-PartialPeelLoop for simplicity), the sequence of events is (roughly) as follows:

Original loop before loop optimizations (N, M, and Fail.mask are constants):

    for (int i = 0; i < N; i++) {
      for (int j = 0; j < M; j++) {
        r[i] ^= Fail.mask;
      }
    }

1. The inner loop is marked as a reduction together with its XOR operation:

        for (int i = 0; i < N; i++) {
          for (int j = 0; j < M; j++) { // loop marked as a reduction
            r[i] ^= Fail.mask;          // XOR marked as a reduction
          }
        }

2. The inner loop is split into a peeled iteration, main, and post loop and unrolled twice:

        for (int i = 0; i < N; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction (inconsistent, outer loop is not a reduction!)
          int j = 0;
          for (...; j+=2) {      // loop marked as a reduction
            r[i] ^= Fail.mask;   // XOR marked as a reduction
            r[i] ^= Fail.mask;   // XOR marked as a reduction
          }
          for (...; j++) {       // loop marked as a reduction
            r[i] ^= Fail.mask;   // XOR marked as a reduction
          }
        }

3. The inner main and post loops are found to be redundant (due to the "self-inversion" property of XOR with a constant operand) and get removed:

        for (int i = 0; i < N; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }

4. The outer loop is further optimized into its final version, where the main loop is unrolled four times for SLP vectorization:

        int i = 0;
        for (...; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }
        for (...; i+=4) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
          r[i] ^= Fail.mask;     // XOR marked as a reduction
          r[i] ^= Fail.mask;     // XOR marked as a reduction
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }
        for (...; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }

5. The main loop is wrongly vectorized as a reduction due to its XOR operations being marked as reductions:

        int i = 0;
        for (...; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }
        for (...; i+=4) {
          tmp = reduce(XOR, Fail.mask, r[i...i+3])
          r[i...i+3] = [tmp, tmp, tmp, tmp]
        }
        for (...; i++) {
          r[i] ^= Fail.mask;     // XOR marked as a reduction
        }

The expected main loop vectorization is:

    ...
    for (...; i+=4) {
      r[i...i+3] = map(XOR, r[i...i+3], [Fail.mask, Fail.mask, Fail.mask, Fail.mask])
    }
    ...

Note that this failure is only reproducible up to JDK 19 b11. In JDK 19 b12, JDK-8154302 introduces a safepoint poll in the (counted) outer-main loop (see step 4 above), which inhibits SLP vectorization ("SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head"). The root cause of the failure (a reduction node within a non-reduction loop) remains present though.

ORIGINAL REPORT

The attached fuzzer test produces a different result for C2 compared to C1/interpreter.

To reproduce (on JDK 17, JDK 18, and JDK 19):

    $ java -Xint Test.java > int.log
    $ java Test.java > c2.log
    $ diff int.log c2.log
    55c55
    < iArr3 = -4168
    ---
    > iArr3 = -204359
    67c67
    < iArr3 = -4168
    ---
    > iArr3 = -195060

To reproduce on JDK 17, JDK 18 (but not on JDK 19 commit cc7cf81):

    $ java -ea Reduced.java

(results in an exception because of an unexpected result.)

To reproduce on JDK 19 commit cc7cf81:

    $ java -ea Reduced2.java

(as above, results in an exception because of an unexpected result.)
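The difference between the wrong reduction form and the expected map form of the main loop (step 5 of the failure analysis) can be sketched in scalar Java. This is only an illustration, not the attached Fail.java: the class and method names, the `MASK` value, and the input array are hypothetical, and `reduce(XOR, Fail.mask, r[i...i+3])` is assumed to mean an XOR fold of the mask with the four lanes, followed by a broadcast of the result.

```java
import java.util.Arrays;

class XorMapVsReduction {
    static final int MASK = 0x5A; // arbitrary stand-in for Fail.mask (assumption)
    static final int LANES = 4;   // the main loop is unrolled four times

    // Reference semantics: every element is XORed with the mask independently.
    static int[] scalar(int[] r) {
        int[] out = r.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] ^= MASK;
        }
        return out;
    }

    // Expected vectorization: a lane-wise map over blocks of four elements.
    static int[] vectorizedAsMap(int[] r) {
        int[] out = r.clone();
        int i = 0;
        for (; i + LANES <= out.length; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                out[i + lane] ^= MASK;
            }
        }
        for (; i < out.length; i++) { // post loop for the remaining elements
            out[i] ^= MASK;
        }
        return out;
    }

    // Wrong vectorization: the four lanes are folded into one value,
    // which is then written back to all four positions.
    static int[] vectorizedAsReduction(int[] r) {
        int[] out = r.clone();
        int i = 0;
        for (; i + LANES <= out.length; i += LANES) {
            int tmp = MASK;
            for (int lane = 0; lane < LANES; lane++) {
                tmp ^= out[i + lane];
            }
            Arrays.fill(out, i, i + LANES, tmp);
        }
        for (; i < out.length; i++) { // post loop for the remaining elements
            out[i] ^= MASK;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println("scalar:    " + Arrays.toString(scalar(r)));
        System.out.println("map:       " + Arrays.toString(vectorizedAsMap(r)));
        System.out.println("reduction: " + Arrays.toString(vectorizedAsReduction(r)));
    }
}
```

In this sketch the map form agrees with the scalar loop for any input, while the reduction form mixes the values of neighbouring elements, which is consistent with the kind of result mismatch between the interpreter and C2 described in the report.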
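The "self-inversion" property used in step 3 of the failure analysis can be checked with a small Java sketch. The class name, method name, and sample values are hypothetical; the point is only that XORing with the same constant an even number of times restores the original value, so only the parity of the trip count matters.

```java
class XorSelfInversion {
    // Applies value ^= mask the given number of times, like the inner loop over j.
    static int applyXor(int value, int mask, int times) {
        for (int j = 0; j < times; j++) {
            value ^= mask;
        }
        return value;
    }

    public static void main(String[] args) {
        int value = 7;   // arbitrary sample element (assumption)
        int mask = 0x5A; // arbitrary stand-in for Fail.mask (assumption)
        // An even trip count leaves the value unchanged.
        System.out.println(applyXor(value, mask, 4) == value);
        // An odd trip count is equivalent to a single XOR.
        System.out.println(applyXor(value, mask, 5) == (value ^ mask));
    }
}
```

This is why a peeled iteration followed by an even number of remaining iterations can collapse into a single XOR, as in the loop shown after step 3.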