| JDK 11 | JDK 17 | JDK 19 |
|---|---|---|
| 11.0.17-oracle (Fixed) | 17.0.5-oracle (Fixed) | 19 b21 (Fixed) |
SLP wrongly vectorizes a loop as a reduction instead of a simple map pattern. SLP treats the loop as a reduction because its operations were marked earlier as reduction nodes (by PhaseIdealLoop::mark_reductions). However, they were marked as such within a *different* loop, which is removed by loop transformations that run in between.
HOW TO REPRODUCE
$ java -ea Fail.java (using JDK 17, 18, or 19 up to b11)
FAILURE ANALYSIS
Using Fail.java as an example (run with -XX:-PartialPeelLoop for simplicity), the sequence of events is (roughly) as follows:
Original loop before loop optimizations (N, M, and Fail.mask are constants):
for (int i = 0; i < N; i++) {
    for (int j = 0; j < M; j++) {
        r[i] ^= Fail.mask;
    }
}
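For reference, the loop nest can be written as a self-contained Java program. This is a minimal sketch and not the attached Fail.java: the class name LoopNestSketch, the values chosen for N, M, and MASK, and the check in main are all assumptions made for illustration. Because x ^ MASK ^ MASK == x, the expected value of each element depends only on the parity of M, which gives an independent value to compare against. Whether this particular sketch triggers the failure on an affected build has not been verified.

```java
public class LoopNestSketch {
    // Hypothetical constants; the values in the attached Fail.java may differ.
    static final int N = 64;
    static final int M = 7;
    static final int MASK = 0x5A5A;

    // Same shape as the loop nest shown in the report.
    static int[] run(int[] r) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < M; j++) {
                r[i] ^= MASK;
            }
        }
        return r;
    }

    // XOR with a constant is its own inverse, so M applications collapse to
    // one application when M is odd and to none when M is even.
    static int expected(int initial) {
        return (M % 2 == 1) ? (initial ^ MASK) : initial;
    }

    public static void main(String[] args) {
        // Repeat the call so that a JIT compiler gets a chance to compile run().
        for (int iter = 0; iter < 10_000; iter++) {
            int[] r = new int[N];
            for (int i = 0; i < N; i++) {
                r[i] = i;
            }
            run(r);
            for (int i = 0; i < N; i++) {
                if (r[i] != expected(i)) {
                    throw new AssertionError(
                        "r[" + i + "] = " + r[i] + ", expected " + expected(i));
                }
            }
        }
        System.out.println("ok");
    }
}
```

The check throws explicitly rather than relying on assert, so it does not depend on -ea being passed.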
1. The inner loop is marked as a reduction together with its XOR operation:
for (int i = 0; i < N; i++) {
    for (int j = 0; j < M; j++) { // loop marked as a reduction
        r[i] ^= Fail.mask; // XOR marked as a reduction
    }
}
2. The inner loop is split into a peeled iteration, main, and post loop and unrolled twice:
for (int i = 0; i < N; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction (inconsistent, outer loop is not a reduction!)
    int j = 0;
    for (...; j+=2) { // loop marked as a reduction
        r[i] ^= Fail.mask; // XOR marked as a reduction
        r[i] ^= Fail.mask; // XOR marked as a reduction
    }
    for (...; j++) { // loop marked as a reduction
        r[i] ^= Fail.mask; // XOR marked as a reduction
    }
}
3. The inner main and post loops are found to be redundant (due to the "self-inversion" property of XOR with a constant operand) and get removed:
for (int i = 0; i < N; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction
}
4. The outer loop is further optimized into its final version (pre, main, and post loop), where the main loop is unrolled four times for SLP vectorization:
int i = 0;
for (...; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction
}
for (...; i+=4) {
    r[i]   ^= Fail.mask; // XOR marked as a reduction
    r[i+1] ^= Fail.mask; // XOR marked as a reduction
    r[i+2] ^= Fail.mask; // XOR marked as a reduction
    r[i+3] ^= Fail.mask; // XOR marked as a reduction
}
for (...; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction
}
5. The main loop is wrongly vectorized as a reduction because its XOR operations are marked as reductions:
int i = 0;
for (...; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction
}
for (...; i+=4) {
    tmp = reduce(XOR, Fail.mask, r[i...i+3])
    r[i...i+3] = [tmp, tmp, tmp, tmp]
}
for (...; i++) {
    r[i] ^= Fail.mask; // XOR marked as a reduction
}
The expected main loop vectorization is:
...
for (...; i+=4) {
    r[i...i+3] = map(XOR, r[i...i+3], [Fail.mask, Fail.mask, Fail.mask, Fail.mask])
}
...
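The difference between the two lowerings can be emulated in scalar Java. The sketch below is only an interpretation of the pseudocode above, not C2's implementation: the class and method names (ReduceVsMapSketch, reduceLowering, mapLowering), the MASK value, and the sample input are assumptions, and reduce(XOR, Fail.mask, r[i...i+3]) is read here as XOR-ing the mask with all four lanes.

```java
import java.util.Arrays;

public class ReduceVsMapSketch {
    static final int MASK = 0x5A5A; // hypothetical stand-in for Fail.mask
    static final int LANES = 4;     // block width used in the report's pseudocode

    // Emulates the wrong lowering: fold all lanes into one scalar, then broadcast it.
    static void reduceLowering(int[] r, int i) {
        int tmp = MASK;
        for (int k = 0; k < LANES; k++) {
            tmp ^= r[i + k];
        }
        for (int k = 0; k < LANES; k++) {
            r[i + k] = tmp;
        }
    }

    // Emulates the expected lowering: each lane is XOR-ed with the mask independently.
    static void mapLowering(int[] r, int i) {
        for (int k = 0; k < LANES; k++) {
            r[i + k] ^= MASK;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = a.clone();
        reduceLowering(a, 0);
        mapLowering(b, 0);
        System.out.println("reduce: " + Arrays.toString(a));
        System.out.println("map:    " + Arrays.toString(b));
    }
}
```

Under the reduction lowering every lane of the block receives the same folded value, whereas the map lowering keeps the lanes independent, so the two arrays differ whenever the lanes of the input differ.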
Note that this failure is only reproducible up to JDK 19 b11. In JDK 19 b12, JDK-8154302 introduces a safepoint poll in the (counted) outer main loop (see step 4 above), which inhibits SLP vectorization ("SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head"). The root cause of the failure (a reduction node within a non-reduction loop) remains present, though.
ORIGINAL REPORT:
The attached fuzzer test produces a different result for C2 compared to C1/interpreter.
To reproduce (on JDK 17, JDK 18, and JDK 19):
$ java -Xint Test.java > int.log
$ java Test.java > c2.log
$ diff int.log c2.log
55c55
< iArr3 = -4168
---
> iArr3 = -204359
67c67
< iArr3 = -4168
---
> iArr3 = -195060
# To reproduce on JDK 17 and JDK 18 (but not on JDK 19 commit cc7cf81):
$ java -ea Reduced.java
(This results in an exception because of an unexpected result.)
# To reproduce on JDK 19 commit cc7cf81:
$ java -ea Reduced2.java
(As above, this results in an exception because of an unexpected result.)