JDK-8279622 : C2: miscompilation of map pattern as a vector reduction
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,18,19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-01-07
  • Updated: 2022-07-25
  • Resolved: 2022-05-03
Fix versions:
  • JDK 11: 11.0.17-oracle (Fixed)
  • JDK 17: 17.0.5-oracle (Fixed)
  • JDK 19: 19 b21 (Fixed)
Description
SLP wrongly vectorizes a loop as a reduction instead of a simple map pattern. SLP believes the loop forms a reduction pattern because its operations were earlier marked as reduction nodes (by PhaseIdealLoop::mark_reductions). However, they were marked as such within a *different* loop, one that has since been removed by intervening loop transformations.

HOW TO REPRODUCE

$ java -ea Fail.java (using JDK 17, 18, or 19 up to b11)

FAILURE ANALYSIS

Using Fail.java as an example (run with -XX:-PartialPeelLoop for simplicity), the sequence of events is (roughly) as follows:

Original loop before loop optimizations (N, M, and Fail.mask are constants):

   for (int i = 0; i < N; i++) {
     for (int j = 0; j < M; j++) {
       r[i] ^= Fail.mask;
     }
   }
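
For readers without the attachment, the loop nest above can be sketched as runnable Java. This is a hypothetical illustration, not the attached Fail.java: the class name, the values of N, M, and MASK (standing in for Fail.mask), and the choice of an odd M are all assumptions. It shows the result a correct compilation must produce: because (x ^ c) ^ c == x, an odd number of XORs with the same constant is equivalent to a single XOR per element.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT the attached Fail.java. All names and constants are assumptions.
public class XorLoopSketch {
    static final int N = 8;        // assumed outer trip count
    static final int M = 5;        // assumed inner trip count (odd)
    static final int MASK = 0xFF;  // stands in for Fail.mask

    // Same shape as the original loop nest in the report.
    static void nested(int[] r) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < M; j++) {
                r[i] ^= MASK;
            }
        }
    }

    // Equivalent form for odd M: M - 1 of the XORs cancel in pairs, one remains.
    static void simplified(int[] r) {
        for (int i = 0; i < N; i++) {
            r[i] ^= MASK;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[N];
        int[] b = new int[N];
        for (int i = 0; i < N; i++) {
            a[i] = i * 31;
            b[i] = i * 31;
        }
        nested(a);
        simplified(b);
        // Both forms must agree element by element.
        System.out.println(Arrays.equals(a, b));
    }
}
```

With these small trip counts the code is unlikely to be compiled by C2 at all, so it is not a reproducer; it only pins down the expected semantics that the steps below refer to.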

1. The inner loop is marked as a reduction together with its XOR operation:

   for (int i = 0; i < N; i++) {
     for (int j = 0; j < M; j++) {  // loop marked as a reduction
       r[i] ^= Fail.mask;           // XOR marked as a reduction
     }
   }

2. The inner loop is split into a peeled iteration, a main loop, and a post loop, and the main loop is unrolled by a factor of two:

   for (int i = 0; i < N; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction (inconsistent, outer loop is not a reduction!)
     int j = 0;
     for (...; j+=2) {           // loop marked as a reduction
       r[i] ^= Fail.mask;        // XOR marked as a reduction
       r[i] ^= Fail.mask;        // XOR marked as a reduction
     }
     for (...; j++) {            // loop marked as a reduction
       r[i] ^= Fail.mask;        // XOR marked as a reduction
     }
   }

3. The inner main and post loops are found to be redundant (due to the "self-inversion" property of XOR with a constant operand) and are removed:

   for (int i = 0; i < N; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }

4. The outer loop is further optimized into its final version, where the main loop is unrolled by a factor of four for SLP vectorization:

   int i = 0;
   for (...; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }
   for (...; i+=4) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
     r[i] ^= Fail.mask;          // XOR marked as a reduction
     r[i] ^= Fail.mask;          // XOR marked as a reduction
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }
   for (...; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }

5. The main loop is wrongly vectorized as a reduction because its XOR operations are marked as reductions:

   int i = 0;
   for (...; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }
   for (...; i+=4) {
     tmp = reduce(XOR, Fail.mask, r[i...i+3])
     r[i...i+3] = [tmp, tmp, tmp, tmp]
   }
   for (...; i++) {
     r[i] ^= Fail.mask;          // XOR marked as a reduction
   }

The expected main loop vectorization is:
   ...
   for (...; i+=4) {
     r[i...i+3] = map(XOR, r[i...i+3], [Fail.mask, Fail.mask, Fail.mask, Fail.mask])
   }
   ...
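
The difference between the two vectorizations can be made concrete with a scalar emulation of one four-element block. This is only a sketch of the semantics written in the pseudocode above, not of the code C2 emits; the class and method names and the value of MASK (standing in for Fail.mask) are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical scalar emulation of one 4-lane block. All names are assumptions.
public class MapVsReductionSketch {
    static final int MASK = 0x0F;  // stands in for Fail.mask

    // Expected vectorization: element-wise map, each lane XORed with the mask.
    static void mapBlock(int[] r, int i) {
        for (int lane = 0; lane < 4; lane++) {
            r[i + lane] ^= MASK;
        }
    }

    // Miscompiled form: all lanes folded into one value, then broadcast back.
    static void reductionBlock(int[] r, int i) {
        int tmp = MASK;
        for (int lane = 0; lane < 4; lane++) {
            tmp ^= r[i + lane];
        }
        for (int lane = 0; lane < 4; lane++) {
            r[i + lane] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] expected = {1, 2, 4, 8};
        int[] wrong = expected.clone();
        mapBlock(expected, 0);
        reductionBlock(wrong, 0);
        System.out.println(Arrays.toString(expected)); // [14, 13, 11, 7]
        System.out.println(Arrays.toString(wrong));    // [0, 0, 0, 0]
    }
}
```

The reduction form mixes values across array elements, which is why the miscompiled method produces results unrelated to the interpreter's output.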

Note that this failure is reproducible only up to JDK 19 b11. In JDK 19 b12, JDK-8154302 introduces a safepoint poll in the (counted) outer main loop (see step 4 above), which inhibits SLP vectorization ("SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head"). The root cause of the failure (a reduction node within a non-reduction loop) remains present, though.

ORIGINAL REPORT:

The attached fuzzer test produces a different result for C2 compared to C1/interpreter.

To reproduce (on JDK 17, JDK 18, and JDK 19):
$ java -Xint Test.java > int.log
$ java Test.java > c2.log
$ diff int.log c2.log
55c55
< iArr3 = -4168
---
> iArr3 = -204359
67c67
< iArr3 = -4168
---
> iArr3 = -195060

# To reproduce on JDK 17 and JDK 18 (but not on JDK 19 commit cc7cf81):
$ java -ea Reduced.java
(results in an exception because of an unexpected result.)

# To reproduce on JDK 19 commit cc7cf81:
$ java -ea Reduced2.java
(as above, results in an exception because of an unexpected result.)
Comments
Fix request [11u] I backport this for parity with 11.0.17-oracle. C2 fix with the typical risk we should take. Needs follow up JDK-8286177, the other related issues are not needed. Clean backport from 17. Test passes, but also without the fix. SAP nightly testing passed.
03-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1194 Date: 2022-07-01 09:55:51 +0000
01-07-2022

Fix request [17u] I backport this for parity with 17.0.5-oracle. A C2 fix with the typical risk we should take. Needs follow up 8286177, the other related issues are not needed. I had to resolve due to differing context. Test passes, but also without the fix. SAP nightly testing passed.
01-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/494 Date: 2022-06-22 07:30:59 +0000
22-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk18u/pull/157 Date: 2022-06-21 07:23:37 +0000
21-06-2022

Changeset: 6fcd3222 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: 2022-05-03 11:08:48 +0000 URL: https://git.openjdk.java.net/jdk/commit/6fcd322258e0cce3724a4a8dc18f7802018a7cc9
03-05-2022

Internal test failures on JDK 11 instrumented with the assertion proposed in the PR confirm that JDK 11 is affected by this issue.
03-05-2022

Adding JDK 11 as a (potentially) affected version, as the main transformations involved in the failure are included (superword vectorization of reductions, peel/main/post loop transformation), even though I have not been able to construct a reproducer before JDK-8271272 (JDK 17 b34). As mentioned by [~chagedorn] above, JDK-8271272 itself is likely unrelated and only acts as an enabler of the specific transformation chain that leads to failure in the reproducer.
02-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8464 Date: 2022-04-29 08:02:07 +0000
29-04-2022

ILW = Wrong result with vector instructions in OSR compiled method, only single Java Fuzzer test, use -XX:-UseSuperWord = HLM = P3
07-01-2022

Starts to fail after JDK-8271272 but that only seems to reveal an existing issue.
07-01-2022