JDK-8301657 : Fix post loop vectorization with parallel loop iv
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version:
    9,10,11,12,13,14,15,16,17,18,19,20,21
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2023-02-02
  • Updated: 2023-07-17
  • Resolved: 2023-07-17
Other: tbd, Resolved
Related Reports
Duplicate :  
Relates :  
Description
We recently found that the HotSpot jtreg test test/hotspot/jtreg/compiler/vectorization/runner/LoopLiveOutNodesTest.java fails intermittently on Arm Neoverse-N2 (an AArch64 CPU with a 128-bit Scalable Vector Extension). After some investigation, we found that the failure is caused by a bug in the experimental post loop vectorization when the loop has parallel induction variables. We created the test case below to reproduce the issue reliably.

public class Foo {
    private static final int SIZE = 500;

    private static int[] a = new int[SIZE];
    private static int[] b = new int[SIZE];

    private static int init = 33;
    private static int iters = 335;

    private static int testLoop(int start, int end) {
        int i = 0, j = 0;
        for (i = start; i < end; i++, j++) {
            b[i] = a[i];
        }
        return j;
    }

    public static void main(String[] args) {
        // Warmup
        for (int warmup = 0; warmup < 20000; warmup++) {
            testLoop(0, 500);
        }
        // Test
        int result = testLoop(init, init + iters);
        if (result != iters) {
            throw new RuntimeException("Incorrect result: expected = " +
                    iters + ", actual = " + result);
        }
        System.out.println("Passed");
    }
}

$ java Foo
Passed

$ java -XX:+UnlockExperimentalVMOptions -XX:+PostLoopMultiversioning Foo
Exception in thread "main" java.lang.RuntimeException: Incorrect result: expected = 335, actual = 328
	at Foo.main(Foo.java:26)

Comments
There is no need to fix this any more, since the legacy code has been removed.
17-07-2023

Some investigation: The loop in the method above has two parallel induction variables, `i` and `j`. `PhaseIdealLoop::replace_parallel_iv()` converts each secondary IV phi node into the trip count IV phi plus or minus some loop-invariant amount. Post loop vectorization then transforms the scalar post loop into a masked vector loop. However, the secondary IV `j`, which was previously converted into another user of the trip count IV, is not incremented correctly after the loop.
07-02-2023
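
For illustration, the parallel IV replacement described in the comment above can be sketched at the Java source level. This is only a rough model under stated assumptions, not the C2 transformation itself: the class name `ParallelIvSketch` and the methods `twoIvs` and `derivedIv` are hypothetical, and the sketch assumes that the loop-invariant amount for `j` is `start`. It shows why `j` can be expressed as `i - start`, and therefore why the value returned for `j` depends on `i` holding its correct final value when the loop exits.

```java
public class ParallelIvSketch {
    // Source-level shape of testLoop: i and j advance in lockstep.
    static int twoIvs(int[] a, int[] b, int start, int end) {
        int i, j = 0;
        for (i = start; i < end; i++, j++) {
            b[i] = a[i];
        }
        return j;
    }

    // Hypothetical rewritten shape: j is no longer a separate counter.
    // It is derived from the trip count IV i and the loop-invariant start
    // (an assumption made for this sketch).
    static int derivedIv(int[] a, int[] b, int start, int end) {
        int i;
        for (i = start; i < end; i++) {
            b[i] = a[i];
        }
        // Correct only if i holds its final value at this point.
        return i - start;
    }

    public static void main(String[] args) {
        int[] a = new int[500];
        int[] b = new int[500];
        int start = 33, iters = 335;
        System.out.println("twoIvs = " + twoIvs(a, b, start, start + iters)
                + ", derivedIv = " + derivedIv(a, b, start, start + iters));
    }
}
```

With a correct final `i`, both forms return 335 for the failing input in the description. The reported result of 328 is consistent with the derived expression reading a value of `i` that was not advanced past the masked vector post loop, although this sketch does not model that loop.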

ILW = Wrong execution with post loop vectorization, only single test and with experimental feature, use -XX:-PostLoopMultiversioning = HLL = P4
02-02-2023

Only appears with experimental VM feature (very few people use it), so ILW = P4
02-02-2023