JDK-8298935 : fix independence bug in create_pack logic in SuperWord::find_adjacent_refs
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,11,17,18,19,20,21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-12-16
  • Updated: 2023-07-24
  • Resolved: 2023-03-15
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 21
21 b14Fixed
Related Reports
Blocks :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Please change the generic bug title to something more descriptive once the root cause is known. 

The attached Java Fuzzer test results in a wrong execution with C2 compared to a run with the interpreter/C1:

To reproduce:
$ java -Xint Test.java > interpreter.log
$ java -XX:-TieredCompilation -Xcomp -XX:CompileOnly=Test Test.java > c2.log
$ diff interpreter.log c2.log

$ java -Xint Reduced.java > interpreter.log
$ java -XX:-TieredCompilation -Xcomp -XX:CompileOnly=Reduced Reduced.java > c2.log
$ diff interpreter.log c2.log

Output of Test.java diff:
0a1
> CompileCommand: compileonly Test.* bool compileonly = true
10c11
< vMeth_check_sum: 2660033801921391608
---
> vMeth_check_sum: 2660033801921192200
21c22
< vMeth_check_sum: 5320067603842783216
---
> vMeth_check_sum: 5320067603842332488
32c33
< vMeth_check_sum: 7980101405764174824
---
> vMeth_check_sum: 7980101405763472776
43c44
< vMeth_check_sum: -7806608866023985184
---
> vMeth_check_sum: -7806608866024938552
54c55
< vMeth_check_sum: -5146575064102593576
---
> vMeth_check_sum: -5146575064103798264
65c66
< vMeth_check_sum: -2486541262181201968
---
> vMeth_check_sum: -2486541262182657976
76c77
< vMeth_check_sum: 173492539740189640
---
> vMeth_check_sum: 173492539738482312
87c88
< vMeth_check_sum: 2833526341661581248
---
> vMeth_check_sum: 2833526341659622600
98c99
< vMeth_check_sum: 5493560143582972856
---
> vMeth_check_sum: 5493560143580771093
109c110
< vMeth_check_sum: 8153593945504364464
---
> vMeth_check_sum: 8153593945501923741

Comments
Changeset: 01e69205 Author: Emanuel Peter <epeter@openjdk.org> Date: 2023-03-15 14:02:45 +0000 URL: https://git.openjdk.org/jdk/commit/01e6920581407bc3bd69db495fc694629ef01262
15-03-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/12350 Date: 2023-01-31 18:26:52 +0000
09-02-2023

I summarize what I have found so far. As far as I understand, at the time when we do SuperWord::schedule, there should be no cyclic dependencies between the packs (except for reduction packs, they have self-cycles). I added some verification code between SuperWord::filter_packs and SuperWord::schedule, that builds a pack-dependency-graph, and detects cycles (see verification_patch.diff). It seems to pass for the tests we have from tier1-3, but triggers for Reduced2.java. ./java -XX:-TieredCompilation -Xcomp -XX:CompileOnly=Reduced2::test -XX:CompileCommand=printcompilation,Reduced2::* -XX:+PrintInlining -XX:+TraceSuperWord -XX:LoopMaxUnroll=5 Reduced2.java (LoopMaxUnroll=5 is not necessary, but it makes the example a bitter smaller) (see also Reduced2.0_before.png and Reduced2.1_after.png to see the graph before and after SuperWord) When one turns on -XX:+TraceSuperWord, one can see how the packset develops. This is the function that creates issues: static void test() { for (int i = 4; i < 100; i++) { int v = iArr[i]; iArr[i + 2] = v; // write 2 ahead (relative offset) fArr[i] = v; // seems required, but leads to wrong result } } Basically, we take values from the int-array, and write them back with a relative offset of 2, and also write them to the float-array. With an unrolling factor of 4, we see that we read the values from the int-array of positions [i, i+1, i+2, i+3] and write to positions [i+2, i+3, i+4, i+5] in the int-array. But when we sequentially execute the function, we see that when we read position i+2, it has already been overwritten when we read (i) and wrote it into (i+2). So there is a cyclic dependency in the LoadI and StoreI pack. However, we don't detect it because the relative offset is 2 (also works for relative offset of 3, and if we let it unroll to 16x, any relative offset from 2..15 works). Sadly, we don't seem to pick up on that cyclic dependency. SuperWord::schedule assumes there is no cyclic dependency, and schedules all LoadI before the StoreI, leading to wrong results. Analysis: After SuperWord::find_adjacent_refs we find the pack-pairs for LoadI, StoreI and StoreF. During SuperWord::extend_packlist we also get the ConvI2F pack-pairs. In SuperWord::combine_packs, we combine the pairs to vectors of length 4 (one per StoreF, StoreI, LoadI, ConvI2F). This seems to be a problematic step - before the pair-packs did not have any aliasing issues, as long as we had a relative offset of at least 2, pairs cannot detect that. SuperWord::filter_packs does not remove any of the packs. Modified test 1: If I use relative offset 1, then the pair-packs detect that they are dependent on their peer in the pair -> pair is not formed. Should we not do this check when we combine the pairs? Modified test 2 If I modify the function, and turn the float-array into a second int-array, then I can see that the StoreI for the first int-array does not generate the necessary 3 pack-pairs, but only 2 (the middle one is missing). That way, it is not made into a length 4 vector, and filtered out. That leads to everything else being filtered out - vectorisation fails as it should. Current working hypothesis: We need to check independence of the packs we combine in SuperWord::combine_packs. I have a verification-patch that checks this, and am now running testing to see if it passes on all other tests.
26-01-2023

It looks like the pack algo created cyclic dependencies. I am working on a verifier to catch cycles.
24-01-2023

Thanks [~chagedorn] for reducing the test! ./java -Xint Reduced.java > interpreter.log ./java -XX:-TieredCompilation -Xcomp -XX:CompileOnly=Reduced Reduced.java > c2.log diff interpreter.log c2.log 1c1,2 < x -32958 --- > CompileCommand: compileonly Reduced.* bool compileonly = true > x -37468
23-01-2023

I reduced it a bit further down, and restricted comilation: ./java -Xint Reduced2.java "sum 0" ./java -XX:-TieredCompilation -Xcomp -XX:CompileOnly=Reduced2::test -XX:CompileCommand=printcompilation,Reduced2::* -XX:+PrintInlining Reduced2.java "sum 60"
23-01-2023

ILW = Wrong execution of C2 compiled code, single Java Fuzzer testcase, use -XX:-UseSuperWord or disable compilation of affected method = HLM = P3
16-12-2022