Bug ID: JDK-8300865 C2: product reduction in ProdRed

Versions (Unresolved/Resolved/Fixed)

The Version table shows the release in which this issue/RFE will be, or has been, addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 21: 21 b25 (Fixed)

When the test compiler.loopopts.superword.ProdRed_Double is run with -XX:+SuperWordReductions and -XX:LoopMaxUnroll>=8 on x86_64, C2 is expected to vectorize the product reduction loop in prodReductionImplement(). It fails to do so in every run, across a range of x86_64 CPUs with different vectorization capabilities.
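For orientation, a product reduction of the kind this test exercises has roughly the shape below. This is a minimal sketch, not the test source: the class name ProdRedSketch, the helper fill(), the array length and the iteration count are all assumptions made for illustration.

```java
class ProdRedSketch {
    // Hypothetical helper: fills the inputs so that a[i] - b[i] == 1.0 for every i.
    static void fill(double[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 2;
            b[i] = i + 1;
        }
    }

    // Product reduction: a single scalar accumulator is updated in every
    // iteration, and nothing is stored back to an array inside the loop.
    static double prodReduction(double[] a, double[] b, double total) {
        for (int i = 0; i < a.length; i++) {
            total *= a[i] - b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = new double[1024];
        double[] b = new double[1024];
        fill(a, b);
        double total = 0;
        // Call the method repeatedly so that C2 gets a chance to compile the loop.
        for (int j = 0; j < 20_000; j++) {
            total = prodReduction(a, b, 2.0);
        }
        System.out.println(total);
    }
}
```

Running such a program with the flags above and inspecting the generated code for the reduction method is one way to observe whether a vector reduction was emitted.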

HOW TO REPRODUCE

On a linux-x86_64-server-fastdebug build, run

$ make run-test TEST="compiler/loopopts/superword/ProdRed_Double.java" TEST_VM_OPTS="-XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement"
$ grep vector_reduction_double build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_loopopts_superword_ProdRed_Double_java/compiler/loopopts/superword/ProdRed_Double.jtr

We expect to find some matches of 'vector_reduction_double', but get none.

INITIAL ANALYSIS

SuperWord::construct_bb() relies on ReductionNode::implemented() to identify vectorizable reduction uses [1]. Among other arguments, ReductionNode::implemented() takes the minimum vector size for the reduction type (vlen), and fails trivially if vlen is less than or equal to 1 [2]. In the context of SuperWord::construct_bb() this is always the case for 'double' reductions: vlen is simply set to the result of Matcher::min_vector_size(), which since JDK-8265783 always returns 1 for the 'double' type [3]. Reverting the changes made by JDK-8265783 to Matcher::min_vector_size() (in x86.ad) re-enables vectorization of ProdRed_Double.
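The interaction described above can be illustrated with a toy model. This is a sketch in Java, not HotSpot code: the class ReductionCheckSketch, both method names, the doubleSpecialCase flag and the value 2 returned for the non-special case are assumptions made for illustration. Only the two facts taken from the analysis are modelled: the minimum vector size is 1 for 'double', and a reduction with vlen <= 1 is rejected.

```java
class ReductionCheckSketch {
    // Toy stand-in for Matcher::min_vector_size() as described in the analysis.
    // With the special case enabled, 'double' yields 1; the fallback value 2
    // is an assumption chosen only to keep the model small.
    static int minVectorSize(Class<?> elemType, boolean doubleSpecialCase) {
        if (doubleSpecialCase && elemType == double.class) {
            return 1;
        }
        return 2;
    }

    // Toy stand-in for the vector-length part of ReductionNode::implemented():
    // a "vector" of one lane or fewer is rejected outright.
    static boolean reductionImplemented(int vlen) {
        return vlen > 1;
    }

    public static void main(String[] args) {
        int withSpecialCase = minVectorSize(double.class, true);
        int withoutSpecialCase = minVectorSize(double.class, false);
        System.out.println("special case on:  implemented = "
                + reductionImplemented(withSpecialCase));
        System.out.println("special case off: implemented = "
                + reductionImplemented(withoutSpecialCase));
    }
}
```

In this model the 'double' reduction is rejected only while the special case is active, which mirrors the observation that reverting the min_vector_size change re-enables vectorization.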

Thanks to Daniel Skantz for pointing out the issue, found while working on JDK-8294715.

[1] https://github.com/openjdk/jdk/blob/5a4945c0d95423d0ab07762c915e9cb4d3c66abb/src/hotspot/share/opto/superword.cpp#L3355
[2] https://github.com/openjdk/jdk/blob/5a4945c0d95423d0ab07762c915e9cb4d3c66abb/src/hotspot/share/opto/vectornode.cpp#L1468
[3] https://github.com/openjdk/jdk/blob/5a4945c0d95423d0ab07762c915e9cb4d3c66abb/src/hotspot/cpu/x86/x86.ad#L2293-L2295

Changeset: f9ad7df4
Author: Sandhya Viswanathan <sviswanathan@openjdk.org>
Date: 2023-05-31 22:39:54 +0000
URL: https://git.openjdk.org/jdk/commit/f9ad7df4dafa0a2da38e8cbb4150049fb04f4327
31-05-2023
[~epeter], since you commented here, please review the PR and submit mach5 testing for it.
24-05-2023
A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/14065
Date: 2023-05-19 23:27:32 +0000
22-05-2023
I also just discovered this issue independently. I agree with the analysis above.
Some other tests still vectorize, but only if there is a Store that causes SuperWord::construct_bb() to return true. For example, in test/hotspot/jtreg/compiler/loopopts/superword/SumRed_Double.java:
    for (int i = 0; i < a.length; i++) {
        d[i] = (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
        total += d[i];
    }
The store to "d[i]" causes vectorization.
We also need a regression test for these things, ideally using the IR framework. I am doing that with JDK-8302139.
16-02-2023
[~sviswanathan], could you please have a look? Thanks.
24-01-2023
ILW = Missed vectorization opportunity, rare?, no workaround = MLH = P4
23-01-2023

Relates :	JDK-8294715 - Add IR checks to the reduction vectorization tests
Relates :	JDK-8302139 - Speed up SuperWord reduction tests
Relates :	JDK-8265783 - Create a separate library for x86 Intel SVML assembly intrinsics