Found during work on JDK-8340093
The test test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java has a few cases where we do not vectorize because of long mul reductions / element-wise long mul vectors.
However, TestReductions.longMulSimple does vectorize, and the vectorization leads to a performance regression compared to the scalar code (see the sketch below).
We already saw this here:
https://github.com/openjdk/jdk/pull/25387
(the vectorized code reached only 0.38x of the scalar performance)
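For reference, here is a minimal sketch of the kind of long mul reduction loop involved; the class and method names are made up, this is not the exact TestReductions code:

    // Hypothetical sketch of a long mul reduction loop of the shape that
    // auto-vectorization turns into MulVL / MulReductionVL nodes.
    public class LongMulReductionSketch {
        static long longMulReduce(long[] data) {
            long acc = 1;
            for (int i = 0; i < data.length; i++) {
                acc *= data[i]; // long multiply, accumulated across iterations
            }
            return acc;
        }
    }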
The issue seems to be this:
- Matcher::match_rule_supported_vector has a comment saying that 64-bit/128-bit vector reductions for MulReductionVL are supported.
- Matcher::match_rule_supported_auto_vectorization excludes MulVL from auto-vectorization, because apparently no NEON implementation is available.
- However, in the backend we implement both MulVL and MulReductionVL, but we do it with a scalar implementation: pack and unpack.
- That is very inefficient and can lead to slowdowns. I wonder if that also has an impact on the Vector API; probably yes (see the Vector API sketch below).
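For the Vector API question: a reduction like the one below would presumably end up on the same MulReductionVL backend path. This is a hedged sketch, not taken from any existing test; it only assumes the incubating jdk.incubator.vector API (run with --add-modules jdk.incubator.vector):

    import jdk.incubator.vector.LongVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorApiMulReductionSketch {
        static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_PREFERRED;

        // Explicit Vector API version of a long mul reduction; reduceLanes(MUL)
        // is expected to map to a MulReductionVL node in C2.
        static long mulReduce(long[] data) {
            long acc = 1;
            int i = 0;
            for (; i <= data.length - SPECIES.length(); i += SPECIES.length()) {
                LongVector v = LongVector.fromArray(SPECIES, data, i);
                acc *= v.reduceLanes(VectorOperators.MUL);
            }
            for (; i < data.length; i++) {
                acc *= data[i]; // scalar tail
            }
            return acc;
        }
    }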
We have multiple options here:
- We could just prevent long mul reductions for NEON completely.
- But in some odd cases vectorization may still be profitable. For those, we could instead adjust the cost model: make MulVL and MulReductionVL more expensive, so they are only chosen when vectorization still pays off. This is probably the preferable approach (a benchmark sketch for comparing vectorized and scalar code follows below).
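Independent of which option we pick, we should measure the impact. A quick way is to run the same reduction with and without auto-vectorization, e.g. by toggling -XX:-UseSuperWord. Below is a hedged JMH sketch (benchmark and class names are made up, not an existing benchmark):

    import org.openjdk.jmh.annotations.*;

    // Run once as-is (vectorized) and once with
    //   -jvmArgsAppend -XX:-UseSuperWord
    // to get the scalar baseline for comparison.
    @State(Scope.Thread)
    @Fork(1)
    public class LongMulReductionBench {
        @Param("10000")
        int size;
        long[] data;

        @Setup
        public void setup() {
            data = new long[size];
            for (int i = 0; i < size; i++) {
                data[i] = i | 1; // arbitrary odd values, avoids zeros
            }
        }

        @Benchmark
        public long longMulReduce() {
            long acc = 1;
            for (int i = 0; i < data.length; i++) {
                acc *= data[i];
            }
            return acc;
        }
    }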