JDK-8340093 caused a performance regression in several long reduction microbenchmarks on SVE machines.
unit = ns/op
WI = with cost model
WO = without cost model
P0 = with cost model, but with auto-vectorization disabled, i.e. -XX:AutoVectorizationOverrideProfitability=0
128-bit SVE machine:
Benchmark                                         WI vs WO  WI vs P0
VectorReduction2.NoSuperword.longAddDotProduct    23.21%    22.92%
VectorReduction2.NoSuperword.longMulDotProduct    18.25%    17.96%
VectorReduction2.NoSuperword.longMulSimple        21.11%    21.16%
VectorReduction2.WithSuperword.longAddDotProduct  22.92%    23.03%
VectorReduction2.WithSuperword.longMulDotProduct  18.23%    18.19%
VectorReduction2.WithSuperword.longMulSimple      21.74%    21.04%
256-bit SVE machine:
Benchmark                                         WI vs WO  WI vs P0
VectorReduction2.WithSuperword.longMulDotProduct  39.32%    39.32%
VectorReduction2.WithSuperword.longMulSimple      23.88%    23.86%
VectorReduction2.NoSuperword.longMulDotProduct    39.33%    39.35%
VectorReduction2.NoSuperword.longMulSimple        23.87%    23.92%
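For readers unfamiliar with the affected benchmarks, the following is a minimal sketch of the loop shapes that the benchmark names suggest. It is an assumption based on the names only, not a copy of the VectorReduction2 source: the class name LongReductionSketch, the method signatures, and the sample inputs are hypothetical, and the real benchmarks run under JMH with their own array setup.

```java
public class LongReductionSketch {
    // Assumed shape of longMulSimple: fold every element into one product.
    static long longMulSimple(long[] a) {
        long acc = 1; // neutral element for multiplication
        for (int i = 0; i < a.length; i++) {
            acc *= a[i];
        }
        return acc;
    }

    // Assumed shape of longAddDotProduct: sum of element-wise products.
    static long longAddDotProduct(long[] a, long[] b) {
        long acc = 0; // neutral element for addition
        for (int i = 0; i < a.length; i++) {
            acc += a[i] * b[i];
        }
        return acc;
    }

    // Assumed shape of longMulDotProduct: product of element-wise products.
    static long longMulDotProduct(long[] a, long[] b) {
        long acc = 1; // neutral element for multiplication
        for (int i = 0; i < a.length; i++) {
            acc *= a[i] * b[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 3, 4};
        long[] b = {5, 6, 7, 8};
        System.out.println(longMulSimple(a));        // 1*2*3*4 = 24
        System.out.println(longAddDotProduct(a, b)); // 5+12+21+32 = 70
        System.out.println(longMulDotProduct(a, b)); // 5*12*21*32 = 40320
    }
}
```

In each loop the accumulator carries a dependency from one iteration to the next. That dependency is what makes the profitability of vectorizing the reduction depend on how cheaply the target can perform the long add or multiply reduction.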