It was reported that after a method containing a loop is inlined, the loop is not vectorized (it is not even converted to a counted loop):
I am encountering a performance issue caused by the interaction between
method inlining and automatic vectorization.
Our application aggregates arrays intensively using a method named
ArrayFloatToArrayFloatVectorBinding.plus() with the following code:
    for (int i = 0; i < srcLen; ++i) {
        dstArray[i] += srcArray[i];
    }
When we microbenchmark this method, we observe fast performance, close to the practical memory bandwidth. When we print the assembly code, we observe loop unrolling and automatic vectorization with SIMD instructions.
In the real application, this method is actually inlined in a higher-level
method named AVector.plus(). Unfortunately, the inlined version of the
aggregation code is no longer vectorized.
This causes a significant performance drop compared to a run where we explicitly disable inlining of the method and observe automatically vectorized code
again (-XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus).
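The shape of the reported code can be sketched as a small standalone program. This is only an illustration under assumptions: the class InlinedLoopSketch and the method vectorPlus() are hypothetical stand-ins for the reporter's AVector.plus() and binding class, whose real code is not shown in the report, and this sketch is not guaranteed to reproduce the missing vectorization on any particular JVM build.

```java
import java.util.Arrays;

// Hypothetical stand-in for the reporter's classes; names are assumptions.
public class InlinedLoopSketch {

    // Mirrors the loop quoted from ArrayFloatToArrayFloatVectorBinding.plus():
    // element-wise addition of srcArray into dstArray.
    static void plus(float[] dstArray, float[] srcArray, int srcLen) {
        for (int i = 0; i < srcLen; ++i) {
            dstArray[i] += srcArray[i];
        }
    }

    // Plays the role of the higher-level AVector.plus(): a thin caller
    // into which the JIT compiler may inline plus().
    static void vectorPlus(float[] dst, float[] src) {
        plus(dst, src, src.length);
    }

    public static void main(String[] args) {
        float[] dst = new float[1024];
        float[] src = new float[1024];
        Arrays.fill(src, 1.0f);

        // Call the outer method often enough for the JIT compiler
        // to compile it and consider inlining plus().
        for (int iter = 0; iter < 20_000; iter++) {
            vectorPlus(dst, src);
        }
        System.out.println(dst[0]);
    }
}
```

To compare the two cases, one could run the sketch once as-is and once with -XX:CompileCommand=dontinline,InlinedLoopSketch.plus, adding -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the hsdis disassembler library) to inspect whether SIMD instructions appear in the compiled loop.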