Found while working on JDK-8340093, specifically with its test TestReductions.longMulBig and the related benchmark.
We already saw that it was not vectorizing here:
https://github.com/openjdk/jdk/pull/25387
See also the attached test.
It seems that -XX:LoopUnrollLimit=1000 helps. Maybe the loop is too large? But other tests in TestReductions seem to have a similar number of instructions, so I'm not sure.
./java -Xbatch -XX:CompileCommand=compileonly,Reduction3::test* -XX:CompileCommand=printcompilation,Reduction3::test* -XX:+TraceNewVectors -XX:UseAVX=3 -XX:CompileCommand=TraceAutoVectorization,Reduction3::test*,SW_INFO -XX:+TraceLoopOpts -XX:+TraceSuperWordLoopUnrollAnalysis -XX:LoopUnrollLimit=1000 Reduction3.java
Without the flag, it compiles only during OSR, not during regular compilation.
With the flag, it compiles for OSR and regular compilation.
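
For readers without the attachment, here is a rough sketch of the kind of loop this is about: a long multiplication reduction over array elements. This is not the attached Reduction3.java and not the exact body of TestReductions.longMulBig. The class name LongMulReductionSketch, the method shape, and the array size are assumptions made for illustration only, so the real reproducer may look different and this sketch may not show the same behavior.

```java
import java.util.Random;

// Hypothetical stand-in for the attached reproducer; names and sizes are assumptions.
public class LongMulReductionSketch {
    static final int SIZE = 10_000; // assumed size, not taken from the original test

    // Multiplication reduction: acc accumulates the product of a[i] * b[i].
    // Overflow wraps around, as usual for long arithmetic in Java.
    static long test(long[] a, long[] b) {
        long acc = 1;
        for (int i = 0; i < a.length; i++) {
            acc *= a[i] * b[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        long[] a = new long[SIZE];
        long[] b = new long[SIZE];
        for (int i = 0; i < SIZE; i++) {
            // Odd values keep the wrapped product from collapsing to zero.
            a[i] = r.nextLong() | 1L;
            b[i] = r.nextLong() | 1L;
        }
        long gold = test(a, b);
        // Call the method repeatedly so it gets hot enough to be compiled,
        // and compare against the first result.
        for (int i = 0; i < 10_000; i++) {
            if (test(a, b) != gold) {
                throw new RuntimeException("wrong result");
            }
        }
        System.out.println("done");
    }
}
```

To try it with the command line above, the class name in the CompileCommand patterns would have to be changed from Reduction3 to LongMulReductionSketch.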