(Summary is provisional until the reason is found)
Take this benchmark:
package org.openjdk;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3, jvmArgsAppend = {"-Xms2g", "-Xmx2g", "-XX:+UseParallelGC"})
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConcatSO {

    @Benchmark
    public String test() {
        String s1 = "STRING ONE";
        String s2 = "STRING TWO";
        return "abc " + s1 + " def " + s2;
    }
}
...and run it with different JDKs and compilation targets:
Benchmark      Mode  Cnt   Score   Error  Units

# target=8, 8u191
ConcatSO.test  avgt   15  14.071 ± 0.112  ns/op

# target=8, 11.0.2
ConcatSO.test  avgt   15  12.438 ± 0.114  ns/op

# target=9, 9 GA
ConcatSO.test  avgt   15  12.681 ± 0.135  ns/op

# target=9, 11.0.2
ConcatSO.test  avgt   15  14.211 ± 0.086  ns/op ; <---- !!!

# target=11, 11.0.2
ConcatSO.test  avgt   15  14.169 ± 0.069  ns/op

# target=11, 11.0.2, -Djava.lang.invoke.stringConcat=BC_SB
ConcatSO.test  avgt   15  12.477 ± 0.077  ns/op
Look at the line marked "!!!": something regressed in the runtime, so the same target=9 bytecode runs slower on 11.0.2 (14.211 ns/op) than on 9 GA (12.681 ns/op). The change is probably somewhere in java.lang.invoke, since switching to the BC_SB strategy recovers the performance (12.477 ns/op).
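For readers comparing the target=8 and target=9+ rows, here is a minimal sketch of the two code shapes involved. It is not the benchmark itself and measures nothing. It assumes the usual javac behaviour: an explicit StringBuilder chain for target=8, and an invokedynamic call site bootstrapped by StringConcatFactory.makeConcatWithConstants for target=9 and later. The class name ConcatShapes and both method names are made up for illustration, and the bootstrap is called by hand here, whereas real code gets it through invokedynamic linkage. It needs JDK 9 or later to compile.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

// Hypothetical illustration class; not part of the original report.
final class ConcatShapes {

    // Approximately the shape javac emits for "abc " + s1 + " def " + s2
    // when compiling with target=8: a plain StringBuilder chain.
    static String viaStringBuilder(String s1, String s2) {
        return new StringBuilder()
                .append("abc ")
                .append(s1)
                .append(" def ")
                .append(s2)
                .toString();
    }

    // Approximately what the target=9+ call site links to: the bootstrap
    // method is invoked once and the resulting handle is reused.
    private static final MethodHandle CONCAT = bootstrap();

    private static MethodHandle bootstrap() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "makeConcatWithConstants",
                    MethodType.methodType(String.class, String.class, String.class),
                    // Recipe: \u0001 marks a dynamic argument slot.
                    "abc \u0001 def \u0001");
            return site.dynamicInvoker();
        } catch (StringConcatException e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaIndyBootstrap(String s1, String s2) {
        try {
            return (String) CONCAT.invokeExact(s1, s2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String a = viaStringBuilder("STRING ONE", "STRING TWO");
        String b = viaIndyBootstrap("STRING ONE", "STRING TWO");
        System.out.println(a);
        System.out.println(b);
        System.out.println("same result: " + a.equals(b));
    }
}
```

On JDK 9 and 11, the code that the second path ends up running is chosen inside java.lang.invoke at link time, which is why -Djava.lang.invoke.stringConcat=BC_SB can change the score without recompiling the benchmark.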