C1 shows a performance regression for the following code starting with JDK 15. C2 is not affected.
Pattern p = Pattern.compile("[A-Za-z0-9]+");
p.matcher(str)
The attached test, SlowStartupTest, reproduces this. The slowdown shows up at tiers 1 to 3 (-XX:TieredStopAtLevel=1, 2 and 3):
JDK 11 (AdoptOpenJDK build 11.0.7+10)
-Xint : Executed 10000 iterations in 85ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 3ms
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 4ms
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 14 (AdoptOpenJDK build 14.0.2+12)
-Xint : Executed 10000 iterations in 76ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 4ms
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 15 (AdoptOpenJDK build 15.0.2+7, similar also on build 15+36)
-Xint : Executed 10000 iterations in 54ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 170ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 178ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 174ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 16 (AdoptOpenJDK-16.0.1+9)
-Xint : Executed 10000 iterations in 54ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 17 (Temurin-17+35)
-Xint : Executed 10000 iterations in 56ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 162ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 165ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 177ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
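The attachment itself is not reproduced in this description. As a rough illustration only, a timing loop that prints output in a similar shape to the results above might look like the sketch below. The class name RegexTimingSketch, the input string "abc123", and the use of matches() are assumptions for illustration and are not taken from SlowStartupTest; this sketch is not claimed to reproduce the exact numbers.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the attached test; names and input are assumptions.
final class RegexTimingSketch {
    // Same pattern as in the report.
    private static final Pattern P = Pattern.compile("[A-Za-z0-9]+");

    // Runs the matcher `iterations` times and returns how many inputs matched,
    // so the result of the loop is actually used.
    static int run(String input, int iterations) {
        int matched = 0;
        for (int i = 0; i < iterations; i++) {
            if (P.matcher(input).matches()) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        int iterations = 10000;
        long start = System.nanoTime();
        int matched = run("abc123", iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Executed " + iterations + " iterations in "
                + elapsedMs + "ms (" + matched + " matched)");
    }
}
```

It can be run once per configuration, for example with -Xint, then with -XX:TieredStopAtLevel=1 through 4, to compare the timings across tiers.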
We bisected the revisions and attribute this performance regression to JDK-8238358.
We also tested on x86_64 and aarch64; both are affected.
Profile it with async-profiler and you'll see huge amounts of time in
`SharedRuntime::handle_wrong_method_ic_miss` and `SharedRuntime::resolve_virtual_call_C` for `java.util.regex.Pattern$BmpCharPredicate$$Lambda$22.0x80000002b.is(int)`.
[update 11/23/2021]
The performance issue identified by the regex example (SlowStartupTest.java) has been resolved by JDK-8276216.
C1 still generates less efficient code for invocations of private interface methods, as demonstrated by InvokePrivateInterfaceMethod.java.
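InvokePrivateInterfaceMethod.java is an attachment and is not shown here. For readers unfamiliar with the construct, the sketch below shows what a call to a private interface method looks like in source form: a default method that delegates to a private method declared in the same interface. All names (PrivateInterfaceCallSketch, CharCheck, eitherIs, unionIs) are hypothetical, and the sketch makes no claim about whether it triggers the slow path in C1.

```java
// Hypothetical illustration; not the attached InvokePrivateInterfaceMethod.java.
final class PrivateInterfaceCallSketch {
    interface CharCheck {
        boolean is(int ch);

        // Private interface method (permitted since Java 9).
        private boolean eitherIs(CharCheck other, int ch) {
            return is(ch) || other.is(ch);
        }

        // Default method whose body invokes the private interface method.
        default boolean unionIs(CharCheck other, int ch) {
            return eitherIs(other, ch);
        }
    }

    // Calls unionIs `iterations` times, alternating between a matching
    // character ('x') and a non-matching one ('-'), and counts the hits.
    static int count(int iterations) {
        CharCheck letters = ch -> ch >= 'a' && ch <= 'z';
        CharCheck digits = ch -> ch >= '0' && ch <= '9';
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            int ch = (i % 2 == 0) ? 'x' : '-';
            if (letters.unionIs(digits, ch)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int iterations = 10000;
        long start = System.nanoTime();
        int hits = count(iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Executed " + iterations + " iterations in "
                + elapsedMs + "ms (" + hits + " hits)");
    }
}
```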