JDK-8274983 : C1 optimizes the invocation of private interface methods
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15,16,17,18
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2021-10-08
  • Updated: 2022-03-03
  • Resolved: 2021-11-30
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 18
17.0.4-oracleFixed 18 b26Fixed
Related Reports
Relates :  
Description
C1 has performance regression for the following code after jdk15. C2 doesn't have this issue. 

Pattern p = Pattern.compile("[A-Za-z0-9]+");
p.matcher(str)

Here is the test of our attachment SlowStartupTest.  we can observe slowdown for tier1~3. 

JDK 11 (AdoptOpenJDK build 11.0.7+10)
-XInt                  : Executed 10000 iterations in 85ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 3ms
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 4ms
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 14 (AdoptOpenJDK build 14.0.2+12)
-XInt                  : Executed 10000 iterations in 76ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 2ms
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 4ms
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 15 (AdoptOpenJDK build 15.0.2+7, similar also on build 15+36)
-XInt                  : Executed 10000 iterations in 54ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 170ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 178ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 174ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 16 (AdoptOpenJDK-16.0.1+9)
-XInt                  : Executed 10000 iterations in 54ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 160ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms
JDK 17 (Temurin-17+35)
-XInt                  : Executed 10000 iterations in 56ms
-XX:TieredStopAtLevel=1: Executed 10000 iterations in 162ms ??
-XX:TieredStopAtLevel=2: Executed 10000 iterations in 165ms ??
-XX:TieredStopAtLevel=3: Executed 10000 iterations in 177ms ??
-XX:TieredStopAtLevel=4: Executed 10000 iterations in 1ms

We did binary search on revisions, then we attribute this performance regression to JDK-8238358.

We also tested on x86_64 and aarch64. both suffer from this issue.
Profile it with async-profiler and you'll see huge amounts of time in 
 `SharedRuntime::handle_wrong_method_ic_miss` and `SharedRuntime::resolve_virtual_call_C` for `java.util.regex.Pattern$BmpCharPredicate$$Lambda$22.0x80000002b.is(int)`.

[update 11/23/2021]
The performance issue identified by the regex example(SlowStartupTest.java)  has been resolved by JDK-8276216. 
C1 still generates less efficient code for the invocation of private interface methods, manifested by InvokePrivateInterfaceMethod.java.
Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/186 Date: 2022-03-02 16:39:25 +0000
02-03-2022

Fix Request (17u): Should get backported for parity with 17.0.4-oracle. Applies cleanly.
02-03-2022

Changeset: 21d9ca6c Author: Xin Liu <xliu@openjdk.org> Date: 2021-11-30 18:55:24 +0000 URL: https://git.openjdk.java.net/jdk/commit/21d9ca6cd942ac98a3be2577ded8eaf92dac7d46
30-11-2021

C1 and C2 have different strategies for the invocation of private interface methods. C1 can't recognize the new code pattern(invokeinterface), icvirtual call is generated. c1_LIRGenerator.cpp case Bytecodes::_invokespecial: case Bytecodes::_invokevirtual: case Bytecodes::_invokeinterface: // for loaded and final (method or class) target we still produce an inline cache, // in order to be able to call mixed mode if (x->code() == Bytecodes::_invokespecial || x->target_is_final()) { __ call_opt_virtual(target, receiver, result_register, SharedRuntime::get_resolve_opt_virtual_call_stub(), arg_list, info); } else { __ call_icvirtual(target, receiver, result_register, SharedRuntime::get_resolve_virtual_call_stub(), arg_list, info); } break; On the other hand, C2 generates checkcast code for this case. doCall.cpp if (iter().cur_bc_raw() == Bytecodes::_invokespecial && !orig_callee->is_object_initializer()) { ciInstanceKlass* calling_klass = method()->holder(); ciInstanceKlass* sender_klass = calling_klass; if (sender_klass->is_interface()) { receiver_constraint = sender_klass; } } else if (iter().cur_bc_raw() == Bytecodes::_invokeinterface && orig_callee->is_private()) { assert(holder->is_interface(), "How did we get a non-interface method here!"); receiver_constraint = holder; } if (receiver_constraint != NULL) { Node* receiver_node = stack(sp() - nargs); Node* cls_node = makecon(TypeKlassPtr::make(receiver_constraint)); Node* bad_type_ctrl = NULL; Node* casted_receiver = gen_checkcast(receiver_node, cls_node, &bad_type_ctrl);
16-11-2021

$java -XX:TieredStopAtLevel=1 -XX:+TraceCallFixup -XX:+TraceICs -XX:+PrintC1Statistics SlowStartupTest 1 the trace shows that an IC stub is repeatedly updated and handle_wrong_method_ic_miss is triggered after then. C1 Runtime statistics: _resolve_invoke_virtual_cnt: 38601 _resolve_invoke_opt_virtual_cnt: 187 _resolve_invoke_static_cnt: 47 _handle_wrong_method_cnt: 38533 _ic_miss_cnt: 37 _generic_arraycopystub_cnt: 0 _byte_arraycopy_cnt: 6996 _short_arraycopy_cnt: 0 _int_arraycopy_cnt: 0 _long_arraycopy_cnt: 0 _oop_arraycopy_cnt: 9 _arraycopy_slowcase_cnt: 0 _arraycopy_checkcast_cnt: 0 _arraycopy_checkcast_attempt_cnt:0 _new_type_array_slowcase_cnt: 2 _new_object_array_slowcase_cnt: 0 _new_instance_slowcase_cnt: 0 _new_multi_array_slowcase_cnt: 0 _monitorenter_slowcase_cnt: 0 _monitorexit_slowcase_cnt: 0 _patch_code_slowcase_cnt: 2 _throw_range_check_exception_count: 0: _throw_index_exception_count: 0: _throw_div0_exception_count: 0: _throw_null_pointer_exception_count: 0: _throw_class_cast_exception_count: 0: _throw_incompatible_class_change_error_count: 0: _throw_count: 0: 37 inline cache miss in compiled 38533 wrong method 47 unresolved static call site 38601 unresolved virtual call site 187 unresolved opt virtual call site 301 slow partial subtype 7179 byte array copies 84 short array copies 62 int array copies 55 long array copies 327 unsafe array copies
12-11-2021

For the callsite of invokeinterface, the receiver is actually is 'Field arg$1:Ljava/util/regex/Pattern$BmpCharPredicate; '. 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z Besides aforementioned j.u.r.Pattern$BmpCharPredicate$$Lambda$37, there are other classes which implement the interface j.u.r.Pattern$BmpCharPredicate. eg. $javap -p -c DUMP_CLASS_FILES/java/util/regex/'CharPredicates$$Lambda$29.class' final class java.util.regex.CharPredicates$$Lambda$29 implements java.util.regex.Pattern$BmpCharPredicate { ... public boolean is(int); Code: 0: iload_1 1: invokestatic #17 // Method java/util/regex/CharPredicates.lambda$ASCII_DIGIT$18:(I)Z 4: ireturn } javap -p -c DUMP_CLASS_FILES/java/util/regex/'Pattern$$Lambda$26.class' final class java.util.regex.Pattern$$Lambda$26 implements java.util.regex.Pattern$BmpCharPredicate { private final int arg$1; private final int arg$2; ... public boolean is(int); Code: 0: aload_0 1: getfield #16 // Field arg$1:I 4: aload_0 5: getfield #18 // Field arg$2:I 8: iload_1 9: invokestatic #26 // Method java/util/regex/Pattern.lambda$Range$10:(III)Z 12: ireturn } I think a CompiledIC stub doesn't work for an interface method because different oop's klasses are different.
12-11-2021

JDK-8238358 changes the call from invokespecial to invokeinterface. here is the original version. final class java.util.regex.Pattern$BmpCharPredicate$$Lambda$37 implements java.util.regex.Pattern$BmpCharPredicate { public boolean is(int); Code: 0: aload_0 1: getfield #17 // Field arg$1:Ljava/util/regex/Pattern$BmpCharPredicate; 4: aload_0 5: getfield #19 // Field arg$2:Ljava/util/regex/Pattern$CharPredicate; 8: iload_1 9: invokespecial #30 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z 12: ireturn } After hidden class patch, it changes to. 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z I think both of them are correct. invokeinterface makes more sense because java/util/regex/Pattern$BmpCharPredicate is indeed an interface. invokespecial doesn't need to call SharedRuntime::resolve_virtual_call_C(). it avoids this problem in the 1st place.
09-11-2021

Vladimir, please take a look.
12-10-2021

[~xliu] Yes, that's what I'm seeing too.
11-10-2021

hi, Dean, the ic stub is this one in java.util.regex.Pattern$BmpCharPredicate$$Lambda$37/0x000000080009aec0::is (15 bytes), 0x00007f5ce4fa0b6d: movabs $0xffffffffffffffff,%rax 0x00007f5ce4fa0b77: callq 0x00007f5ce4ab87a0 ; ImmutableOopMap {} ;*invokeinterface lambda$union$2 {reexecute=0 rethrow=0 return_oop=0} ; - java.util.regex.Pattern$BmpCharPredicate$$Lambda$37/0x000000080009aec0::is@9 ; {virtual_call} the callsite is 'invokeinterface', but find_callee_info() of SharedRuntime::handle_ic_miss_helper works out that call_info is a direct call. is it right? (gdb) p call_info $5 = { <StackObj> = { <AllocatedObj> = { _vptr.AllocatedObj = 0x7ffff70932a8 <vtable for CallInfo+16> }, <No data fields>}, members of CallInfo: _resolved_klass = 0x80009a748, _resolved_method = { <StackObj> = { <AllocatedObj> = { _vptr.AllocatedObj = 0x7ffff7016660 <vtable for methodHandle+16> }, <No data fields>}, members of methodHandle: _value = 0x7fff83dd5768, _thread = 0x7ffff0031000 }, _selected_method = { <StackObj> = { <AllocatedObj> = { _vptr.AllocatedObj = 0x7ffff7016660 <vtable for methodHandle+16> }, <No data fields>}, members of methodHandle: _value = 0x7fff83dd5768, _thread = 0x7ffff0031000 }, _call_kind = CallInfo::direct_call, _call_index = -2, _resolved_appendix = { _handle = 0x0 }, _resolved_method_name = { _handle = 0x0 } } (gdb) call call_info.resolved_method()->print_on(tty) {method} - this oop: 0x00007fff83dd5768 - method holder: 'java/util/regex/Pattern$BmpCharPredicate' - constants: 0x00007fff83dd5b30 constant pool [94]/operands[35] {0x00007fff83dd5b30} for 'java/util/regex/Pattern$BmpCharPredicate' cache=0x00007fff83dd6528 - access: 0x1002 private synthetic - name: 'lambda$union$2' - signature: '(Ljava/util/regex/Pattern$CharPredicate;I)Z' - max stack: 3 - max locals: 3 - size of params: 3 - method size: 13 - highest level: 1 - vtable index: -2 - i2i entry: 0x00007fffe1010de0 - adapters: AHE@0x00007ffff0388750: 0xbba00000 i2c: 0x00007fffe1057ee0 c2i: 0x00007fffe1057fca c2iUV: 0x00007fffe1057f91 c2iNCI: 0x00007fffe1058007 - compiled entry 0x00007fffe15fdc80 - code size: 26 - code start: 0x00007fff83dd5720 - code end (excl): 0x00007fff83dd573a - method data: 0x00007fff83de4140 - checked ex length: 0 - linenumber start: 0x00007fff83dd573a - localvar length: 3 - localvar start: 0x00007fff83dd5742 - compiled code: nmethod 520973 342 1 java.util.regex.Pattern$BmpCharPredicate::lambda$union$2 (26 bytes)
09-10-2021

There seems to be two problems here: 1) SharedRuntime doesn't handle "can be statically bound" virtual calls very well 2) C1 generates a "can be statically bound" virtual call when it probably shouldn't
09-10-2021

One significant difference is that the generated lambda method now uses invokeinterface instead of invokespecial. Turning on -XX:+TraceCallFixup shows a lot of messages like this: resolving virtual (invokeinterface) call to java.util.regex.Pattern$BmpCharPredicate::lambda$union$2 at pc: 0x00007f12c98052e4 to code: 0x0000000000000000 handle_wrong_method reresolving call to java.util.regex.Pattern$BmpCharPredicate::lambda$union$2 code: 0x0000000000000000 converting IC miss to reresolve (invokeinterface) call to java.util.regex.Pattern$BmpCharPredicate::lambda$union$2 from pc: 0x00007f12c98052e4 code: 0x0000000000000000
09-10-2021

The regex pattern generates 2 lambdas. java.util.regex.Pattern$BmpCharPredicate$$Lambda$37+0x000000080009aec0.is(I)Z (15 bytes) calls java.util.regex.Pattern$BmpCharPredicate::lambda$union$2 (26 bytes) however, after first resolve_virtual_call, hotspot backfills $0x80009aec0 for lambda$union$2. 0x7fffe15fd5ed: movabs $0x80009aec0,%rax 0x7fffe15fd5f7: callq 0x7fffe15fd960 This value doesn't match the klass of lambda$union$2 (0x80009ac18). as a result, ic_miss is triggered .
08-10-2021

We have seen a lot of hits on SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) "Executing pns" Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xfa6f19] SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x15 v ~RuntimeStub::ic_miss_stub J 343 c1 java.util.regex.Pattern$BmpCharPredicate$$Lambda$37+0x000000080009aec0.is(I)Z java.base (15 bytes) @ 0x00007fffe15fd5fc [0x00007fffe15fd560+0x000000000000009c] j java.util.regex.Pattern$BmpCharPredicate.lambda$union$2(Ljava/util/regex/Pattern$CharPredicate;I)Z+2 java.base J 343 c1 java.util.regex.Pattern$BmpCharPredicate$$Lambda$37+0x000000080009aec0.is(I)Z java.base (15 bytes) @ 0x00007fffe15fd5fc [0x00007fffe15fd560+0x000000000000009c] J 350 c1 java.util.regex.Pattern$BmpCharPropertyGreedy.match(Ljava/util/regex/Matcher;ILjava/lang/CharSequence;)Z java.base (89 bytes) @ 0x00007fffe15f5654 [0x00007fffe15f55a0+0x00000000000000b4] J 346 c1 java.util.regex.Matcher.match(II)Z java.base (154 bytes) @ 0x00007fffe15f4e14 [0x00007fffe15f4c00+0x0000000000000214] J 354 c1 java.util.regex.Matcher.matches()Z java.base (10 bytes) @ 0x00007fffe15fc0fc [0x00007fffe15fc0c0+0x000000000000003c] j SlowStartupTest.main([Ljava/lang/String;)V+74 v ~StubRoutines::call_stub V [libjvm.so+0x9f1b7d] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x5dd V [libjvm.so+0xec565c] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*), JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x32 V [libjvm.so+0x9f159c] JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x8e V [libjvm.so+0xa93ca7] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x188 V [libjvm.so+0xaaae5f] jni_CallStaticVoidMethod+0x349 C [libjli.so+0x4e0f] JavaMain+0xbf1 C [libjli.so+0xbdb3] ThreadJavaMain+0x27
08-10-2021

ILW = MMH = P3
08-10-2021