JDK-8026297 : Generate AdapterHandlerEntry during CDS dump
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 15
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2013-10-10
  • Updated: 2023-01-03
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other: tbd (Unresolved)
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
We should generate the AdapterHandlerEntry objects at CDS dump time (e.g., by calling Method::make_adapters() at dump time). The goal is to avoid modifying archived Method objects as much as possible. Currently in CDS, Method::_adapter is updated as soon as the class is loaded. This writes to a lot of Method objects, dirtying the pages that contain them and reducing sharing between processes.
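To make the sharing cost concrete, here is a toy model (not HotSpot code), assuming the archived Method objects are packed back to back in a region mapped copy-on-write with 4 KB pages. The object size and field offset used below are made-up values, not the real Method layout. Because a Method is much smaller than a page, writing one pointer-sized field (such as _adapter) in every Method dirties essentially every page that holds Methods:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumed page size for the copy-on-write mapping.
constexpr std::size_t kPageSize = 4096;

// Number of distinct pages dirtied when one pointer-sized field at
// `field_offset` is written in each of `method_count` objects of
// `method_size` bytes, laid out back to back from offset 0.
std::size_t dirty_pages_after_field_writes(std::size_t method_count,
                                           std::size_t method_size,
                                           std::size_t field_offset) {
  assert(field_offset + sizeof(void*) <= method_size);
  std::set<std::size_t> dirty;
  for (std::size_t i = 0; i < method_count; i++) {
    std::size_t addr = i * method_size + field_offset;
    dirty.insert(addr / kPageSize);                        // page of first byte
    dirty.insert((addr + sizeof(void*) - 1) / kPageSize);  // page of last byte
  }
  return dirty.size();
}

// Total number of pages occupied by the objects themselves.
std::size_t pages_spanned(std::size_t method_count, std::size_t method_size) {
  std::size_t bytes = method_count * method_size;
  return (bytes + kPageSize - 1) / kPageSize;
}
```

For example, with 1000 hypothetical 88-byte Methods and the field at offset 32, all 22 pages spanned by the objects become dirty, so none of them can stay shared.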

I looked at the i2c (interpreted-to-compiled) and c2i (compiled-to-interpreted) adapter code on x64, and archiving it seems doable because:

[1] They are mostly position independent. There are a few calls to VM functions.
    These functions are in libjvm.so, whose load address may change from run to run. I can
    fix this by jumping through a trampoline, since these calls are not on hot paths anyway.

[2] The i2c and c2i functions are NOT patched at run time.

[3] The code is independent of
    [a] choice of GC
    [b] Client vs Server compiler
    [c] mixed mode vs -Xcomp
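The trampoline idea in point [1] can be sketched as an indirection table: the archived code never embeds a libjvm.so address, it calls through a slot that is filled in once at start-up. This is a hypothetical model; the enum, the table, and the int(int) signature are illustrative only. SharedRuntime::get_ic_miss_stub() and SharedRuntime::fixup_callers_callsite() are the real targets from the listing, stood in for here by plain function pointers:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative signature; the real stubs have different calling conventions.
using RuntimeFn = int (*)(int);

// Hypothetical slot indices, one per VM function the archived code calls.
enum TrampolineId : std::size_t {
  kIcMissStub = 0,            // stands in for SharedRuntime::get_ic_miss_stub()
  kFixupCallersCallsite = 1,  // stands in for SharedRuntime::fixup_callers_callsite()
  kNumTrampolines
};

// The table's position relative to the archived code is fixed; only the
// slot contents change from run to run.
static RuntimeFn g_trampoline_table[kNumTrampolines] = {};

// Run once at VM start-up with the addresses that are valid in this run.
void patch_trampolines(RuntimeFn ic_miss, RuntimeFn fixup) {
  g_trampoline_table[kIcMissStub] = ic_miss;
  g_trampoline_table[kFixupCallersCallsite] = fixup;
}

// What archived code does instead of "call <absolute address>":
// one extra indirect load per call.
int call_via_trampoline(TrampolineId id, int arg) {
  RuntimeFn target = g_trampoline_table[id];
  assert(target != nullptr && "trampolines must be patched before use");
  return target(arg);
}
```

The extra load per call is the price for code that contains no absolute libjvm.so addresses, which is acceptable only because these calls are off the hot path.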

/* example of i2c/c2i code on x64 */

i2c_entry:
   0x7fffe10d72e0:    mov    (%rsp),%rax
   ...[debug code skipped]...
   0x7fffe10d73a7:    mov    %rsp,%r11
   0x7fffe10d73aa:    and    $0xfffffffffffffff0,%rsp
   0x7fffe10d73ae:    push   %rax
   0x7fffe10d73af:    mov    %r11,%rax
   0x7fffe10d73b2:    mov    0x48(%rbx),%r11
   0x7fffe10d73b6:    mov    0x8(%rax),%rsi
   0x7fffe10d73ba:    mov    %rbx,0x258(%r15)
   0x7fffe10d73c1:    mov    %rbx,%rax
   0x7fffe10d73c4:    jmpq   *%r11

c2i_unverified_entry:
   0x7fffe10d73c7:    mov    0x8(%rsi),%ebx
   0x7fffe10d73ca:    movabs $0x800000000,%r12  // compressed klass pointer (OK)
   0x7fffe10d73d4:    add    %r12,%rbx
   0x7fffe10d73d7:    xor    %r12,%r12
   0x7fffe10d73da:    cmp    0x10(%rax),%rbx
   0x7fffe10d73de:    mov    0x8(%rax),%rbx
   0x7fffe10d73e2:    je     0x7fffe10d73ed
   0x7fffe10d73e8:    jmpq   0x7fffe10d59e0     // FIX with trampoline: jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));
   0x7fffe10d73ed:    cmpq   $0x0,0x50(%rbx)
   0x7fffe10d73f5:    je     0x7fffe10d74f0
   0x7fffe10d73fb:    jmpq   0x7fffe10d59e0     // FIX with trampoline: jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));

c2i_entry:
   0x7fffe10d7400:    cmpq   $0x0,0x50(%rbx)
   0x7fffe10d7408:    je     0x7fffe10d74f0
   0x7fffe10d740e:    mov    %rsp,%r13
   0x7fffe10d7411:    mov    (%rsp),%rax
   0x7fffe10d7415:    and    $0xfffffffffffffff0,%rsp
   0x7fffe10d7419:    pushfq
   0x7fffe10d741a:    sub    $0x8,%rsp
   0x7fffe10d741e:    mov    %rsp,-0x28(%rsp)
   0x7fffe10d7423:    sub    $0x80,%rsp
   0x7fffe10d742a:    mov    %rax,0x78(%rsp)
   0x7fffe10d742f:    mov    %rcx,0x70(%rsp)
   0x7fffe10d7434:    mov    %rdx,0x68(%rsp)
   0x7fffe10d7439:    mov    %rbx,0x60(%rsp)
   0x7fffe10d743e:    mov    %rbp,0x50(%rsp)
   0x7fffe10d7443:    mov    %rsi,0x48(%rsp)
   0x7fffe10d7448:    mov    %rdi,0x40(%rsp)
   0x7fffe10d744d:    mov    %r8,0x38(%rsp)
   0x7fffe10d7452:    mov    %r9,0x30(%rsp)
   0x7fffe10d7457:    mov    %r10,0x28(%rsp)
   0x7fffe10d745c:    mov    %r11,0x20(%rsp)
   0x7fffe10d7461:    mov    %r12,0x18(%rsp)
   0x7fffe10d7466:    mov    %r13,0x10(%rsp)
   0x7fffe10d746b:    mov    %r14,0x8(%rsp)
   0x7fffe10d7470:    mov    %r15,(%rsp)
   0x7fffe10d7474:    sub    $0x200,%rsp
   0x7fffe10d747b:    fxsave64 (%rsp)
   0x7fffe10d7480:    mov    %rbx,%rdi
   0x7fffe10d7483:    mov    %rax,%rsi
   0x7fffe10d7486:    callq  0x7ffff72dd2c6 <SharedRuntime::fixup_callers_callsite(Method*, address)>  // FIX with trampoline
   0x7fffe10d748b:    fxrstor64 (%rsp)
   0x7fffe10d7490:    add    $0x200,%rsp
   0x7fffe10d7497:    mov    (%rsp),%r15
   0x7fffe10d749b:    mov    0x8(%rsp),%r14
   0x7fffe10d74a0:    mov    0x10(%rsp),%r13
   0x7fffe10d74a5:    mov    0x18(%rsp),%r12
   0x7fffe10d74aa:    mov    0x20(%rsp),%r11
   0x7fffe10d74af:    mov    0x28(%rsp),%r10
   0x7fffe10d74b4:    mov    0x30(%rsp),%r9
   0x7fffe10d74b9:    mov    0x38(%rsp),%r8
   0x7fffe10d74be:    mov    0x40(%rsp),%rdi
   0x7fffe10d74c3:    mov    0x48(%rsp),%rsi
   0x7fffe10d74c8:    mov    0x50(%rsp),%rbp
   0x7fffe10d74cd:    mov    0x60(%rsp),%rbx
   0x7fffe10d74d2:    mov    0x68(%rsp),%rdx
   0x7fffe10d74d7:    mov    0x70(%rsp),%rcx
   0x7fffe10d74dc:    mov    0x78(%rsp),%rax
   0x7fffe10d74e1:    add    $0x80,%rsp
   0x7fffe10d74e8:    add    $0x8,%rsp
   0x7fffe10d74ec:    popfq
   0x7fffe10d74ed:    mov    %r13,%rsp
   0x7fffe10d74f0:    pop    %rax
   0x7fffe10d74f1:    mov    %rsp,%r13
   0x7fffe10d74f4:    sub    $0x10,%rsp
   0x7fffe10d74f8:    mov    %rax,(%rsp)
   0x7fffe10d74fc:    mov    %rsi,0x8(%rsp)
   0x7fffe10d7501:    mov    0x38(%rbx),%rcx
   0x7fffe10d7505:    jmpq   *%rcx
   0x7fffe10d7507:    int3
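The branch structure of c2i_unverified_entry above can be read as the following simplified C++. The struct types are hypothetical stand-ins, not the real HotSpot classes; the comments map each check to its instruction in the listing, and I take 0x50(%rbx) to be the Method's compiled-code field, which is also tested at the top of c2i_entry:

```cpp
#include <cassert>

// Simplified stand-ins for the HotSpot types (hypothetical).
struct Klass {};
struct nmethod {};
struct Method {
  nmethod* code = nullptr;  // the field tested by "cmpq $0x0,0x50(%rbx)"
};
struct CompiledICHolder {
  Klass* holder_klass;    // compared with the receiver's klass: "cmp 0x10(%rax),%rbx"
  Method* holder_method;  // becomes the callee Method*: "mov 0x8(%rax),%rbx"
};

enum class Outcome {
  IcMiss,    // "jmpq 0x7fffe10d59e0": the IC miss stub re-resolves the call site
  SkipFixup  // "je 0x7fffe10d74f0": skip the call-site fixup, shuffle args for c2i
};

Outcome c2i_unverified_entry(const Klass* receiver_klass,
                             const CompiledICHolder& holder) {
  // Inline-cache check: is the receiver of the expected class?
  if (receiver_klass != holder.holder_klass) {
    return Outcome::IcMiss;
  }
  const Method* callee = holder.holder_method;
  assert(callee != nullptr);
  // No compiled code yet: go to the interpreter through the c2i path.
  if (callee->code == nullptr) {
    return Outcome::SkipFixup;
  }
  // The callee was compiled after this call site was bound to the
  // interpreter: treat it as a miss so the call site gets corrected.
  return Outcome::IcMiss;
}
```

Both miss paths reach the same stub address, which is why both jumps in the listing are marked to be fixed with a trampoline.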

Comments
I implemented a quick prototype on Linux/x64: http://cr.openjdk.java.net/~iklam/jdk15/8026297_adapter_in_archive.proto.v01/
It works in -Xint mode only. Measured with "perf stat -r 100 java -Xint -XX:-UsePerfData -cp . HelloWorld":
  Before: 41.647ms
  After:  39.254ms
  (41.647 - 39.254) / 41.647 = 5.7%
02-01-2020

Reopening this RFE. Generating the AdapterHandlerEntry objects can cost more than 3% of HelloWorld start-up time. I estimated the potential benefit by generating the adapters twice at this location, so that the redundant second pass adds roughly the cost of one generation: http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/runtime/sharedRuntime.cpp#l2712

++ for (int i=0; i<2; i++) {
     MacroAssembler _masm(&buffer);
     entry = SharedRuntime::generate_i2c2i_adapters(....);
++ }

Before instructions:  118386693
After instructions:   120935012  = +2.15%
Before elapsed time:  43.211ms
After elapsed time:   44.520ms   = +3.03%, or +1.309ms

------------
The raw data in perf_data_masm_twice.txt was gathered using this:

JVM0=/jdk/bld/old/bin/java
JVM1=/jdk/bld/new/bin/java
for i in 1 2 3 4 5 6 7 8 9 10; do
  perf stat -r 50 $JVM0 -Xshare:on -XX:-UsePerfData -cp ~/tmp HelloWorld > /dev/null
  perf stat -r 50 $JVM1 -Xshare:on -XX:-UsePerfData -cp ~/tmp HelloWorld > /dev/null
done
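The estimate rests on one assumption: the redundant call to generate_i2c2i_adapters costs about as much as the first, so the measured delta approximates the cost of generating the adapters once. A minimal sketch of that arithmetic (the helper names are illustrative):

```cpp
#include <cassert>

// Percentage change of `after` relative to `before`.
double percent_change(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}

// If the work runs `passes` times instead of once and every pass costs
// roughly the same (an assumption), a single pass costs
// (after - before) / (passes - 1).
double single_pass_cost(double before, double after, int passes) {
  assert(passes > 1);
  return (after - before) / (passes - 1);
}
```

With the numbers above, percent_change(118386693, 120935012) is about 2.15, percent_change(43.211, 44.520) is about 3.03, and single_pass_cost(43.211, 44.520, 2) is about 1.309 ms.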
21-12-2019

The concern of this RFE (avoiding writes to archived Methods at class link time) has already been addressed by JDK-8145221 (Use trampolines for i2i and i2c entries in Methods that are stored in CDS archive).
16-02-2017

This is unrelated to AOT. This bug is about changing the contents of the CDS archive; as far as I know, AOT has no plans to do that.
08-04-2015

Jumping through a trampoline may cause unnecessary slowdown. Also, the contents of the entry routines *might* depend on run-time flags (e.g., someone may add a new debug flag in the future that generates extra code, or we may need extra code to support profilers). It's best to reserve space for these routines at dump time, so that Method::{i2i, from_compiled_entry, from_interpreted_entry} can be fixed at dump time, but to generate the actual code at run time to ensure compatibility. If the reserved space is too small, we can revert to the original behavior: allocate the entry from the 'normal' locations, and modify Method::{i2i, from_compiled_entry, from_interpreted_entry} to point to them accordingly.
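The reserve-then-generate scheme with its fallback can be sketched as follows. Everything here is hypothetical: ReservedSlot stands for space reserved in the archive at dump time, which the archived Method::{i2i, from_compiled_entry, from_interpreted_entry} fields already point into, and a std::deque stands in for the 'normal' allocation path:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical stand-in for space reserved in the archive at dump time.
struct ReservedSlot {
  std::uint8_t* base;    // archived entry fields already point here
  std::size_t capacity;  // bytes reserved for this adapter
};

struct InstallResult {
  const std::uint8_t* entry;    // where the adapter code ends up
  bool method_must_be_updated;  // true: Method's entry fields must be rewritten
};

// `code` is the adapter generated in this run; its size may depend on
// run-time flags. `code_heap` stands in for the 'normal' locations.
InstallResult install_adapter(const ReservedSlot& slot,
                              const std::vector<std::uint8_t>& code,
                              std::deque<std::vector<std::uint8_t>>& code_heap) {
  assert(slot.base != nullptr);
  if (code.size() <= slot.capacity) {
    if (!code.empty()) {
      std::memcpy(slot.base, code.data(), code.size());
    }
    return {slot.base, false};  // archived entry addresses stay valid
  }
  // Reserved space too small: revert to the original behavior.
  // deque::push_back does not move existing elements, so earlier entries stay valid.
  code_heap.push_back(code);
  return {code_heap.back().data(), true};
}
```

In this model the archived Method is written only on the fallback path, so the common case keeps its pages clean while the code itself still matches the current run-time flags.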
21-02-2014

Traces show that when running HelloWorld on Linux/amd64, 60328 bytes of x64 code were generated for 155 AdapterHandlerEntry objects (about 389 bytes per entry on average). The counts were collected with this patch:

diff -r ced68a57cdbd src/cpu/x86/vm/sharedRuntime_x86_64.cpp
--- a/src/cpu/x86/vm/sharedRuntime_x86_64.cpp	Tue Oct 08 11:37:54 2013 +0200
+++ b/src/cpu/x86/vm/sharedRuntime_x86_64.cpp	Mon Oct 14 14:11:31 2013 -0700
@@ -883,6 +883,14 @@
   gen_c2i_adapter(masm, total_args_passed, comp_args_on_stack, sig_bt, regs, skip_fixup);
 
+  {
+    static int total = 0;
+    static int count = 0;
+    count ++;
+    total += int((__ pc()) - i2c_entry);
+    tty->print_cr("[%d] = %d bytes", count, total);
+  }
+
   __ flush();
   return AdapterHandlerLibrary::new_entry(fingerprint, i2c_entry, c2i_entry, c2i_unverified_entry);
 }
14-10-2013