JDK-8372284 : Deduplicate C2 stubs
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-11-20
  • Updated: 2025-11-21
The Version table lists the release in which this issue/RFE will be addressed.

Other: tbd (Unresolved)
Related Reports
Relates :  
Description
I have long suspected that C2Stubs take up a significant part of the code cache. Most of them look different "only" because each has Labels that bind its own return address; the meat of these stubs is roughly the same. And since these stubs often call into the runtime, the bulk of that meat is saving and restoring registers on the stack around the call.

This is not exactly a problem for hot code paths: these stubs are out-lined for a reason. But it becomes a problem whenever code density matters anyway, in at least two cases: a) inter-procedural calls prefer near targets in memory (at least on some AArch64 cores); b) AOTCache would eventually store generated code, and archive size would affect startup/warmup.

To estimate how bad it is, I whipped up a simple patch:
  https://github.com/openjdk/jdk/compare/master...shipilev:wip-count-nmethod-size

Here are the results for Spring Boot PetClinic running on x86_64:
 Parallel:   5670480 in instructions,       0 in GC stubs,  96094 in C2 stubs ( 1.7% are stubs)
 G1:         6437416 in instructions,  623779 in GC stubs, 101031 in C2 stubs (10.1% are stubs)
 Shenandoah: 7466346 in instructions,       0 in GC stubs,  93806 in C2 stubs ( 1.2% are stubs)
 ZGC:        5530769 in instructions, 2018482 in GC stubs,  92234 in C2 stubs (27.6% are stubs)

It looks like late barrier expansion in G1 and ZGC contributes most of the stub code. Shenandoah would likely end up in the same situation once it implements late barrier expansion.

I wonder if we can make this a bit better by deduplicating the stubs, and maybe by _calling_ them with interior near calls (like we do for trampolines), so that the machine state itself "records" the return address instead of each stub carrying its own. It remains to be seen whether this is possible without affecting fast-path performance much. Also, the current stub code smartly saves/restores only the registers that are actually live at the point where the stub is used, so stubs with different live sets are not trivially interchangeable; that might tip the scale significantly.

Example tail of nmethod with G1:

; C2Stub 1
  0x00007f40e42e730f:   mov    (%r10),%r11d
  0x00007f40e42e7312:   shl    $0x3,%r11
  0x00007f40e42e7316:   cmp    $0x0,%r11
  0x00007f40e42e731a:   je     0x00007f40e42e7183
  0x00007f40e42e7320:   mov    0x38(%r15),%rdi
  0x00007f40e42e7324:   test   %rdi,%rdi
  0x00007f40e42e7327:   je     0x00007f40e42e7341
  0x00007f40e42e732d:   sub    $0x8,%rdi
  0x00007f40e42e7331:   mov    %rdi,0x38(%r15)
  0x00007f40e42e7335:   add    0x40(%r15),%rdi
  0x00007f40e42e7339:   mov    %r11,(%rdi)
  0x00007f40e42e733c:   jmp    0x00007f40e42e7183
  0x00007f40e42e7341:   sub    $0x40,%rsp
  0x00007f40e42e7345:   mov    %r10,0x38(%rsp)
  0x00007f40e42e734a:   mov    %r8,0x30(%rsp)
  0x00007f40e42e734f:   mov    %r9,0x28(%rsp)
  0x00007f40e42e7354:   mov    %rcx,0x20(%rsp)
  0x00007f40e42e7359:   mov    %rdx,0x18(%rsp)
  0x00007f40e42e735e:   mov    %rsi,0x10(%rsp)
  0x00007f40e42e7363:   mov    %rax,0x8(%rsp)
  0x00007f40e42e7368:   mov    %r11,%rdi
  0x00007f40e42e736b:   mov    %r15,%rsi
  0x00007f40e42e736e:   call   0x00007f40f616ea70           ;   {runtime_call G1BarrierSetRuntime::write_ref_field_pre_entry(oopDesc*, JavaThread*)}
  0x00007f40e42e7373:   mov    0x8(%rsp),%rax
  0x00007f40e42e7378:   mov    0x10(%rsp),%rsi
  0x00007f40e42e737d:   mov    0x18(%rsp),%rdx
  0x00007f40e42e7382:   mov    0x20(%rsp),%rcx
  0x00007f40e42e7387:   mov    0x28(%rsp),%r9
  0x00007f40e42e738c:   mov    0x30(%rsp),%r8
  0x00007f40e42e7391:   mov    0x38(%rsp),%r10
  0x00007f40e42e7396:   vzeroupper
  0x00007f40e42e7399:   add    $0x40,%rsp
  0x00007f40e42e739d:   jmp    0x00007f40e42e7183

; C2Stub 2
  0x00007f40e42e73a2:   mov    (%r10),%r11d
  0x00007f40e42e73a5:   shl    $0x3,%r11
  0x00007f40e42e73a9:   cmp    $0x0,%r11
  0x00007f40e42e73ad:   je     0x00007f40e42e7196
  0x00007f40e42e73b3:   mov    0x38(%r15),%rdi
  0x00007f40e42e73b7:   test   %rdi,%rdi
  0x00007f40e42e73ba:   je     0x00007f40e42e73d4
  0x00007f40e42e73c0:   sub    $0x8,%rdi
  0x00007f40e42e73c4:   mov    %rdi,0x38(%r15)
  0x00007f40e42e73c8:   add    0x40(%r15),%rdi
  0x00007f40e42e73cc:   mov    %r11,(%rdi)
  0x00007f40e42e73cf:   jmp    0x00007f40e42e7196
  0x00007f40e42e73d4:   sub    $0x40,%rsp
  0x00007f40e42e73d8:   mov    %r10,0x38(%rsp)
  0x00007f40e42e73dd:   mov    %r8,0x30(%rsp)
  0x00007f40e42e73e2:   mov    %r9,0x28(%rsp)
  0x00007f40e42e73e7:   mov    %rcx,0x20(%rsp)
  0x00007f40e42e73ec:   mov    %rdx,0x18(%rsp)
  0x00007f40e42e73f1:   mov    %rsi,0x10(%rsp)
  0x00007f40e42e73f6:   mov    %rax,0x8(%rsp)
  0x00007f40e42e73fb:   mov    %r11,%rdi
  0x00007f40e42e73fe:   mov    %r15,%rsi
  0x00007f40e42e7401:   call   0x00007f40f616ea70           ;   {runtime_call G1BarrierSetRuntime::write_ref_field_pre_entry(oopDesc*, JavaThread*)}
  0x00007f40e42e7406:   mov    0x8(%rsp),%rax
  0x00007f40e42e740b:   mov    0x10(%rsp),%rsi
  0x00007f40e42e7410:   mov    0x18(%rsp),%rdx
  0x00007f40e42e7415:   mov    0x20(%rsp),%rcx
  0x00007f40e42e741a:   mov    0x28(%rsp),%r9
  0x00007f40e42e741f:   mov    0x30(%rsp),%r8
  0x00007f40e42e7424:   mov    0x38(%rsp),%r10
  0x00007f40e42e7429:   vzeroupper
  0x00007f40e42e742c:   add    $0x40,%rsp
  0x00007f40e42e7430:   jmp    0x00007f40e42e7196

Comments
"maybe _calling_ them with interior near calls (like we do trampolines)". Yes.
21-11-2025

I raised the priority of this RFE because it is important for Leyden and AArch64, as Aleksey pointed out.
21-11-2025

There are a few easy opportunities in the stub code itself, BTW: JDK-8372285.
20-11-2025