JDK-8357514 : Disable AOT caching for runtime stubs
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 25
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2025-05-22
  • Updated: 2025-07-24
  • Resolved: 2025-05-22
JDK 25
25 b25 Fixed
Related Reports
Causes :  
Relates :  
Relates :  
Description
After JDK-8354887 was integrated, we hit strange failures that look like memory stomps during our JCK testing of the new AOT JEPs:

# Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658
# assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4

or

# Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868
# fatal error: meet not symmetric

or other strange issues during C2 compilation.

After investigating (running the tests in a loop), I narrowed down the issue to AOT caching of C2 runtime stubs:

https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491

After internal discussion, we decided to disable caching of all runtime stubs.
There is no guarantee that C1 stubs are free of similar issues.

I propose hard-coding the AOTStubCaching flag to `false` until the issue is solved.
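
The proposal amounts to forcing one sub-feature off while leaving the rest of AOT code caching alone. The sketch below only illustrates that idea; `AotFlags` and `disable_stub_caching` are hypothetical names and do not reflect HotSpot's real flag machinery or the actual change.

```cpp
// Hypothetical stand-in for the relevant VM flags (not HotSpot's real flag API).
struct AotFlags {
  bool code_caching = true;  // models a general "cache AOT code" switch
  bool stub_caching = true;  // models the AOTStubCaching flag
};

// Force stub caching off regardless of what was requested.
// Returns true if an enabled setting was overridden.
inline bool disable_stub_caching(AotFlags& flags) {
  const bool overridden = flags.stub_caching;
  flags.stub_caching = false;  // hard-coded to false until the bug is understood
  return overridden;
}
```

Calling `disable_stub_caching` during flag initialization would leave `code_caching` untouched, so only the suspect feature is switched off.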
Comments
Verified there are no failures in JDK 25 ATR
24-07-2025

This also needs fixing in the method MacroAssembler::codestub_branch_needs_far_jump(). See this Leyden commit: https://github.com/openjdk/leyden/commit/b949235f5d05e51ddde79e51585f07fe1ba66d9d
22-05-2025

No guarantee that this is the only problem, but I noticed an omission at macroAssembler_aarch64.cpp:861 in the method MacroAssembler::is_always_within_branch_range(Address).

The jdk master code looks like this:

    // Check the entry target is always reachable from any branch.
    static bool is_always_within_branch_range(Address entry) {
      const address target = entry.target();
      if (!CodeCache::contains(target)) {
        // We always use trampolines for callees outside CodeCache.
        assert(entry.rspec().type() == relocInfo::runtime_call_type, "non-runtime call of an external target");
        return false;
      }
      . . .

The premain code includes an extra check:

    // Check the entry target is always reachable from any branch.
    static bool is_always_within_branch_range(Address entry) {
      if (SCCache::is_on_for_write()) {
        return false;
      }
      const address target = entry.target();
      if (!CodeCache::contains(target)) {
        // We always use trampolines for callees outside CodeCache.
        assert(entry.rspec().type() == relocInfo::runtime_call_type, "non-runtime call of an external target");
        return false;
      }
      . . .

So we may well be planting a PC-relative branch when we ought to be using a (runtime) relocatable absolute load and branch.
22-05-2025
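
To illustrate the concern in the comment above: an AArch64 B/BL instruction encodes a signed 26-bit word offset, so it reaches roughly +/-128 MiB from the branch site. The following is a simplified, self-contained model of the check, not the HotSpot code. `CodeCacheModel`, `always_within_branch_range`, and the `dumping_for_cache` parameter (standing in for the premain `SCCache::is_on_for_write()` test) are hypothetical names.

```cpp
#include <cstdint>

// AArch64 B/BL: signed 26-bit word offset, i.e. about +/-128 MiB.
constexpr int64_t kBranchRange = int64_t{128} * 1024 * 1024;

// Hypothetical model of the code cache as a half-open range [low, high).
struct CodeCacheModel {
  uint64_t low;
  uint64_t high;
  bool contains(uint64_t addr) const { return addr >= low && addr < high; }
};

// True only if a direct branch from ANY address in the cache reaches target.
inline bool always_within_branch_range(const CodeCacheModel& cc,
                                       uint64_t target,
                                       bool dumping_for_cache) {
  // Cached code may be loaded at a different distance from the target,
  // so a PC-relative branch cannot be trusted while dumping.
  if (dumping_for_cache) return false;
  // Targets outside the cache always go through a far jump or trampoline.
  if (!cc.contains(target)) return false;
  // Worst cases: the branch sits at either end of the cache.
  // The strict comparison is slightly conservative.
  const int64_t from_low  = static_cast<int64_t>(target - cc.low);
  const int64_t from_high = static_cast<int64_t>(cc.high - target);
  return from_low < kBranchRange && from_high < kBranchRange;
}
```

In this model the early `return false` when dumping is the extra check that premain has and mainline lacks: it forces the relocatable far-jump form even when the target happens to be in range in the dumping process.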

FTR, Ashutosh pointed out to me that we already have this relocation in mainline after the JDK-8354887 changes. Something else must be causing the issue, then.
22-05-2025

[~asmehra] Okay, but we need to fix them too, even if you did not observe failures, and regardless of the current issue.

About the current issue: as I wrote in my e-mail to you, it could be a missing relocation for runtime calls in the .ad files:

    enc_class Java_To_Runtime(method meth) %{
      // No relocation needed
      if (AOTCodeCache::is_on_for_dump()) {
        // Created runtime_call_type relocation when caching code
        __ lea(r10, RuntimeAddress((address)$meth$$method));
      } else {
        __ mov64(r10, (int64_t) $meth$$method);
      }
      __ call(r10);
      __ post_call_nop();
    %}

We have it in the leyden/premain branch. I will test this.
22-05-2025
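
The difference between the two branches of the snippet above can be sketched with a toy model: an address embedded as a plain immediate stays stale when cached code is loaded into a process where the runtime lives elsewhere, while a slot with a recorded relocation gets patched. This is a deliberate simplification under stated assumptions. `CachedBlob`, `emit_runtime_target`, and `relocate_on_load` are hypothetical names, and patching by a single `delta` is an assumption made for illustration, not how the AOT code cache necessarily resolves runtime addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of cached code: raw bytes plus relocation records.
struct CachedBlob {
  std::vector<uint8_t> code;          // bytes containing 8-byte address slots
  std::vector<size_t>  reloc_offsets; // offsets of slots that must be patched
};

inline void store_address(CachedBlob& blob, size_t offset, uint64_t addr) {
  std::memcpy(blob.code.data() + offset, &addr, sizeof(addr));
}

inline uint64_t load_address(const CachedBlob& blob, size_t offset) {
  uint64_t addr = 0;
  std::memcpy(&addr, blob.code.data() + offset, sizeof(addr));
  return addr;
}

// Append an address slot; record a relocation only when asked to.
// with_relocation == false models the plain-immediate (mov64-style) path.
inline size_t emit_runtime_target(CachedBlob& blob, uint64_t target,
                                  bool with_relocation) {
  const size_t offset = blob.code.size();
  blob.code.resize(offset + sizeof(uint64_t));
  store_address(blob, offset, target);
  if (with_relocation) blob.reloc_offsets.push_back(offset);
  return offset;
}

// Model loading in a new process where runtime targets moved by delta:
// only slots with a relocation record are patched.
inline void relocate_on_load(CachedBlob& blob, int64_t delta) {
  for (size_t off : blob.reloc_offsets) {
    store_address(blob, off,
                  load_address(blob, off) + static_cast<uint64_t>(delta));
  }
}
```

A slot emitted without a relocation keeps the dump-time address after `relocate_on_load`, which is the kind of stale pointer that could plausibly show up as a memory stomp.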

[~kvn] I actually looked at opto/generateOptoStub.cpp for a reference to EncodePKlass or DecodeNKlass nodes, but nothing popped out immediately. When I worked on https://github.com/openjdk/leyden/pull/68 I came across crashes in C2 compiled code due to missing relocations in encode_and_move_klass_not_null and decode_and_move_klass_not_null which is why I added them in the premain branch. But I never came across such failures in the mainline during testing with the patch for caching C2 runtime stubs.
22-05-2025

[~asmehra] How did you verify that? C2 uses .ad files to generate its stubs: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L1018 And those macro assembler methods are called from .ad file instructions.
22-05-2025

> C2 runtime stubs are generated by C2 and it uses .ad files to generate code. But we did not update (add relocations to) the class encoding/decoding macro assembler instructions referenced only from .ad files.

Are you referring to MacroAssembler::encode_and_move_klass_not_null and MacroAssembler::decode_and_move_klass_not_null? I was under the impression that these routines are not used by C2 stubs. I mentioned this in my comment here - https://github.com/openjdk/leyden/pull/68#issuecomment-2895100848.
22-05-2025

C2 runtime stubs are generated by C2, and it uses .ad files to generate code. But we did not update (add relocations to) the class encoding/decoding macro assembler instructions referenced only from .ad files. That could be the cause of the failure.
22-05-2025

[~kvn] I don't have access to JDK-8357398. Is there a way I can reproduce the memory stomping issue locally?
22-05-2025

ILW = Various crashes with AOT caching, medium?, -XX:-AOTStubCaching = HMM = P2
22-05-2025

Changeset: 8184ce39 Branch: master Author: Vladimir Kozlov <kvn@openjdk.org> Date: 2025-05-22 06:09:34 +0000 URL: https://git.openjdk.org/jdk/commit/8184ce39a8a732352ee841fed09cae905d27643c
22-05-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/25379 Date: 2025-05-22 03:46:39 +0000
22-05-2025