JDK-8294002 : Alignment of static call stubs is inconsistent
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,19,20
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86
  • Submitted: 2022-09-19
  • Updated: 2022-11-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Relates :  
Description
When we emit static call stubs via C1, we carefully align the instructions of the static call stub in the following way:

  __ align(BytesPerWord, __ offset() + NativeMovConstReg::instruction_size + NativeCall::displacement_offset);
  __ relocate(static_stub_Relocation::spec(call_pc));
  __ mov_metadata(rbx, (Metadata*)NULL);
  // must be set to -1 at code generation time
  assert(((__ offset() + 1) % BytesPerWord) == 0, "must be aligned");
  // On 64bit this will die since it will take a movq & jmp, must be only a jmp
  __ jump(RuntimeAddress(__ pc()));

However, when we emit the static call stubs from C2, we go through a different path in CompiledStaticCall::emit_to_interp_stub, where the instructions are not explicitly aligned in the same way.

It would appear that the C2 static call stubs are not aligned appropriately, which might lead to strange behaviour when it is concurrently updated with cross modifying code, partially during resolution, and partially through concurrent class unloading.
Comments
I think I see the problem. When the interpreter or C1 patching resolves a call site, it stores the result in the ConstantPoolCacheEntry, but C2 and probably non-patching C1 call sites don't seem to do that. So if the call site ever gets set to clean, the next resolve could get a different result. This seems bad or at least undesirable, but straight-forward to fix. Then there are the separate issues related to the lifecycle of MethodTypeForm cache objects: 1) when is it good to release the SoftReference LambdaForm and MethodHandles in the MethodTypeForm caches, 2) if a Method* reference in the ConstantPoolCacheEntry should keep them alive, and 3) do they even need to be SoftReferences at all. We should be able to investigate those later if necessary.
01-10-2022

[~mdoerr] Do you recall in what context the compiled lambda form was updated? I don't quite see where that would come from.
30-09-2022

It's a bit unfortunate, that allowing "callee->is_compiled_lambda_form()" in the assertion was part of the s390 contribution. It's not at all platform specific. We had added this case because we were hitting the assertion and identified this case to be safe back in 2015.
30-09-2022

Sorry, no. I'll check with Lutz and Martin.
30-09-2022

The callee->is_compiled_lambda_form() exception in verify_mt_safe() seems to have come from: 8167673: [s390] The s390 port but I don't understand what the original issue was. I'm hoping it no longer applies. [~goetz], do you happen to remember?
30-09-2022

So is it fair to say that if we injected the right oops in the oop section, the only source of *concurrent* modification, would potentially be concurrent switching of lambda forms? If so, I do wonder if anyone would cry, if we just deoptimized the caller instead. Although I'm not actually sure why you would have to change the lambda form in the first place, other than it dying (which the oop trick solves).
30-09-2022

So for a "simple" static call, we can potentially have references to another method/nmethod in these places: 1. call site instruction target immediate 2. call site instruction target metadata in nmethod metadata section [not sure if any platform uses this] 3. call site instruction target oop in nmethod oop section [NEW] 4. static stub Method* move instruction immediate [x64, AArch64] 5. static stub Method* move instruction value in nmethod metadata section [AArch64] 6. static stub Method* move instruction oop in nmethod oop section [NEW] 7. static stub call instruction immediate [c2i adapter] and none of these have stable values. I'm a little worried what bad things can happen if we can't keep all of these consistent, and disappointed that not even the static stub part has stable values. I would setting for "unstable value, but not requiring atomic updates", but I'm having trouble convincing myself that even that part is true.
30-09-2022

What I'm worried about is modifying the stub instructions safely while they are executing. Even if we could do that on x86 with proper alignment, we can't currently do that on AArch64 because it uses 3 instructions to set the metadata: https://mail.openjdk.org/pipermail/hotspot-dev/2019-March/037454.html I think code that changes the stub values makes sure that is not active by checking that the corresponding call site is clean, but I think assumes: 1. that there is a safepoint between setting the call site to clean and modifying the stub, to catch any laggers that might still be executing in the stub 2. that there is a 1:1 correspondence between call sites and stubs, which is no longer true after JDK-8280481 For the [1] lambda form issue, I can reset the value in the oop section to the new target's klass_holder(), but I don't think that solves the potentially unsafe stub modification issue. Another interesting thing about Aarch64 is that changing metadata_reloc->metadata_addr()[0] doesn't change the metadata in the move instructions, even if we call fix_metadata_relocation(), because pd_fix_value() does nothing on aarch64. So that means cleanup_inline_caches_impl() currently leaves potentially stale metadata in the stub, and it isn't detected by nmethod::metadata_do() because aarch64 doesn't set reloc->metadata_is_immediate() for the instructions, even though technically the metadata is immediate.
29-09-2022

Great. I already convinced myself that placing the new oops in the nmethod was the right approach, but I was missing the register_nmethod magic. Unfortunately, as revealed by the assert in verify_mt_safe, we still allow 3 different ways to reset the metadata in the static stub to a new value: [1] callee->is_compiled_lambda_form() || [2] !old_method->method_holder()->is_loader_alive() || [3] old_method->is_old(), // may be race patching deoptimized nmethod due to redefinition. I believe my proposal could remove [2], but not [1] and [3].
29-09-2022

I like the idea. I also wished the caller would become is_unloading. I don’t think it would break other invalidation paths. The trickiest is class redefinition, but I think it should be fine. Having said that, throwing in oops that are not in the nmethod and instead in OopStorage will break some GCs, as they asusme the oops are in the nmethod. But we could do something with the same effect but with all oops being in the nmethod. All GCs support register_nmethod being called again, due to C1 code patching. There when a mirror isn’t known, an entry is kept as NULL in the oop section, and is filled out lazily, and then register_nmethod is called again to register the new oop with the GC. I think that would have the effect that we want.
29-09-2022

Ah. Doesn't [3] run STW though, so there isn't really a race? At least I thought so. As for [1] it sounds like we want the relevant callee oop to be in the oop section.
29-09-2022

[~eosterlund], I was thinking it would be nice to get rid of the need for cross-modifying code here. If we can make the metadata "stable", so that once set it stays set, and have the dead oops logic check for oops in static stubs. Right now we don't always have nmethod oops for metadata in static stubs, so it won't be found by oops_do(), but I believe it will be found by metadata_do(). If can use the "dead oop" logic to make an nmethod with a static call stub that references dead oops become is_unloading, then that seems like a win. If 8222841 is the only reason for needed c2i barriers, then maybe they would no longer be needed. The part I'm less sure about is if the change I propose would break other nmethod invalidation paths, including class redefinition, unloading "cold" nmethod, nmethod dependencies, and deoptimization.
28-09-2022

The fact that C1 aligns the call but not the move seems strange. As far as I can tell, it is the move that needs to be updated in an MT-safe way, not the call.
26-09-2022

ILW = potential issue for cross-modifying code; intermittent; no workaround = MMH = P3
19-09-2022