JDK-8340812 : LambdaForm customization via MethodHandle::updateForm is not thread safe
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 8,11,17,21,24
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-09-24
  • Updated: 2024-11-05
  • Resolved: 2024-09-26
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 21 JDK 23 JDK 24
11.0.26-oracleFixed 17.0.14-oracleFixed 21.0.6-oracleFixed 23.0.2Fixed 24 b17Fixed
Related Reports
Relates :  
Description
This issue originally manifested as intermittent NPEs in invokeBasic with an Oracle internal test:

Stack: [0x0000ffff37606000,0x0000ffff37804000], sp=0x0000ffff37802330, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x677c28] Exceptions::debug_check_abort(char const*, char const*)+0xc8 (exceptions.cpp:569)
V [libjvm.so+0x677e80] Exceptions::debug_check_abort_helper(Handle, char const*)+0x9c (exceptions.cpp:591)
V [libjvm.so+0x678310] Exceptions::_throw(JavaThread*, char const*, int, Handle, char const*)+0xb0 (exceptions.cpp:579)
V [libjvm.so+0xc42d04] SharedRuntime::throw_NullPointerException_at_call(JavaThread*)+0x44 (sharedRuntime.cpp:853)
v ~RuntimeStub::NullPointerException at call throw_exception 0x0000ffff90096eb4
J 112 c2 MemAccessStressModule.test(Ljava/lang/invoke/VarHandle;Ljava/lang/Object;I)V (23 bytes) @ 0x0000ffff901164c8 [0x0000ffff90116400+0x00000000000000c8]
J 1098 c2 java.lang.Thread.run()V java.base@24-ea (23 bytes) @ 0x0000ffff90258514 [0x0000ffff90258440+0x00000000000000d4]
v ~StubRoutines::call_stub 0x0000ffff9005a114


  0x0000ffff901164a4: ; implicit exception: dispatches to 0x0000ffff90116fb0
  0x0000ffff901164a4: 4b1d 40b9 | 8b07 0034 | f403 00f9 | ec0b 00f9 | 64fe 40d3 | e20f 40a9 | e10b 40f9 | e503 04aa
  0x0000ffff901164c4: ; ImmutableOopMap {[0]=Oop [8]=Oop }
                      ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0}
                      ; - java.lang.invoke.LambdaForm$VH/0x0000000201006000::invoke_MT@34
                      ; - MemAccessStressModule::test@8 (line 18)
                      ; {optimized virtual_call}
  0x0000ffff901164c4: 6fb9 ff97

  0x0000ffff901164c8: ; {post_call_nop}
  0x0000ffff901164c8: 1f20 03d5 | 1f68 80f2 | 1f00 80f2 | f403 40f9

Event: 14.595 Thread 0x0000ffff3c238600 NullPointerException in MH adapter 0x0000ffff90104a8c
Event: 14.595 Thread 0x0000ffff3c1cd5d0 NullPointerException in MH adapter 0x0000ffff90104a8c
Event: 14.595 Thread 0x0000ffff3c21efd0 NullPointerException in MH adapter 0x0000ffff90104a8c

I'm starting to suspect that the VarHandle implementation is not thread safe on AArch64 and as a result some internal data structure is not completely initialized. -XX:+VerifyMethodHandles does not reveal anything though.

I disabled the exception handler in SharedRuntime::continuation_for_implicit_exception to enforce an early crash and as expected we SIGSEGV here:

Stack: [0x0000ffff34ec0000,0x0000ffff350be000], sp=0x0000ffff350bc2f0, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 54 java.lang.invoke.MethodHandle.invokeBasic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V java.base@24-internal (0 bytes) @ 0x0000ffffa0bbb88c [0x0000ffffa0bbb880+0x000000000000000c]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000024

0x0000000000000000: 1F 20 03 D5 nop
0x0000000000000004: 2C 14 40 B9 ldr w12, [x1, #0x14]
0x0000000000000008: 8C 29 40 B9 ldr w12, [x12, #0x28]
0x000000000000000c: 8C 25 40 B9 ldr w12, [x12, #0x24] <--- We crash here
0x0000000000000010: 8C 09 40 F9 ldr x12, [x12, #0x10]
0x0000000000000014: 6C 00 00 B4 cbz x12, #0x20
0x0000000000000018: 88 21 40 F9 ldr x8, [x12, #0x40]

R1 =0x000000008760f930 is an oop: java.lang.invoke.BoundMethodHandle$Species_LL
{0x000000008760f930} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' - flags:

 - ---- fields (total size 5 words):
 - private 'customizationCount' 'B' @12 127 (0x7f)
 - private volatile 'updateInProgress' 'Z' @13 false (0x00)
 - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x0000000086e02940} = (Ljava/lang/invoke/VarHandle;Ljava/lang/Object;Ljava/lang/String;Ljava/lang/String;)V (0x86e02940)
 - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000008760f900} => a 'java/lang/invoke/MemberName'{0x00000000a5323fd0} = {method} {0x0000ffff70905f18} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V' in 'java/lang/invoke/LambdaForm$MH+0x0000ffff70906068' (0x8760f900)
 - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 null (0x00000000)
 - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 null (0x00000000)
 - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000008768f698} (0x8768f698)
 - final 'argL1' 'Ljava/lang/Object;' @36 a 'java/lang/invoke/BoundMethodHandle$Species_LL'{0x000000008768f6c0} (0x8768f6c0)

R12=0x0 is null

So we read the 'form' field from a 'BoundMethodHandle$Species_LL' and then another field from that 'LambdaForm' object at offset 0x28 which is unexpectedly null.

This is code from MethodHandles::jump_to_lambda_form and the second read is from the java_lang_invoke_LambdaForm::vmentry_offset(). I verified this by adding a null check and it triggers.

For some reason LambdaForm::vmentry is null. Maybe a race condition during initialization due to the weak memory model on AArch64.

The vmentry field is not null anymore at the time the hs_err file is generated:
"a 'java/lang/invoke/MemberName'{0x00000000a5323fd0} = {method} {0x0000ffff70905f18} 'invoke' "

Which suggests that this is indeed a race condition.

After digging around in the MethodHandle implementation, I found this suspicious code in MethodHandle::updateForm:

newForm.prepare(); // as in MethodHandle.<init>
UNSAFE.putReference(this, FORM_OFFSET, newForm);
UNSAFE.fullFence();

https://github.com/openjdk/jdk/blob/36314a90c15e2ab2a9b32c2e471655c1b07d452c/src/java.base/share/classes/java/lang/invoke/MethodHandle.java#L1883

The LambdaForm 'newForm' can have 'vmentry`set to null and the field is then set in 'newForm.prepare()'. The LambdaForm is then published by the 'UNSAFE.putReference' store but there is nothing that prevents another thread from observing a not fully initialized object. I.e., another thread can observe 'vmentry == null'. Maybe the 'UNSAFE.fullFence()' was supposed to fix this but then it's at the wrong place. A 'UNSAFE.storeStoreFence()' before publishing should be sufficient. Running more tests to confirm.
Comments
Fix request [17u,21u] I backport this for parity with 17.0.14-oracle,21.0.6-oracle. Medium risk, changes to synchronization are always risky. Clean backport to 21, resolved Copyright in 17, probably clean anyways. Test passes, but also without the fix. Test times out with slowdebug. SAP nightly testing passed.
05-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/3024 Date: 2024-11-04 08:55:40 +0000
04-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/1119 Date: 2024-11-04 08:52:44 +0000
04-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk23u/pull/149 Date: 2024-10-10 05:00:43 +0000
10-10-2024

[jdk23u-fix-request] Already backported to Oracle JDK 11u, 17u and 21u. Low risk, applies cleanly. Tier1-3 pass.
10-10-2024

Changeset: 47c10694 Branch: master Author: Tobias Hartmann <thartmann@openjdk.org> Date: 2024-09-26 06:03:29 +0000 URL: https://git.openjdk.org/jdk/commit/47c10694c66bc131c8a5e1572340415b8daaba08
26-09-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/21160 Date: 2024-09-24 14:18:58 +0000
25-09-2024

Good analysis Jorn. Background: Originally, LFs supported mixed-mode execution, starting as interpreted and eventually warming up and compiling. This was nice for one reason: The interpreter (a single page of code) provided a clear formal semantics for LFs. The dynamics of the mixed mode execution never panned out as a useful thing. But MHs also support mixed-mode execution, and this does seem to have useful dynamics. A MH can be patched to upgrade to a specialized form. This seems to be necessary for performance of MHs which are non-constant. (Constant MHs get open-coded by the JIT regardless of how intensely their LF is optimized.) I suggest putting release fences in both places where these sensitive behavioral links are patched: Both after vmentry is set (and maybe before also), and after MH::form is reset (if/as we do this tricky bit).
24-09-2024

Hi Jorn, thanks for your quick feedback and background! Please see my draft PR for some more details: https://github.com/openjdk/jdk/pull/21160
24-09-2024

Hey Tobias, nice find! Your analysis looks correct to me. I think this scenario can occur when one thread is customizing a MethodHandle, while another thread is executing the same MethodHandle. It seems possible for the second thread to see the partially initialized LambdaForm. We have a similar pattern in the constructor of MethodHandle, which also calls LF.prepare. But, since MH has final fields, I think we are saved by the barrier at the end of the constructor. But, I remember seeing some work to relax the barriers that are emitted at the end of the constructor on arm as well. I think for consistency we should put the store store barrier in LF.prepare after the vmentry field is written to. On a side note: I've been thinking that we should just eagerly initialize vmentry fields when constructing the lambda form, since that will prevent all these races (in some cases I've seen the LF being initialized up to 512 times, once per core, resulting in 511 discarded classes). The code was originally written so that a LF could be lazily compiled, and the vmentry could be switched part way through its execution, but that has been turned off for a long time. It also doesn't play well with some of the JIT compilers (see e.g. https://bugs.openjdk.org/browse/JDK-8300210 but Graal also has problems with that IIRC). I did some investigation with Claes to initialize LFs more lazily to improve startup, but it looks like leyden pre-generation of these classes is just better.
24-09-2024