Bug ID: JDK-8255452 Doing GC during JVMTI MethodExit event posting breaks return oop

JDK-8255452 : Doing GC during JVMTI MethodExit event posting breaks return oop

Type: Bug
Component: hotspot
Sub-Component: jvmti
Affected Version: 11,15,16

Priority: P4
Status: Resolved
Resolution: Fixed
OS: linux

Submitted: 2020-10-27
Updated: 2024-12-09
Resolved: 2020-11-05

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 11	JDK 16
11.0.13Fixed	16 b24Fixed

Related Reports

Relates :

JDK-8255459 - Shenandoah GC crashes on forceEarlyReturn006

Description

While stress testing some JVMTI tagmap table changes, we've found a few crashes with stack traces similar to this:

#  assert(Universe::heap()->is_in_or_null(r)) failed: bad receiver: 0x00000800014401b0 (8796114256304)
...
Stack: [0x00007fad771f1000,0x00007fad772f2000],  sp=0x00007fad772ebef0,  free space=1003k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd40b44]  HandleArea::allocate_handle(oop)+0x144
V  [libjvm.so+0x16c7b86]  SharedRuntime::find_callee_info_helper(JavaThread*, vframeStream&, Bytecodes::Code&, CallInfo&, Thread*)+0x796
V  [libjvm.so+0x16cbf5a]  SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*)+0xfa
V  [libjvm.so+0x16cccfe]  SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x1ce
v  ~RuntimeStub::ic_miss_stub
J 413 c1 java.lang.StringConcatHelper.stringOf(Ljava/lang/Object;)Ljava/lang/String; java.base@16-internal (20 bytes) @ 0x00007fb0491fcba4 [0x00007fb0491fca60+0x0000000000000144]
j  java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+10 java.base@16-internal
j  java.lang.invoke.LambdaForm$MH+0x0000000800c17400.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+28 java.base@16-internal
j  java.lang.invoke.Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+7 java.base@16-internal
j  nsk.share.jpda.ForceEarlyReturnTestThread.executeMethod(Ljava/lang/String;)V+983
j  nsk.share.jpda.ForceEarlyReturnTestThread.run()V+69
v  ~StubRoutines::call_stub
V  [libjvm.so+0xe39105]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x595
V  [libjvm.so+0xe39985]  JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x4c5
V  [libjvm.so+0xe39e2c]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0xac
V  [libjvm.so+0xfcafeb]  thread_entry(JavaThread*, Thread*)+0x12b
V  [libjvm.so+0x185a396]  JavaThread::thread_main_inner()+0x256
V  [libjvm.so+0x18611c0]  Thread::call_run()+0x100
V  [libjvm.so+0x154f746]  thread_native_entry(Thread*)+0x116

This can be reproduced with ZGC with the following command line:
while true; do make --no-print-directory -C ../build/fastdebug/ test TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java JTREG="JAVA_OPTIONS=-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:-ShowMessageBoxOnError" JTREG_EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt; done

Or with G1 and Serial by running a jcmd loop to externally trigger System.gc:
while true; do jcmd nsk.jdi.MethodExitEvent.returnValue.returnValue003.returnValue003a GC.run; done

While investigating the G1 crash I saw a crash here:
   0x00007f18489bc3f5:	addq   $0x1,0x30(%r14)
   0x00007f18489bc3fa:	sbbq   $0x0,0x30(%r14)
   0x00007f18489bc3ff:	jmpq   0x7f18489bc411
   0x00007f18489bc404:	mov    %rax,0x18(%r14)
   0x00007f18489bc408:	mov    $0x1,%edx
   0x00007f18489bc40d:	mov    %rdx,0x20(%r14)
   0x00007f18489bc411:	add    $0x38,%r14
   0x00007f18489bc415:	mov    %r14,-0x28(%rbp)
=> 0x00007f18489bc419:	mov    0x1e0(%rax,%rbx,8),%rbx
   0x00007f18489bc421:	mov    -0x28(%rbp),%rdx
   0x00007f18489bc425:	test   %rdx,%rdx
   0x00007f18489bc428:	je     0x7f18489bc5a9
   0x00007f18489bc42e:	cmpb   $0xb,-0x38(%rdx)
   0x00007f18489bc432:	jne    0x7f18489bc5a9

With a broken rax:
(gdb) p /x $rax
$2 = 0xdd56dd5f0

Coming from a decoding of  a compressed klass pointer:
   0x00007f18489bc363:	mov    0x8(%rcx),%eax
   0x00007f18489bc366:	shl    $0x3,%rax
   0x00007f18489bc36a:	movabs $0x800000000,%r10
   0x00007f18489bc374:	add    %r10,%rax
   0x00007f18489bc377:	mov    -0x28(%rbp),%r14
   0x00007f18489bc37b:	test   %r14,%r14
   0x00007f18489bc37e:	je     0x7f18489bc419

Reversing decoding of rax gives:
(gdb) p /x 0xdd56dd5f0 - 0x800000000
$6 = 0x5d56dd5f0
(gdb) p /x (0xdd56dd5f0 - 0x800000000) >> 3
$7 = 0xbaadbabe

The object is in rcx:
  0x00007f18489bc363:	mov    0x8(%rcx),%eax

Whichs contains the infamouss baadbabe:
(gdb) x /x $rcx
0x80502218:	0xbaadbabebaadbabe

Comments

Fix Request (11u): 11u is affected, too, as stated above. Fix applies almost cleanly. Review: https://github.com/openjdk/jdk11u-dev/pull/110
08-07-2021
Changeset: 3a02578b Author: Erik Österlund <eosterlund@openjdk.org> Date: 2020-11-05 16:17:33 +0000 URL: https://github.com/openjdk/jdk/commit/3a02578b
05-11-2020
Hi Stefan and Erik, I'm glad you found it. Proposed fix makes sense. Looks like older JDK versions are also affected and the recent changes just uncovered the issue. Note that we have seen broken Oops in forceEarlyReturn007 JCK test in JDK11u on s390.
02-11-2020
Hi Erik, above I was indeed mistaken. Your reasoning is plausible to me. Thanks, Richard.
28-10-2020
I feel pretty certain I have found the real issue though. The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. I tried an experiment where I would write back the oop of the handle (that captured the return oop) to the top-of-stack. This made my reproducer not reproduce at all, from reproducing approximately 100% with the following reproducer: while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done Naturally, in order to fix this properly, I think we need InterpreterRuntime::post_method_exit to change from JRT_ENTRY to JRT_BLOCK_ENTRY. Then we can stash away the return oop before a JRT_BLOCK that transitions from Java to VM. And when we leave with JRT_END (after the transition back to Java), we can restore a not broken oop in the stack, so that we return the oop properly.
28-10-2020
Thanks for the comments Richard. I don't think that is the issue though. We seem to read the oop from the JVMTI state first in the handshake. And then it is already poked to be processed before the handshake is run.
28-10-2020
JvmtiEnvBase::force_early_return() calls JvmtiThreadState::state_for(java_thread) at [1]. There we can safepoint locking JvmtiThreadState_lock at [2]. This invalidates the jvalue value parameter to force_early_return() if it is an oop. This could cause the issues if I'm not mistaken. [1] https://github.com/openjdk/jdk/blob/aaf4f690f06ee3a98b69bfb300cb01d17142a056/src/hotspot/share/prims/jvmtiEnvBase.cpp#L1373 [2] https://github.com/openjdk/jdk/blob/aaf4f690f06ee3a98b69bfb300cb01d17142a056/src/hotspot/share/prims/jvmtiThreadState.inline.hpp#L89
28-10-2020
[~rrich] This issue reproduces before and after the escape analysis changes but some of these crashes are in deoptimization coming from the new code now.
27-10-2020