JDK-8313800 : AArch64: SA stack walking code having trouble finding sender frame when invoking LambdaForms is involved
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 22
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2023-08-04
  • Updated: 2023-09-18
  • Resolved: 2023-09-12
JDK 22: 22 b15 (Fixed)
Description
The issue that turned up with JDK-8313798 (stack walking infinite loop) seems to be related to stack walking when LambdaForms are involved. I've noticed that when this JDK-8313798 timeout happens, the debuggee always seems to have the following stack:

THREAD: main
    Method java/lang/ClassLoader.defineClass0(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BIILjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x00000008004b4c00
    Method java/lang/System$2.defineClass(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BLjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x000000080001c3c0
    Method java/lang/invoke/MethodHandles$Lookup$ClassDefiner.defineClass(ZLjava/lang/Object;)Ljava/lang/Class;@0x0000000800220c48
    Method java/lang/invoke/InnerClassLambdaMetafactory.generateInnerClass()Ljava/lang/Class;@0x000000080027c118
    Method java/lang/invoke/InnerClassLambdaMetafactory.spinInnerClass()Ljava/lang/Class;@0x000000080027c0a8
    Method java/lang/invoke/InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite;@0x000000080027a938
    Method java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;@0x000000080044cdc8
    Method java/lang/invoke/LambdaForm$DMH+0x00000008010a5000.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;@0x000000013360c340
    Method java/lang/invoke/Invokers$Holder.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;@0x00000008000c3e80
    Method java/lang/invoke/BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object;@0x0000000800187638
newSP(0x0000000122808220) is not above oldSP(0x000000016db520f0)
Error occurred during stack walking:
java.lang.RuntimeException: newSP(0x0000000122808220) is not above oldSP(0x000000016db520f0)
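
The message above reads like a sanity check on stack pointers: on a downward-growing stack such as AArch64's, the sender's SP is expected to be at a higher address than the callee's. The following is a minimal sketch of such a check, using the two addresses from the trace. The class and method names are hypothetical, and this is an illustration rather than the SA's actual code.

```java
// Hypothetical illustration only; not the SA's actual code.
final class SpOrderCheck {
    /** Returns true if newSp is strictly above oldSp, comparing both as unsigned addresses. */
    static boolean isAbove(long newSp, long oldSp) {
        return Long.compareUnsigned(newSp, oldSp) > 0;
    }

    /** Throws, in the style of the message in this report, when the order is violated. */
    static void check(long newSp, long oldSp) {
        if (!isAbove(newSp, oldSp)) {
            throw new RuntimeException(String.format(
                "newSP(0x%016x) is not above oldSP(0x%016x)", newSp, oldSp));
        }
    }

    public static void main(String[] args) {
        try {
            // Addresses taken from the failing walk above.
            check(0x0000000122808220L, 0x000000016db520f0L);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Here the computed sender SP (0x122808220) is far below the current SP (0x16db520f0), so the value is unlikely to be a real stack address for this thread.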

There are also other exceptions being thrown with similar stack traces (they didn't cause the JDK-8313798 timeout because an exception was thrown instead of getting in an infinite loop):

THREAD: main
    Method java/lang/ClassLoader.defineClass0(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BIILjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x00000070004b4c00
    Method java/lang/System$2.defineClass(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BLjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x000000700001c3c0
    Method java/lang/invoke/MethodHandles$Lookup$ClassDefiner.defineClass(ZLjava/lang/Object;)Ljava/lang/Class;@0x0000007000220c48
    Method java/lang/invoke/InnerClassLambdaMetafactory.generateInnerClass()Ljava/lang/Class;@0x000000700027c118
    Method java/lang/invoke/InnerClassLambdaMetafactory.spinInnerClass()Ljava/lang/Class;@0x000000700027c0a8
    Method java/lang/invoke/InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite;@0x000000700027a938
    Method java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;@0x000000700044cdc8
    Method java/lang/invoke/LambdaForm$DMH+0x00000070010a5000.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;@0x000000012760c340
Error occurred during stack walking:
java.lang.NullPointerException: Cannot invoke "sun.jvm.hotspot.debugger.Address.addOffsetTo(long)" because the return value of "sun.jvm.hotspot.runtime.Frame.getFP()" is null

THREAD: main
    Method java/lang/ClassLoader.defineClass0(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BIILjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x00000004004b4c00
    Method java/lang/System$2.defineClass(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BLjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;@0x000000040001c3c0
    Method java/lang/invoke/MethodHandles$Lookup$ClassDefiner.defineClass(ZLjava/lang/Object;)Ljava/lang/Class;@0x0000000400220c48
    Method java/lang/invoke/InnerClassLambdaMetafactory.generateInnerClass()Ljava/lang/Class;@0x000000040027c118
    Method java/lang/invoke/InnerClassLambdaMetafactory.spinInnerClass()Ljava/lang/Class;@0x000000040027c0a8
    Method java/lang/invoke/InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite;@0x000000040027a938
    Method java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;@0x000000040044cdc8
    Method java/lang/invoke/LambdaForm$DMH+0x00000004010a1000.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;@0x000000012860c340
    Method java/lang/invoke/Invokers$Holder.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;@0x00000004000c3e80
Error occurred during stack walking:
sun.jvm.hotspot.debugger.UnalignedAddressException: Trying to read at address: 0x00000068e109ce09 with alignment: 8
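
The alignment failure can be checked by hand. The sketch below assumes a power-of-two alignment and uses a hypothetical helper name; it is not the SA's own check. An address ending in 0x9 cannot be 8-byte aligned, which is consistent with the walker having followed a value that was never a valid stack slot pointer.

```java
// Hypothetical illustration; not the SA's actual alignment check.
final class AlignmentCheck {
    /** True if address is a multiple of alignment; alignment is assumed to be a power of two. */
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long badAddress = 0x00000068e109ce09L; // address from the exception above
        System.out.println(isAligned(badAddress, 8));       // false: low three bits are 001
        System.out.println(isAligned(badAddress & ~7L, 8)); // true once the low bits are cleared
    }
}
```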

Note the test passes whenever an exception is thrown, because the test knows that SA can't handle generating a stack trace in certain situations (while in the middle of a frame push, for example). However, the fact that these failures all seem to be happening while invoking LambdaForms is suspicious. This could be related to JDK-8276210, which noted issues when the debuggee stack looked like the following:

 - jdk.internal.misc.Unsafe.allocateInstance(java.lang.Class) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.invoke.DirectMethodHandle.allocateInstance(java.lang.Object) @bci=12, line=492 (Compiled frame)
 - java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object) @bci=1 (Compiled frame)
 - java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) @bci=5 (Compiled frame)

Note, usually stack walking issues with an active thread are with the topmost frame (the first frame that is visited), because it might be in an inconsistent state (not fully pushed or popped). If the state of the first frame is valid, then walking the rest of the stack should have no issues. But in all the above cases we eventually run into an issue with a frame higher up the stack, so this suggests that the stack walking code is broken in certain situations. Basically there is a frame somewhere in the middle of the stack that the stack walking code doesn't know how to get past. This seems to be unique to aarch64.
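
The distinction drawn here, between a failure on the first frame and a failure further along the walk, can be sketched as a walker that reports where it gave up. Frame, sender(), Outcome and maxFrames are hypothetical stand-ins rather than SA types, and the frame cap is an added guard against the kind of endless walk described in JDK-8313798.

```java
// Hypothetical sketch; Frame and Outcome are stand-ins, not SA classes.
final class WalkClassifier {
    enum Outcome { COMPLETE, FAILED_AT_TOP, FAILED_MID_STACK }

    /** Minimal frame abstraction: returns the sender (caller) frame, or null at the stack base. */
    interface Frame {
        Frame sender();
    }

    static Outcome walk(Frame top, int maxFrames) {
        if (maxFrames < 1) {
            throw new IllegalArgumentException("maxFrames must be at least 1");
        }
        int walked = 0; // number of sender lookups that have succeeded so far
        Frame f = top;
        while (f != null) {
            if (walked >= maxFrames) {
                // At least one lookup succeeded, so this is not a first-frame problem.
                return Outcome.FAILED_MID_STACK;
            }
            try {
                f = f.sender();
            } catch (RuntimeException e) {
                return walked == 0 ? Outcome.FAILED_AT_TOP : Outcome.FAILED_MID_STACK;
            }
            walked++;
        }
        return Outcome.COMPLETE;
    }

    public static void main(String[] args) {
        Frame base = () -> null;
        Frame broken = () -> { throw new RuntimeException("cannot find sender"); };
        Frame top = () -> broken;
        System.out.println(walk(() -> base, 64)); // COMPLETE
        System.out.println(walk(broken, 64));     // FAILED_AT_TOP
        System.out.println(walk(top, 64));        // FAILED_MID_STACK
    }
}
```

Under this classification, the failures in this report would all be FAILED_MID_STACK, since several frames are walked successfully before the error.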

Comments
Changeset: 1d702d28
Author: Andrew Haley <aph@openjdk.org>
Date: 2023-09-12 16:49:55 +0000
URL: https://git.openjdk.org/jdk/commit/1d702d28b687add53762435abceb55f4dc2d37e2
12-09-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/15624
Date: 2023-09-07 16:58:38 +0000
07-09-2023

There are a couple of other related bugs you might also want to try to reproduce. I think you just need to remove them from the problem list:

JDK-8276210 - Test sun/tools/jhsdb/JStackStressTest.java timed out
    sun/tools/jhsdb/JStackStressTest.java
    Currently problem listed:
        sun/tools/jhsdb/JStackStressTest.java 8276210 linux-aarch64

JDK-8248675 - [aarch64] serviceability/sa/TestJhsdbJstackMixed.java fails with "Exception: sun.jvm.hotspot.debugger.UnmappedAddressException: e0a646adbd850"
    serviceability/sa/TestJhsdbJstackMixed.java
    Currently problem listed for -Xcomp:
        serviceability/sa/TestJhsdbJstackMixed.java 8248675 linux-aarch64
17-08-2023

Run sun/tools/jhsdb/HeapDumpTestWithActiveProcess.java, but you'll need to modify it to fail when exceptions are thrown while generating the hprof stack traces. Probably the best way to do this is to have jmap fail if there is an exception while it is producing the thread stack traces. Right now it does not. To do this, modify sun.jvm.hotspot.runtime.ThreadStackTrace.dumpStack() so it rethrows the exception. You might also want to add the diff from the earlier comment (dated 04-08-2023) so you can see the debuggee stack trace (up to the failure point) when the exception was thrown. You should see a variety of different exceptions on aarch64, and they should be pretty common. Note, ignore any issues that look like they originate with the first frame, since this is likely a case of trying to get the stack trace in the middle of a push or pop of the frame.
17-08-2023
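
The change suggested in the comment above (have the stack dump rethrow so that the tool fails) might look roughly like the following. This is a self-contained sketch under assumptions: RethrowSketch and its Iterator-based dumpStack are hypothetical stand-ins, not the real ThreadStackTrace.dumpStack(), which walks JavaVFrame objects.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; this is not ThreadStackTrace.dumpStack() itself.
final class RethrowSketch {
    /** Walks the frames and counts them; logs and then rethrows any walking failure. */
    static int dumpStack(Iterator<String> frames) {
        int depth = 0;
        try {
            while (frames.hasNext()) {
                frames.next();
                depth++;
            }
        } catch (RuntimeException e) {
            System.out.println("Error occurred during stack walking:");
            e.printStackTrace(System.out);
            throw e; // propagate so the caller fails instead of continuing silently
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(dumpStack(List.of("frame0", "frame1").iterator())); // 2
    }
}
```

Rethrowing after logging keeps the diagnostic output while still making the failure visible to the test.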

What's the quickest way to reproduce this?
17-08-2023

Feel free to. I have no plans to look at it.
14-08-2023

Hmm, I guess I should have a look at this.
10-08-2023

Just to clarify, the above stack traces are for the debuggee, not SA. They show the debuggee stack at the point where SA had trouble walking it. The following diff was used to produce the above debuggee stack traces:

diff --git a/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ThreadStackTrace.java b/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ThreadStackTrace.java
index 6d3a8109f1b..c93cc55cf67 100644
--- a/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ThreadStackTrace.java
+++ b/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ThreadStackTrace.java
@@ -49,9 +49,13 @@ public class ThreadStackTrace {
         if (!thread.isJavaThread()) {
             return;
         }
+        System.out.println("THREAD: " + thread.getThreadName());
         try {
             for (JavaVFrame vf = thread.getLastJavaVFrameDbg(); vf != null; vf = vf.javaSender()) {
                 StackFrameInfo frame = new StackFrameInfo(vf);
+                System.out.print(" ");
+                frame.getMethod().printValueOn(System.out);
+                System.out.println();
                 frames.add(frame);
                 depth++;
 
@@ -62,7 +66,10 @@ public class ThreadStackTrace {
             }
         } catch (Exception e) {
             System.out.println("Error occurred during stack walking:");
-            e.printStackTrace();
+            e.printStackTrace(System.out);
         }
     }
 }
04-08-2023