JDK-8313798 : [aarch64] sun/tools/jhsdb/HeapDumpTestWithActiveProcess.java sometimes times out on aarch64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 22
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2023-08-04
  • Updated: 2023-08-31
  • Resolved: 2023-08-11
JDK 22: 22 b11 (Fixed)
Description
I started running into the following issue after implementing JDK-8307408, which changes the arguments used to launch the debuggee.

sun/tools/jhsdb/HeapDumpTestWithActiveProcess.java sometimes times out on aarch64. I'm mostly seeing this on macOS, but suspect it may also sometimes happen on Linux. Although the stack trace of the process varies, the following frames are always present when the process times out:

	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:126)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:156)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.ThreadStackTrace.dumpStack(ThreadStackTrace.java:54)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.dumpStackTraces(HeapHprofBinWriter.java:836)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:460)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.writeHeapHprofBin(JMap.java:216)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:103)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:202)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:340)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)

The process is actually stuck in VFrame.javaSender(). No sanity check is done on the frame being looked at, so the following loop is at risk of never terminating if at some point what is supposed to be the sender frame is at a lower address instead of a higher one:

    VFrame f = sender(imprecise);
    while (f != null) {
      if (f.isJavaFrame()) {
        return (JavaVFrame) f;
      }
      f = f.sender(imprecise);
    }

The following code fixes the issue with an extra check:

    VFrame f = sender(imprecise);
    while (f != null) {
      if (f.isJavaFrame()) {
        return (JavaVFrame) f;
      }
      Address oldSP = f.getFrame().getSP();
      f = f.sender(imprecise);
      if (f != null) {
        Address newSP = f.getFrame().getSP();
        if (oldSP.greaterThanOrEqual(newSP)) {
          String errString = "newSP(" + newSP + ") is not above oldSP(" + oldSP + ")";
          System.out.println(errString);
          throw new RuntimeException(errString);
        }
      }
    }

Note this issue is very similar to JDK-8231635, but involves different stack walking code. The difference is that the JDK-8231635 stack walking uses Frame.sender(), whereas here we are using VFrame.sender(). I'm not too clear on the distinction between these two approaches to stack walking. In any case, the JDK-8231635 fix is similar to what I'm suggesting here, which is to sanity check that sender frames are always at a higher address.
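
The effect of the sanity check can be reproduced outside the SA with a small model. The sketch below is not SA code: the Frame class, its fields, and the addresses are hypothetical stand-ins for VFrame and Address, chosen only to show how a sender at a lower address can form a cycle and how the SP comparison breaks it.

```java
final class StackWalkSketch {

    // Hypothetical stand-in for an SA frame: a stack pointer, a flag, and a sender link.
    static final class Frame {
        final long sp;        // stack pointer of this frame
        final boolean isJava; // true if this models a Java frame
        Frame sender;         // caller's frame, or null for the outermost frame

        Frame(long sp, boolean isJava) {
            this.sp = sp;
            this.isJava = isJava;
        }
    }

    // Returns the nearest Java frame among the callers of 'start', or null if none.
    // Throws if a sender's SP is not strictly above the SP of the frame it was
    // reached from, which is the condition that would otherwise allow a cycle.
    static Frame javaSender(Frame start) {
        Frame f = start.sender;
        while (f != null) {
            if (f.isJava) {
                return f;
            }
            long oldSP = f.sp;
            f = f.sender;
            // Addresses are unsigned, so compare them as unsigned longs.
            if (f != null && Long.compareUnsigned(oldSP, f.sp) >= 0) {
                throw new IllegalStateException(
                    "newSP(0x" + Long.toHexString(f.sp)
                    + ") is not above oldSP(0x" + Long.toHexString(oldSP) + ")");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Well-formed chain: SP grows toward the callers.
        Frame a = new Frame(0x1000L, false);
        Frame b = new Frame(0x1040L, false);
        Frame c = new Frame(0x1080L, true);
        a.sender = b;
        b.sender = c;
        System.out.println("Java sender found at 0x" + Long.toHexString(javaSender(a).sp));

        // Corrupt chain: the "sender" of y is x, at a lower address, forming a cycle.
        Frame x = new Frame(0x2000L, false);
        Frame y = new Frame(0x2040L, false);
        x.sender = y;
        y.sender = x;
        try {
            javaSender(x);
        } catch (IllegalStateException e) {
            System.out.println("Walk aborted: " + e.getMessage());
        }
    }
}
```

Without the comparison, the second walk would alternate between the two non-Java frames forever. With it, the walk fails on the first step that moves to a lower or equal address.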

Comments
If you feel this is also the cause of JDK-8313800, it would probably be best to fix using that CR.
31-08-2023

And indeed, I just checked, and the method handle code where we go into the weeds uses FP as a temporary register, so its saved value is overwritten. I think we'd be fine with the same definition of Frame::adjustUnextendedSP() as x86.
31-08-2023

I have found the root cause of this problem. It is due to AArch64Frame::adjustUnextendedSP(), which is wrong. It does this:

    // If the sender PC is a deoptimization point, get the original
    // PC. For MethodHandle call site the unextended_sp is stored in
    // saved_fp.
    if (senderNm.isDeoptMhEntry(getPC())) {
      // DEBUG_ONLY(verifyDeoptMhOriginalPc(senderNm, getFP()));
      raw_unextendedSP = getFP();
    } else if (senderNm.isDeoptEntry(getPC())) {
      // DEBUG_ONLY(verifyDeoptOriginalPc(senderNm, raw_unextendedSp));
    } else if (senderNm.isMethodHandleReturn(getPC())) {
      raw_unextendedSP = getFP();
    }

Unfortunately we don't use frame pointers any more, so FP can point to anywhere at all. x86 doesn't do any special handling here for method handle intrinsics, and I don't think we should do so on AArch64 either.

I just looked at the AArch64 frame-handling code, and it's stuff I wrote in 2015 based on x86. Looking back at the history of this, I see that the AArch64 port missed this patch:

    8068945: Use RBP register as proper frame pointer in JIT
    https://bugs.openjdk.org/browse/JDK-8068945

... which removed the use of FP in AArch64Frame::adjustUnextendedSP().
31-08-2023

Changeset: 8f1c1348
Author: Chris Plummer <cjplummer@openjdk.org>
Date: 2023-08-11 18:09:44 +0000
URL: https://git.openjdk.org/jdk/commit/8f1c134848437d7e37fb3b4bd603b91798e19724
11-08-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/15183
Date: 2023-08-07 20:17:21 +0000
07-08-2023

I think JDK-8276210 is likely a duplicate of this CR.
04-08-2023