JDK-8196969 : JTreg Failure: serviceability/sa/ClhsdbJstack.java causes NPE
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 8u221,11,13,14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2018-02-07
  • Updated: 2021-02-09
  • Resolved: 2019-10-11
Fixed Versions
  • JDK 11: 11.0.7 (Fixed)
  • JDK 13: 13.0.7 (Fixed)
  • JDK 14: 14 b19 (Fixed)
  • Other: openjdk8u262 (Fixed)
Description
Running the JTreg test serviceability/sa/ClhsdbJstack.java causes a NullPointerException on an AArch64 system (the failure is not necessarily unique to this platform, however).

It appears only when the test is run with the -Xcomp flag. The NullPointerException is thrown in StackTrace.java. The cause is a null returned when getMethod() is called on a JavaVFrame.

The .jtr is attached. 
Comments
Fix Request (13u): Requesting backport to 13u for parity with 11u. The patch applies cleanly. Tested with tier1; the new test fails without the patch and passes with it.
08-02-2021

I'm not sure what to do with this. Latest reply was: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-April/011586.html
12-05-2020

Still on review.
20-02-2020

Fix Request (OpenJDK 8u): Please approve backporting this SA fix to OpenJDK 8u. Risk should be minimal as it only affects the serviceability agent code, and only in certain cases where a compiled frame is observed with a null decode offset (via "jstack -F"). The JDK 11 patch applies as-is after path unshuffling, but the test cannot be backported because major test infrastructure is missing in 8u. I've posted an RFR and it was reviewed by Paul Hohensee. The manual reproducer passed testing.
RFR: http://mail.openjdk.java.net/pipermail/jdk8u-dev/2019-November/010688.html
webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8196969/jdk8/01/webrev/
02-12-2019

Fix Request (OpenJDK 11u): Please approve backporting this SA fix to OpenJDK 11u. Risk should be minimal as it only affects the serviceability agent code, and only in certain cases where a compiled frame is observed with a null decode offset (via "clhsdb jstack"). The JDK 14 patch applies as-is. The regression test passes. Note that the regression test may introduce some noise in automated testing (as it's a stress test).
26-11-2019

URL: https://hg.openjdk.java.net/jdk/jdk/rev/516db52daad6
User: sgehwolf
Date: 2019-10-11 11:46:29 +0000
11-10-2019

Latest webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8196969/04/webrev/
07-10-2019

This also seems to be the cause of the NPE failures described in JDK-8230872. I've confirmed that the above patch fixes them, even when not doing the Thread.sleep(2000), which is how the new HeapDumpTestWithActiveProcess.java test is run.
30-09-2019

RFR: http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-September/029347.html
27-09-2019

I'm not sure why 0 decode offsets are being observed in the serviceability agent code, though. Either way, the proposed fix passes the stress test running 1000 times on a fastdebug JVM.
27-09-2019

The root cause of this seems to be that we are observing 0 decode offsets for some scopes, which never happens on the hotspot side of things as far as I could tell with my traces. Offset 0 is actually reserved for serialized null. Once we observe a serialized null scope and actually read sender, bci, and method from that bogus decode offset, things go south. One side effect of this is that getMethod() of CompiledVFrame returns null. Here is an example stack trace I was seeing when looking at this:

"main" #1 prio=5 tid=0x00007fdbc8025800 nid=0x125fe runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_java
getScopeDescNearDbg(0x00007fdbb8228658): NMethod for TestRecursiveJstackHost.factorial(I)J==>nsun.jvm.hotspot.code.NMethod@0x00007fdbb8228490
PCDesc(0x00007fdbb822861f): decode offset: 0 @-1 reexecute=false @-1 reexecute=false
---> vframe: sun.jvm.hotspot.runtime.CompiledVFrame@49af8b80
---> nm of vframe: NMethod for TestRecursiveJstackHost.factorial(I)J==>nsun.jvm.hotspot.code.NMethod@0x00007fdbb8228490
---> method: sun.jvm.hotspot.oops.Method@0x00007fdb87c00328
 - TestRecursiveJstackHost.factorial(int) @bci=0, line=5 (Compiled frame; information may be imprecise)
---> vframe: sun.jvm.hotspot.runtime.CompiledVFrame@ce6f88a8
---> nm of vframe: NMethod for TestRecursiveJstackHost.factorial(I)J==>nsun.jvm.hotspot.code.NMethod@0x00007fdbb8228490
---> method: null
Error occurred during stack walking:
java.lang.NullPointerException
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:89)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
    at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90)
    at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:290)
    at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:406)
26-09-2019
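
To illustrate the failure mode described in the comment above, here is a minimal sketch of a scope walk that treats decode offset 0 as serialized null and stops there instead of decoding it. It is not the SA code or the committed patch: the class ScopeWalkSketch, the Scope type, and the map-based lookup are all hypothetical stand-ins, and only the "offset 0 means serialized null" rule comes from the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; these are not the SA classes.
final class ScopeWalkSketch {
    // Per the comment above, decode offset 0 is reserved for serialized null.
    static final int SERIALIZED_NULL = 0;

    // One inlining scope: method name, bci, and the decode offset of its sender.
    static final class Scope {
        final String method;
        final int bci;
        final int senderDecodeOffset;

        Scope(String method, int bci, int senderDecodeOffset) {
            this.method = method;
            this.bci = bci;
            this.senderDecodeOffset = senderDecodeOffset;
        }
    }

    // Walk from the innermost scope outwards. A serialized-null offset ends the
    // walk; it is never looked up, so no frame with a null method is produced.
    static List<String> walk(Map<Integer, Scope> scopes, int decodeOffset) {
        List<String> frames = new ArrayList<>();
        int offset = decodeOffset;
        // The size bound guards against a cyclic sender chain in bad input.
        while (offset != SERIALIZED_NULL && frames.size() < scopes.size()) {
            Scope scope = scopes.get(offset);
            if (scope == null) {
                break; // unknown offset: stop rather than guess
            }
            frames.add(scope.method + " @bci=" + scope.bci);
            offset = scope.senderDecodeOffset;
        }
        return frames;
    }

    public static void main(String[] args) {
        Map<Integer, Scope> scopes = new HashMap<>();
        scopes.put(12, new Scope("factorial", 7, 5));
        scopes.put(5, new Scope("main", 3, SERIALIZED_NULL));

        System.out.println(walk(scopes, 12));              // two frames
        System.out.println(walk(scopes, SERIALIZED_NULL)); // no frames
    }
}
```

The design point is that the sentinel check happens before any field is read from the scope, which is the step the comment identifies as going wrong.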

We are seeing this on x86_64, so it's not arch specific. It does not seem to happen with -Xint. It seems to be an issue specific to CompiledVFrame.
19-09-2019

[~sballal][~lmesnik] if this isn't specific to aarch64 maybe we should clear the CPU field?
12-07-2018

Since it is the same bug as https://bugs.openjdk.java.net/browse/JDK-8145741, it is not aarch64 specific but fails intermittently on all platforms.
06-06-2018

This appears to be an issue on the Hotspot side of things. Here is a snippet of the .jtr file where I've added some debug statements to help understand what is going on:

"main" #1 prio=5 tid=0x0000ffff8c010000 nid=0x74f3 waiting on condition [0x0000ffff90b0e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
   JavaThread state: _thread_blocked
JavaThread::getLastJavaVFrameDbg(): enter
JavaThread.java::getCurrentFrameGuess
AARCH64CurrentFrameGuess.java::run(): fp = 281473109254368
AARCH64CurrentFrameGuess.java::run(): sp = 281473109254368
fp = 0x0000ffff90b0e0e0 (0x0000ffff90b0e1c0), sp = 0x0000ffff90b0e0e0, pc = 0x0000ffff91d0c018
CurrentFrameGuess: choosing last Java frame: sp = 0x0000ffff90b0e300, fp = null
pc: 0x0000ffff7cdb6f88, fp: null, sp: 0x0000ffff90b0e300
FP is null. Found blob
frame size 96
link_offset = 80
cb.FrameSize = 96, vm.AddressSize = 8
cb.getFrameCompleteOffset() = 48, cb.getSize() = 856, cb.getFrameSizeWords() = 12
fp = 0x0000ffff90b0e350
guesser.getPC() == null
AARCH64Frame(sp, fp): sp: 0x0000ffff90b0e300, unextendedSP: 0x0000ffff90b0e300, fp: 0x0000ffff90b0e350, pc: 0x0000ffff751712c0
0x0000ffff90b0e2e0: 0x0000000000000001
0x0000ffff90b0e2e8: 0x0000ffff8c010f40
0x0000ffff90b0e2f0: 0x0000ffff90b0e300
0x0000ffff90b0e2f8: 0x0000ffff751712c0
-----------------------
0x0000ffff90b0e300: 0x0000000101cbc0e0
0x0000ffff90b0e308: 0x0000000101cbbf40
0x0000ffff90b0e310: 0x0000ffff90b0e310
0x0000ffff90b0e318: 0x0000fffe451058b2
0x0000ffff90b0e320: 0x0000ffff90b0e398
0x0000ffff90b0e328: 0x0000fffe45117030
0x0000ffff90b0e330: 0x0000000101cb0a20
0x0000ffff90b0e338: null
0x0000ffff90b0e340: 0x0000000101c01cf8
0x0000ffff90b0e348: 0x0000fffe451058c8
0x0000ffff90b0e350: 0x0000ffff90b0e3f0
0x0000ffff90b0e358: 0x0000ffff751715a0
0x0000ffff90b0e360: 0x0000ffff90b0e3f0
0x0000ffff90b0e368: 0x0000ffff751715a0
0x0000ffff90b0e370: null
0x0000ffff90b0e378: 0x0000000101cbc0e0
0x0000ffff90b0e380: 0x0000000101cbbf40
0x0000ffff90b0e388: 0x00000161c447b020
0x0000ffff90b0e390: 0x00000000000003e8
0x0000ffff90b0e398: null
0x0000ffff90b0e3a0: 0x0000ffff90b0e3a0
0x0000ffff90b0e3a8: 0x0000fffe45106755
0x0000ffff90b0e3b0: 0x0000ffff90b0e410
0x0000ffff90b0e3b8: 0x0000fffe45117030
0x0000ffff90b0e3c0: 0x0000000101cb0a20
0x0000ffff90b0e3c8: 0x0000000101cb3c78
0x0000ffff90b0e3d0: 0x0000fffe45117c90
0x0000ffff90b0e3d8: 0x0000fffe451067a8
0x0000ffff90b0e3e0: 0x0000ffff90b0e390
0x0000ffff90b0e3e8: 0x0000ffff90b0e410
0x0000ffff90b0e3f0: 0x0000ffff90b0e4f0
JavaThread.java:getLastJavaVFrameDbg(): just got current frame guess. interpretedFrame? true
AARCH64Frame.java::isInterpreedFrameValid(): INTERPRETER_FRAME_INITIAL_SP_OFFSET = -10, VM.getVM().getAddressSize() = 8, 0x0000ffff90b0e300
VFrame.java::newVFrame(): Returning new InterpretedVFrame
JavaThread.java::getLastJavaVFrameDbg(): Returning from function: true, imprecise: true
This is a test
Inside InterpreterVFrame::getMethod()
** The method was null **
Is this a Java Frame? true
Frame: sp: 0x0000ffff90b0e300, unextendedSP: 0x0000ffff90b0e300, fp: 0x0000ffff90b0e350, pc: 0x0000ffff751712c0
Is this an interpreted frame? true
Is this a native frame? false
Is this a Java frame? true
Is this a Runtime frame? false
Is this a First Frame? false
Is this a deoptimized frame? false
AARCH64Frame.java::isInterpreedFrameValid(): INTERPRETER_FRAME_INITIAL_SP_OFFSET = -10, VM.getVM().getAddressSize() = 8, 0x0000ffff90b0e300
Is interpreted frame valid? true
Address at PC: 0x0000ffff751712c0
Is this an AArch64 Frame? true
Address of Interpreter Frame Method: 0x0000ffff90b0e338

Note that this trace is for the test with "-Xcomp". The AARCH64CurrentFrameGuess.java::run() function determines that the PC is not in a Java frame and then grabs the last PC, SP, and FP from thread.getLastJavaXX(). The FP comes back as null, so the code tries to search a CodeBlob for the PC. It determines the FP (0x0000ffff90b0e350) from the CodeBlob size. The Method is -3 slots up from the FP (0x0000ffff90b0e338). This is null, as can be seen in the above trace.

It is interesting that the same FP value is found two slots later. (If you use the address found at -3 from that FP as Method (0x0000ffff90b0e348), it is the method jdk.test.lib.apps.LingeredApp.setLastModified(java.lang.String, long).) I'm not sure what is going on, but this is 100% reproducible.
23-02-2018
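
The address arithmetic in the comment above can be checked with a short sketch. It assumes, as the printed values suggest, that the guessed FP is the SP plus link_offset (80 bytes) and that the Method slot sits 3 address-sized words (8 bytes each) below the FP. The class and method names are hypothetical and do not come from the SA sources.

```java
// Hypothetical helper for re-deriving the addresses quoted in the trace.
final class FrameSlotArithmetic {
    static final long ADDRESS_SIZE = 8;  // vm.AddressSize in the trace
    static final long LINK_OFFSET = 80;  // link_offset in the trace
    static final long METHOD_SLOT = -3;  // "-3 slots" from the FP, per the comment

    // Assumption: the guessed FP is SP + link_offset, matching the trace values.
    static long framePointer(long sp) {
        return sp + LINK_OFFSET;
    }

    // Address of the slot expected to hold the Method pointer for a given FP.
    static long methodSlot(long fp) {
        return fp + METHOD_SLOT * ADDRESS_SIZE;
    }

    static String hex(long address) {
        return String.format("0x%016x", address);
    }

    public static void main(String[] args) {
        long sp = 0x0000ffff90b0e300L;
        long fp = framePointer(sp);
        System.out.println("fp          = " + hex(fp));             // 0x0000ffff90b0e350
        System.out.println("method slot = " + hex(methodSlot(fp))); // 0x0000ffff90b0e338

        // The same saved FP value appears again two slots higher, at 0x...e360.
        long shiftedFp = fp + 2 * ADDRESS_SIZE;
        System.out.println("alt slot    = " + hex(methodSlot(shiftedFp))); // 0x0000ffff90b0e348
    }
}
```

In the dump, the slot at 0x0000ffff90b0e338 holds null while the slot at 0x0000ffff90b0e348 holds a non-null value, which matches the observation that the frame layout looks shifted by two slots.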

Based on its usage in ./jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaVFrame.java, getMethod() is never expected to return null. This needs to be checked on the hotspot side.
14-02-2018