JDK-8284045 : serviceability/sa/ClhsdbFindPC.java#xcomp-core is failing - sun.jvm.hotspot.debugger.UnalignedAddressException: 8baadbabe
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 19,23
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2022-03-30
  • Updated: 2024-06-29
Other: tbd (Unresolved)
Description
Right at the start of the test it fails:

Opening core file, please wait...
hsdb> hsdb> + verbose true
hsdb> + jstack -v
Deadlock Detection:

No deadlocks found.

"hsdb> + quit
sun.jvm.hotspot.debugger.UnalignedAddressException: 8baadbabe
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal$1.checkAlignment(WindbgDebuggerLocal.java:106)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readCInteger(DebuggerBase.java:357)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.readAddress(WindbgDebuggerLocal.java:312)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgAddress.getAddressAt(WindbgAddress.java:71)
	at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:238)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:104)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:77)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:212)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:181)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.VMOopHandle.resolve(VMOopHandle.java:61)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getThreadObj(JavaThread.java:355)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentParkBlocker(JavaThread.java:407)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:80)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$27.doit(CommandProcessor.java:1153)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2212)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2182)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2053)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:281)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)

This just started in the past day, and there are 5 failures already. The test task JVM args are:

-XX:+CreateCoredumpOnCrash -XX:+UseZGC

Virtual threads are not involved (no vthread wrapper).

Note the following two frames and the stack trace above them:

	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:212)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:181)

This is the same as what we are seeing with JDK-8283578. However, JDK-8283578 has been going on for a while now, whereas this issue just started in the past day. Also, JDK-8283578 is with -Xcomp test tasks, not ZGC.
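
To illustrate why the address in the exception message is rejected, here is a minimal sketch of an alignment check. It is not the SA's actual code: the class name, the nested exception type, and the assumption that the check is a simple remainder test against an 8-byte alignment are all hypothetical. It only shows that 8baadbabe cannot be a valid word-aligned pointer.

```java
class AlignmentCheckSketch {
    /** Hypothetical stand-in for the SA's UnalignedAddressException. */
    static class UnalignedAddressException extends RuntimeException {
        UnalignedAddressException(long address) {
            super(Long.toHexString(address));
        }
    }

    /** True if address is a multiple of alignment (treated as unsigned). */
    static boolean isAligned(long address, long alignment) {
        return Long.remainderUnsigned(address, alignment) == 0;
    }

    /** Assumed shape of the check: reject any address that is not aligned. */
    static void checkAlignment(long address, long alignment) {
        if (!isAligned(address, alignment)) {
            throw new UnalignedAddressException(address);
        }
    }

    public static void main(String[] args) {
        long suspect = 0x8baadbabeL; // value from the failure above
        try {
            checkAlignment(suspect, 8);
            System.out.println("aligned");
        } catch (UnalignedAddressException e) {
            System.out.println("UnalignedAddressException: " + e.getMessage());
        }
    }
}
```

The low hex digit of the address is e, so it is neither 4-byte nor 8-byte aligned, which suggests the value read was not a real pointer in the first place.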

Comments
Re-opening. The original CR as filed is not JDK-8318754, but all the recently reported failures are. Please do not file any failures on macos-aarch64 with this CR unless they are clearly not JDK-8318754.
28-06-2024

Closing as a dup of JDK-8318754.
28-06-2024

[~dcubed] there is a known issue with macosx-aarch64 core files that causes them to not include valid memory. We've previously seen this with the java heap, sections of the CDS map, and as it turns out, we have seen it with the memory pointed to by _java_thread_list before (it just slipped my mind while looking at this issue). This doesn't just impact SA. If you try to dump this memory using lldb, it will tell you "error: core file does not contain <addr> "
06-06-2024

A given _java_thread_list value that is not the same as the current ThreadsSMRSupport::_java_thread_list value is only "stable" in a running JVM if there is a ThreadsListHandle somewhere in the running JVM that contains that same value. It is the existence of a containing ThreadsListHandle that protects that _java_thread_list value from being freed out from under the thread that wants to use it. I realize that this discussion migrated from a running JVM scenario to a core file scenario, but I just wanted to record this point about _java_thread_list values. For a core file, how would it be possible for the current value of ThreadsSMRSupport::_java_thread_list to be invalid? At the time the core file is generated, that value should contain the current in-use ThreadsList value.
06-06-2024
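
The protection rule described in the comment above can be sketched as a toy, single-threaded model. This is only an illustration under stated assumptions: ToyList, acquire, release, and replace are hypothetical names, and a plain per-list counter stands in for HotSpot's real ThreadsListHandle mechanism, which is more involved. The point it shows is that a retired list stays valid while any handle still refers to it, and that the current list is never freed.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadsListProtectionSketch {
    /** Toy stand-in for a ThreadsList: an id plus a "freed" flag. */
    static final class ToyList {
        final int id;
        boolean freed = false;
        ToyList(int id) { this.id = id; }
    }

    private ToyList current = new ToyList(0);                       // models _java_thread_list
    private final Map<ToyList, Integer> handles = new HashMap<>();  // live handles per list
    private int nextId = 1;

    /** Models creating a handle: pins whatever list is current right now. */
    ToyList acquire() {
        handles.merge(current, 1, Integer::sum);
        return current;
    }

    /** Models destroying a handle: a retired list is freed once unpinned. */
    void release(ToyList list) {
        int remaining = handles.merge(list, -1, Integer::sum);
        if (remaining == 0) {
            handles.remove(list);
            if (list != current) {
                list.freed = true;
            }
        }
    }

    /** Publishes a new list first, then frees the old one only if nothing pins it. */
    ToyList replace() {
        ToyList old = current;
        current = new ToyList(nextId++);
        if (!handles.containsKey(old)) {
            old.freed = true;
        }
        return old;
    }

    ToyList current() { return current; }

    public static void main(String[] args) {
        ThreadsListProtectionSketch vm = new ThreadsListProtectionSketch();
        ToyList pinned = vm.acquire();      // a handle now protects list 0
        vm.replace();                       // list 1 becomes current
        System.out.println(pinned.freed);   // still protected by the handle
        vm.release(pinned);
        System.out.println(pinned.freed);   // freed after the last handle goes away
    }
}
```

In this model a reader that only looks at current() after everything has stopped always sees a list that has not been freed, which matches the expectation stated above for a core file.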

Sorry I missed the fact we were dealing with a core file here. It is interesting that core file generation is not reliable in this way.
05-06-2024

Looks like this is the macosx-aarch64 issue with core files not containing some addresses that certainly should be there.

JDK-8293563 [macos-aarch64] SA core file tests failing with sun.jvm.hotspot.oops.UnknownOopException
JDK-8314550 [macosx-aarch64] serviceability/sa/TestJmapCore.java fails with "sun.jvm.hotspot.debugger.UnmappedAddressException: 801000800"
JDK-8318754 [macosx-aarch64] serviceability/sa/TestJmapCore.java still fails with "sun.jvm.hotspot.debugger.UnmappedAddressException: 801000800"
JDK-8322148 UnmappedAddressException walking stacks in serviceability/sa/ClhsdbPstack.java#core

JDK-8322148 is in fact identical to the recent failures in this CR. The memory pointed to by _java_thread_list is not accessible. Note this is not an SA issue. If you try to debug these core files with lldb, you will get "error: core file does not contain <addr> " when trying to dump memory that is missing from the core file.
04-06-2024

I think what you are suggesting might be possible on linux in the case of processes, because it looks like we need to iterate over all thread pids and attach to them one at a time. However, in this latest failure we are talking about a core file, and on OSX.
04-06-2024

Are all threads known to be fully suspended or just logically suspended? Could we miss a thread that is terminating and that removes itself from the threads list, such that it could free the _java_thread_list that has been read by the SA?
04-06-2024

As long as _java_thread_list points to a valid list, it should work for SA. All threads are suspended, so whatever SA reads in from _java_thread_list should be valid and remain valid for the duration that SA is attached. SA will not cache _java_thread_list once it detaches. When _java_thread_list is about to be updated but still points to the old thread list, is there anything invalid about _java_thread_list? Shouldn't it still be traversable? Keep in mind that the bug reported shows that SA failed to read in the page of memory that _java_thread_list points to. I don't see how this is possible if the update happens first, and then the old _java_thread_list is freed.
04-06-2024

[~cjplummer] there is no change. The list will be updated and then the old one freed. So if that happens at the same time that the debugger is reading then it can't work. The target VM has to be stable for the debugger to read arbitrary fields like this.
04-06-2024

HS locks and safepoints are of no benefit to SA. What matters is the order things are done in. If there is a free of _java_thread_list and then a new value is assigned, then that is a problem. Do you know when this change was introduced? Can it be cleaned up a bit to make SA happy?
04-06-2024

> For example, hotspot wouldn't do something like free the memory it is pointing to and then assign a ptr to newly allocated memory. Is that still the case?

EDIT: scratch that. A list is still only freed when Threads_lock is held or at a safepoint. But if the debugger just reads the field whilst the target VM is still running, there is no guarantee it is the right value even if it hasn't been freed.
04-06-2024

[~dholmes] It's not clear what you mean by "suspended". It is certainly not guaranteed to be at any sort of safepoint. The threads are suspended by ptrace, which can suspend them at any arbitrary point. My recollection was that in the past it was always safe to access _java_thread_list because it was not modified in a way that would expose an invalid address. For example, hotspot wouldn't do something like free the memory it is pointing to and then assign a ptr to newly allocated memory. Is that still the case?
04-06-2024

> Can _java_thread_list point to an invalid address?

Sorry [~cjplummer] this slipped through the cracks. Yes it can point to an invalid address if you read it and then it was updated. But isn't the target VM suspended when this is read? If so then it should always be valid.
04-06-2024

[~dholmes][~dcubed]

> This issue is tracking both UnalignedAddressException and UnmappedAddressException. The latter seems to be happening regularly now. So I guess this issue has morphed somewhat and needs a re-evaluation. [~cjplummer].

That's unfortunate, as this UnmappedAddressException seems to be a very different issue. See the stack trace below. As filed, this CR was for the ZGC issue with not having a load barrier for OopHandles. I don't see any indication of that being the problem in the stack trace below. The issue seems to be with consistency of the ThreadsList. This seems to have started on 2024-03-09 and we've seen it 6 times now. From what I can tell of the SA source, it is trying to do the hotspot equivalent of ThreadsSMRSupport->_java_thread_list->_length. Did anything change in this area? Can _java_thread_list point to an invalid address?

sun.jvm.hotspot.debugger.UnmappedAddressException: 60000a6bbe54
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.PageCache.getInt(PageCache.java:96)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readCInteger(DebuggerBase.java:355)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.bsd.BsdAddress.getCIntegerAt(BsdAddress.java:68)
	at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicField.getCInteger(BasicField.java:162)
	at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicCIntegerField.getValue(BasicCIntegerField.java:54)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.ThreadsList.length(Threads.java:67)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Threads.getNumberOfThreads(Threads.java:187)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.PointerFinder.find(PointerFinder.java:74)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$9.doit(CommandProcessor.java:686)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2230)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2200)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2071)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:281)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)
13-05-2024
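
As a rough illustration of how a read such as _java_thread_list->_length can fail with an unmapped address, the sketch below models a core file as a map of pages and throws when the page holding the requested address is absent. Everything here is an assumption made for illustration: the class name, the 4096-byte page size, the little-endian layout, and the nested exception type are hypothetical, and the real SA PageCache is not reproduced.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;

class CorePageLookupSketch {
    static final long PAGE_SIZE = 4096; // assumed page size for this sketch

    /** Hypothetical stand-in for the SA's UnmappedAddressException. */
    static class UnmappedAddressException extends RuntimeException {
        UnmappedAddressException(long address) {
            super(Long.toHexString(address));
        }
    }

    // Pages that the (toy) core file actually contains, keyed by page base address.
    private final Map<Long, byte[]> pages = new HashMap<>();

    /** Records a zero-filled page as present in the core file. */
    void addPage(long base) {
        pages.put(base, new byte[(int) PAGE_SIZE]);
    }

    /** Reads a 4-byte integer; assumes the read does not straddle two pages. */
    int readInt(long address) {
        long base = address & ~(PAGE_SIZE - 1);
        byte[] page = pages.get(base);
        if (page == null) {
            // The core file has no data for this page, so the read cannot succeed.
            throw new UnmappedAddressException(address);
        }
        int offset = (int) (address - base);
        return ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) {
        CorePageLookupSketch core = new CorePageLookupSketch();
        long lengthField = 0x60000a6bbe54L; // address from the stack trace above
        try {
            core.readInt(lengthField);
        } catch (UnmappedAddressException e) {
            System.out.println("UnmappedAddressException: " + e.getMessage());
        }
    }
}
```

In this model the failure depends only on whether the page was written to the core file, not on whether the pointer itself was valid in the live process.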

Removed mention of ZGC from the title. This issue is tracking both UnalignedAddressException and UnmappedAddressException. The latter seems to be happening regularly now. So I guess this issue has morphed somewhat and needs a re-evaluation. [~cjplummer].
09-05-2024

Note, although this is turning up in the loom repo, it is not a loom-specific issue. I believe it is turning up in loom because of the changes in loom that cause the codecache sweeper to initiate a GC. Once a GC happens, the issues seen by this CR are more likely. Note it was initially turning up with -Xcomp runs (where the sweeper is more likely to trigger a GC), but I made it turn up in all runs by forcing a GC. So I'm pretty sure this could happen in mainline JDK also.
31-03-2022

> OopHandle requires a load barrier to resolve the embedded oop with ZGC, which the SA doesn't really support. That might be why this crashes.

Yes, this seems to be the problem. It looks like there are only two places where VMOopHandles are accessed in SA, and therefore where a load barrier is needed:

In ClassLoaderData.java to fetch the ClassLoader instance:

  public Oop getClassLoader() {
    Address addr = getAddress().addOffsetTo(classLoaderFieldOffset);
    VMOopHandle vmOopHandle = VMObjectFactory.newObject(VMOopHandle.class, addr);
    return vmOopHandle.resolve();
  }

In JavaThread.java to fetch the Thread instance:

  /** Gets the Java-side thread object for this JavaThread */
  public Oop getThreadObj() {
    Oop obj = null;
    try {
      Address addr = getAddress().addOffsetTo(threadObjFieldOffset);
      VMOopHandle vmOopHandle = VMObjectFactory.newObject(VMOopHandle.class, addr);
      obj = vmOopHandle.resolve();
    } catch (Exception e) {
      e.printStackTrace();
    }
    return obj;
  }

The latter seems to be the one causing JDK-8284045.
31-03-2022

It seems to reproduce much more readily, and with all 4 variants of ClhsdbFindPC.java, if the following patch is applied. This seemed counterintuitive at first, since the goal of this patch was to bring the debuggee to a more stable state, but it turns out forcing the GC leaves more oops needing to go through load barriers.

diff --git a/test/lib/jdk/test/lib/apps/LingeredApp.java b/test/lib/jdk/test/lib/apps/LingeredApp.java
index 01c15b3562e..b038ca6df61 100644
--- a/test/lib/jdk/test/lib/apps/LingeredApp.java
+++ b/test/lib/jdk/test/lib/apps/LingeredApp.java
@@ -588,6 +588,13 @@ public class LingeredApp {
             Object steadyStateObj = new Object();
             synchronized(steadyStateObj) {
                 startSteadyStateThread(steadyStateObj);
+                System.gc();
+                System.gc();
+                System.gc();
+                try {
+                    Thread.sleep(5000);
+                } catch (InterruptedException e) {
+                }
                 if (forceCrash) {
                     System.loadLibrary("LingeredApp"); // location of native crash() method
                     crash();
31-03-2022

OopHandle requires a load barrier to resolve the embedded oop with ZGC, which the SA doesn't really support. That might be why this crashes.
31-03-2022