JDK-8293563 : [macos-aarch64] SA core file tests failing with sun.jvm.hotspot.oops.UnknownOopException
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17,18,20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-09-08
  • Updated: 2024-05-07
  • Resolved: 2022-09-28
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 20
11.0.24Fixed 17.0.10-oracleFixed 20 b18Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
This is a replacement CR for JDK-8269982, which had to be made confidential, and also which has changed behavior, so I thought it best to start from scratch.

JDK-8269982 originally described a sun.jvm.hotspot.debugger.UnmappedAddressException when running the following 4 core files tests:

serviceability/sa/ClhsdbPstack.java#id1
serviceability/sa/ClhsdbFindPC.java#id3
serviceability/sa/ClhsdbFindPC.java#id1
serviceability/sa/ClhsdbCDSCore.java 

These tests are now called:

serviceability/sa/ClhsdbPstack.java#core
serviceability/sa/ClhsdbFindPC.java#xcomp-core
serviceability/sa/ClhsdbFindPC.java#no-xcomp-core
serviceability/sa/ClhsdbCDSCore.java

Also, the UnmappedAddressException has been replaced by UnknownOopException, but the stack trace is basically the same:

sun.jvm.hotspot.oops.UnknownOopException
at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:193)
at jdk.hotspot.agent/sun.jvm.hotspot.oops.VMOopHandle.resolve(VMOopHandle.java:61)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getThreadObj(JavaThread.java:353)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getCurrentParkBlocker(JavaThread.java:405)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:82)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.PStack.run(PStack.java:79)
at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$30.doit(CommandProcessor.java:1124)
at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2090)
at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2060)
at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:1931)
at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112)
at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:281)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)

However, due to JDK-8293006, which was pushed about a week ago, the above exception is caught and not considered fatal since it can happen when you are in the middle of a GC. So instead in the log you will see the following warning:

WARNING: could not get Thread object: sun.jvm.hotspot.oops.UnknownOopException: 0x00000007dff299d0

Because of this warning, we cannot get the name of each thread being dumped, so the tests can fail in different ways. ClhsdbPStack fails with:

java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'Common-Cleaner' missing from stdout/stderr

Lastly, even getting as far as the above failure takes some work, because getting core files on macOS Monterey has it's challenges, especially on aarch64. First make sure developer security mode is enabled:

$ sudo DevToolsSecurity -enable

Next make sure /cores is writable:

$  sudo chmod 1777 /cores

Next, until JDK-8293550 is fixed, you need to manually add the com.apple.security.get-task-allow entitlement after the build complete (and do this every time hotspot is rebuild):

$ /usr/libexec/PlistBuddy -c "Add :com.apple.security.get-task-allow bool true" tmp.entitlements

$ codesign -s - -f --entitlements tmp.entitlements build/macosx-aarch64-debug/images/jdk/bin/java
Comments
Fix request [11u] Requesting backport of this test-only fix to get rid of a test error that we observe in our CI. Risk is low. Change was not clean, had to be adapted and was reviewed by Matthias. GHA and SAP nightlies pass.
07-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/2698 Date: 2024-05-06 07:45:41 +0000
06-05-2024

Fix request [17u] I backport this for parity with 17.0.10-oracle. No risk, only a test change. Clean backport. Tests pass. SAP nightly testing passed.
24-10-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1903 Date: 2023-10-23 14:42:36 +0000
23-10-2023

This appears to be a macosx-aarch64 bug. I was able to reproduce the issue with the following standalone program: $ cat test.c #include <stdio.h> #include <sys/mman.h> int main() { size_t K = 1024; size_t M = K*K; size_t size = 1024*M ; char* addr = (char*)mmap(0,size, PROT_NONE, 0x1042, -1, 0); int result = mprotect(addr, size, PROT_READ|PROT_WRITE); printf("mmap = %p, last addr = %p, mprotect = %d \n", addr, addr + size, result); return *(volatile int*)0; } $ gcc test.c $ cat tmp.entitlements <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>com.apple.security.get-task-allow</key> <true/> </dict> </plist> $codesign -s - -f --entitlements tmp.entitlements a.out a.out: replacing existing signature $ ./a.out mmap = 0x280000000, last addr = 0x2c0000000, mprotect = 0 Segmentation fault: 11 (core dumped) $ lldb --core /cores/core.24697 a.out (lldb) target create "a.out" --core "/cores/core.24697" Core file '/cores/core.24697' (arm64) was loaded. (lldb) x 0x280000000 error: core file does not contain 0x280000000 The issue doesn't always reproduce. It seems if mmap returns a region with a nice round address (0x280000000), then the problem reproduces. If it returns something like 0x100698000, then it won't. Also, if you change the test to write to the first byte of memory, then that address will appear in the core file, along with the first 0x8000000 bytes. Addresses after this range will not be. Using the -XX:+AlwaysPreTouch option works around the problem, although it is not clear why this makes a difference since the memory we are not seeing dumped to the core file is heap memory that must have been touched at some point.
24-08-2023

Changeset: 76f18651 Author: Chris Plummer <cjplummer@openjdk.org> Date: 2022-09-28 20:35:33 +0000 URL: https://git.openjdk.org/jdk/commit/76f1865124025ee6f788a50e1377b82fd577548f
28-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10458 Date: 2022-09-27 21:38:43 +0000
27-09-2022

I will fix this this issue by using the -XX:+AlwaysPreTouch option, but I first need to wait for JDK-8293550, which causes failures due to the core file not being produced (I've been working around that issue locally by manually adding the com.apple.security.get-task-allow entitlement to the java binary).
14-09-2022

lldb also shows that the G1 heap is not readable. You get the following error message: (lldb) x/4xg 0x000000029fe40498 error: core file does not contain 0x000000029fe40498 Note that if you play around with the -Xmx size and set it very small (like -Xmx128m), then sometimes the heap is readable in the core file.
12-09-2022

This appears to be an issue with G1. Every sign pointed to not being able to read in the java heap. Although there wasn't anything like a page fault, the data always seemed corrupted. Howver, when I started to use ParallelGC, all the core file tests started to pass. I wonder if the G1 pages have been marked to not be dump to the core file on macosx-aarch64.
09-09-2022

The root cause is that the handle stored at JavaThread::_threadObj appears to not be valid. Here is the jstack output for a thread: WARNING: could not get Thread object: sun.jvm.hotspot.oops.UnknownOopException: 0x000000029fe71600 Could not get the java Thread object. Thread info will be limited. tid=0x000000012183b410 nid=40195 waiting for monitor entry [0x00000001719fe000] JavaThread state: _thread_blocked - jdk.test.lib.apps.LingeredApp.steadyState(java.lang.Object) @bci=8, line=536 (Interpreted frame) WARNING: could not get Thread object: sun.jvm.hotspot.oops.UnknownOopException: 0x000000029fe71600 - locked <0x000000029fe715f0> (a <unknown class>) - jdk.test.lib.apps.LingeredApp$1.run() @bci=4, line=542 (Interpreted frame) The tid is the JavaThread*, so we can use the inspect command on it: jhsdb> inspect 0x000000012183b410 Type is JavaThread (size of 1584) oop ThreadShadow::_pending_exception: null char* ThreadShadow::_exception_file: char @ null int ThreadShadow::_exception_line: 0 ThreadLocalAllocBuffer Thread::_tlab: ThreadLocalAllocBuffer @ 0x000000012183b510 jlong Thread::_allocated_bytes: 0 ResourceArea* Thread::_resource_area: ResourceArea @ 0x00006000027e7300 OopHandle JavaThread::_threadObj: OopHandle @ 0x000000012183b6f0 OopHandle JavaThread::_vthread: OopHandle @ 0x000000012183b6f8 OopHandle JavaThread::_jvmti_vthread: OopHandle @ 0x000000012183b700 OopHandle JavaThread::_extentLocalCache: OopHandle @ 0x000000012183b708 JavaFrameAnchor JavaThread::_anchor: JavaFrameAnchor @ 0x000000012183b718 oop JavaThread::_vm_result: null Metadata* JavaThread::_vm_result_2: Metadata @ null ObjectMonitor* JavaThread::_current_pending_monitor: ObjectMonitor @ 0x0000600003b90010 bool JavaThread::_current_pending_monitor_is_from_java: 1 ObjectMonitor* JavaThread::_current_waiting_monitor: ObjectMonitor @ null uint32_t JavaThread::_suspend_flags: 0 oop JavaThread::_exception_oop: null address JavaThread::_exception_pc: address @ 0x000000012183b898 int JavaThread::_is_method_handle_return: 0 address JavaThread::_saved_exception_pc: address @ 0x000000012183b7e8 JavaThreadState JavaThread::_thread_state: 10 OSThread* JavaThread::_osthread: OSThread @ 0x0000600002f9fa40 address JavaThread::_stack_base: address @ 0x000000012183b6a0 size_t JavaThread::_stack_size: 2109440 vframeArray* JavaThread::_vframe_array_head: vframeArray @ null vframeArray* JavaThread::_vframe_array_last: vframeArray @ null JNIHandleBlock* JavaThread::_active_handles: JNIHandleBlock @ 0x0000000111623df0 JavaThread::TerminatedTypes JavaThread::_terminated: 57002 Now lets look at the _threadObj field (btw, this debugging session with UseCompressedOops turned off, but the issue also happens with it on). 0x000000012183b6f0 is actually the address of the _threadObj field, not its contents, so displaying the address stored should reveal the address of the oop* of the Thread object hsdb> mem -v 0x000000012183b6f0 Address 0x000000012183b6f0: In unknown location 0x000000012183b6f0: 0x0000000121609e80 Ok, so 0x0000000121609e80 is the oop*. Now lets get the oop, which is the pointer to the Thread object: hsdb> mem -v 0x0000000121609e80 Address 0x0000000121609e80: In unknown location 0x0000000121609e80: 0x000000029fe71600 In unknown section of Java heap 0x000000029fe71600 is the oop, and it does point into the java heap, which is good (even for valid heap pointers SA usually doesn't know where in the java heap). Now if this really is a pointer to a Thread object, we should be able to use "inspect" on it to dump its contents: hsdb> inspect 0x0000000121609e80 Nope, no output, and dumping the memory clearly shows that it doesn't resemble a Thread object. Most of it is 0's. So it appears something is wrong with JavaThread::_threadObj. JDK-8293006, showed similar issues with JavaThread::_threadObj. But it was rare, on all platforms, and was due to the test attaching to an an active process (jshell starting up) multiple times. Once in a while the attach was untimely and the heap was in an inconsistent state. That not the case here. Every core dump I've looked at on macosx-aarch64 has this issue. We can never resolve JavaThread::_threadObj into a valid java.lang.Thread instance. What is also very odd is that this does not happen when the same tests attach to a live process rather than produce a core dump. Makes me wonder if there isn't some sort of permissions or mapping issue reading from the java heap.
09-09-2022