JDK-8251130 : serviceability/sa/TestJmapCore.java fails with sun.jvm.hotspot.oops.UnknownOopException on linux-aarch64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 16
  • Priority: P4
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux
  • CPU: aarch64
  • Submitted: 2020-08-05
  • Updated: 2023-08-17
  • Resolved: 2023-08-17
Related Reports
Relates :  
Relates :  
Description
Seems to be a new failure mode:

----------System.err:(36/2448)----------
sun.jvm.hotspot.oops.UnknownOopException
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:193)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.VMOopHandle.resolve(VMOopHandle.java:61)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Klass.getJavaMirror(Klass.java:114)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter$4.visit(HeapHprofBinWriter.java:1120)
	at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.classesDo(ClassLoaderData.java:114)
	at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderDataGraph.classesDo(ClassLoaderDataGraph.java:84)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClasses(HeapHprofBinWriter.java:1117)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:436)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.writeHeapHprofBin(JMap.java:182)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:97)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:262)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:225)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:176)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:331)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:483)

java.io.EOFException
	at java.base/java.io.DataInputStream.readInt(DataInputStream.java:396)
	at jdk.test.lib.hprof.parser.HprofReader.read(HprofReader.java:209)
	at jdk.test.lib.hprof.parser.Reader.readFile(Reader.java:92)
	at jdk.test.lib.hprof.HprofParser.parse(HprofParser.java:85)
	at jdk.test.lib.hprof.HprofParser.parse(HprofParser.java:54)
	at TestJmapCore.test(TestJmapCore.java:106)
	at TestJmapCore.main(TestJmapCore.java:70)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.base/java.lang.Thread.run(Thread.java:832)

JavaTest Message: Test threw exception: java.io.EOFException
Comments
This only reproduced twice, and that was nearly 3 years ago. Closing as CNR.
17-08-2023

SerialGC is used in tier3 GC testing, which is when this bug appeared:

    Run test open/test/hotspot/jtreg/:hotspot_serviceability with linux-aarch64-debug with -XX:+CreateCoredumpOnCrash -XX:+UseSerialGC #tier3-gc

There have been plenty of tier3 runs, so I don't think this is an issue with SerialGC and SA simply not working together, although it doesn't mean there can't be an intermittent bug hiding there.
06-08-2020

So the default GC is G1. I suggest trying some of the other SA core tests while explicitly setting UseSerialGC.
06-08-2020

They don't specify which GC to use.
06-08-2020

I don't know how to interpret the stats. If we only have 349,568K of tenured space for large object allocation, and the allocation request is for 524,429K, then I would expect an immediate OOM condition with no futile attempt to free up non-existent space. Do the other SA core tests also utilise SerialGC?
06-08-2020

But the hs_err stats for the heap make it look like there has been a full GC that clears out the younger generations. What would have triggered that? If you are asking whether SA core file processing works on Aarch64, the answer is yes. We run all the SA tests on Aarch64, including all 7 that process core files.
06-08-2020

If the requested allocation simply exceeds the configured heap size, then GC will not have run, so it seems unlikely to be an issue with SerialGC. Can we process core files on Aarch64 in other circumstances?
06-08-2020

This is the test heap allocating code:

    Object[] oa = new Object[Integer.MAX_VALUE / 2];
    for (int i = 0; i < oa.length; i++) {
        oa[i] = new Object[Integer.MAX_VALUE / 2];
    }

According to the stack trace from the OOME fatal error:

    #10 0x0000fffcd2643ea8 in array_allocate (__the_thread__=0xfffccc030240, do_zero=true, length=1073741823, size=536870914, klass=<optimized out>, this=<optimized out>)

So the size of the object is 524,429K and the tenured generation is only 349,568K, so I would assume the first allocation is failing, and the forced full GC due to being out of memory is probably why everything has been promoted to the tenured generation. None of this seems unusual or wrong. I'm just not sure why the Thread instances don't have a valid Klass in the header.
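For reference, a self-contained approximation of that allocation pattern (this is not the actual TestJmapCore source, and the VM flags mentioned in the comments are an assumption based on the hs_err output and the test setup):

    // Approximation of the allocating code quoted above, not the actual test source.
    // Assumed flags (hedged): -Xmx512m -XX:+UseSerialGC -XX:+CrashOnOutOfMemoryError
    // -XX:+CreateCoredumpOnCrash, so the OutOfMemoryError turns into the fatal error
    // and core dump that the SA tools are later pointed at.
    public class FillHeap {
        public static void main(String[] args) {
            // Each element is itself a huge Object[]; with a ~500 MB heap even the
            // first new Object[Integer.MAX_VALUE / 2] request cannot be satisfied.
            Object[] oa = new Object[Integer.MAX_VALUE / 2];
            for (int i = 0; i < oa.length; i++) {
                oa[i] = new Object[Integer.MAX_VALUE / 2];
            }
            System.out.println(oa.length); // keep oa reachable
        }
    }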
05-08-2020

I suppose it's possible. Here's some heap info from the hs_err file:

    def new generation total 157248K, used 0K [0x00000000e0000000, 0x00000000eaaa0000, 0x00000000eaaa0000)
     eden space 139776K, 0% used [0x00000000e0000000, 0x00000000e0000000, 0x00000000e8880000)
     from space 17472K, 0% used [0x00000000e9990000, 0x00000000e9990000, 0x00000000eaaa0000)
     to space 17472K, 0% used [0x00000000e8880000, 0x00000000e8880000, 0x00000000e9990000)
    tenured generation total 349568K, used 521K [0x00000000eaaa0000, 0x0000000100000000, 0x0000000100000000)
     the space 349568K, 0% used [0x00000000eaaa0000, 0x00000000eab224d8, 0x00000000eab22600, 0x0000000100000000)
    Metaspace used 3690K, capacity 4490K, committed 4864K, reserved 65536K
     class space used 314K, capacity 386K, committed 512K, reserved 57344K

So it looks like there is plenty of memory available. Keep in mind the issue is that we have JavaThreads (all JavaThreads) whose underlying java.lang.Thread instances have object headers that aren't referencing the Klass for java.lang.Thread. Instead the reference is to NULL. Does SerialGC change this to NULL when moving objects? If so, it would appear that the JavaThreads are pointing to old Thread instances. BTW, the Thread object (0x00000000eaab60e8) is in the "tenured" generation. The other generations have all been completely cleared of objects. The bug as originally reported seems to have a similar issue:

    Unknown oop at 0x00000000eaaa0460
    Oop's klass is null
    sun.jvm.hotspot.oops.UnknownOopException
        at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:193)
        at jdk.hotspot.agent/sun.jvm.hotspot.oops.VMOopHandle.resolve(VMOopHandle.java:61)
        at jdk.hotspot.agent/sun.jvm.hotspot.oops.Klass.getJavaMirror(Klass.java:114)
        at jdk.hotspot.agent/sun.jvm.hotspot.utilities.HeapHprofBinWriter$4.visit(HeapHprofBinWriter.java:1120)
        at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.classesDo(ClassLoaderData.java:114)
05-08-2020

The test is being run with -XX:+UseSerialGC, so you won't see any GC threads. But in exhausting the Java heap, GC must have run at some point, though synchronously with the thread trying to do the allocation. Is it possible that there is an issue with SerialGC on Aarch64?
05-08-2020

Based on the above, it seems like we must be in the middle of a GC, but a thread dump doesn't show that. In fact it shows no GC threads. Note thread #1 is the one that crashed to cause the core dump. This was intentionally done by exhausting the java heap.

    Thread 13 (LWP 23173):
    #0 0x0000fffcd389c31c in pthread_cond_timedwait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed650 in os::PlatformMonitor::wait (this=this@entry=0xfffccc0297d0, millis=<optimized out>, millis@entry=86400000) at os_posix.hpp:320
    #2 0x0000fffcd2b44ab4 in Monitor::wait_without_safepoint_check (this=0xfffccc0297c0, timeout=timeout@entry=86400000) at mutex.cpp:203
    #3 0x0000fffcd2e1b5dc in wait (timeout=86400000, as_suspend_equivalent=false, this=0xfffcb562e890, this=0xfffcb562e890) at mutexLocker.hpp:259
    #4 NMethodSweeper::sweeper_loop () at sweeper.cpp:222
    #5 0x0000fffcd2e88ab8 in JavaThread::thread_main_inner (this=0xfffccc2dd870) at thread.hpp:1292
    #6 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc2dd870) at thread.cpp:393
    #7 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc2dd870) at os_linux.cpp:790
    #8 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #9 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 12 (LWP 23168):
    #0 0x0000fffcd389bff8 in pthread_cond_wait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bec300 in os::PlatformEvent::park (this=0xfffccc277a00) at os_posix.cpp:1934
    #2 0x0000fffcd2b96f20 in ObjectMonitor::wait (this=this@entry=0xfffca8004d80, millis=millis@entry=0, interruptible=interruptible@entry=true, __the_thread__=__the_thread__@entry=0xfffccc276a00) at objectMonitor.cpp:1413
    #3 0x0000fffcd2e37e48 in ObjectSynchronizer::wait (obj=..., obj@entry=..., millis=millis@entry=0, __the_thread__=__the_thread__@entry=0xfffccc276a00) at synchronizer.cpp:783
    #4 0x0000fffcd27feb50 in JVM_MonitorWait (env=<optimized out>, handle=<optimized out>, ms=0) at jvm.cpp:681
    #5 0x0000fffcb5cb0bac in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 11 (LWP 23174):
    #0 0x0000fffcd389bff8 in pthread_cond_wait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed6c8 in os::PlatformMonitor::wait (this=this@entry=0xfffccc02a050, millis=millis@entry=0) at os_posix.hpp:320
    #2 0x0000fffcd2b44ab4 in Monitor::wait_without_safepoint_check (this=0xfffccc02a040, timeout=timeout@entry=0) at mutex.cpp:203
    #3 0x0000fffcd2b7a1b0 in wait (timeout=0, as_suspend_equivalent=true, this=0xfffcb542e890, this=0xfffcb542e890) at mutexLocker.hpp:256
    #4 NotificationThread::notification_thread_entry (jt=0xfffccc3d5130, __the_thread__=0xfffccc3d5130) at notificationThread.cpp:110
    #5 0x0000fffcd2e88ab8 in JavaThread::thread_main_inner (this=0xfffccc3d5130) at thread.hpp:1292
    #6 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc3d5130) at thread.cpp:393
    #7 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc3d5130) at os_linux.cpp:790
    #8 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #9 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 10 (LWP 23175):
    #0 0x0000fffcd389fbc4 in nanosleep () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bea1d0 in naked_short_nanosleep (ns=999000000) at os_posix.cpp:695
    #2 os::naked_short_sleep (ms=ms@entry=999) at os_posix.cpp:701
    #3 0x0000fffcd2e84da8 in WatcherThread::run (this=<optimized out>) at thread.cpp:1483
    #4 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc3da600) at thread.cpp:393
    #5 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc3da600) at os_linux.cpp:790
    #6 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #7 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 9 (LWP 23171):
    #0 0x0000fffcd389c31c in pthread_cond_timedwait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed650 in os::PlatformMonitor::wait (this=0xfffccc02cf10, millis=<optimized out>, millis@entry=5000) at os_posix.hpp:320
    #2 0x0000fffcd2b45558 in Monitor::wait (this=0xfffccc02cf00, timeout=timeout@entry=5000, as_suspend_equivalent=as_suspend_equivalent@entry=false) at mutex.cpp:238
    #3 0x0000fffcd22b3900 in wait (timeout=5000, as_suspend_equivalent=false, this=0xfffcb5a2e720, this=0xfffcb5a2e720) at mutexLocker.hpp:257
    #4 CompileQueue::get (this=this@entry=0xfffccc2cad70) at compileBroker.cpp:447
    #5 0x0000fffcd22b831c in CompileBroker::compiler_thread_loop () at compileBroker.cpp:1870
    #6 0x0000fffcd2e88ab8 in JavaThread::thread_main_inner (this=0xfffccc2caf90) at thread.hpp:1292
    #7 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc2caf90) at thread.cpp:393
    #8 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc2caf90) at os_linux.cpp:790
    #9 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #10 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 8 (LWP 23176):
    #0 0x0000fffcd389c31c in pthread_cond_timedwait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bec624 in os::PlatformEvent::park (this=0xfffccc3ef700, millis=millis@entry=60000) at os_posix.cpp:1982
    #2 0x0000fffcd2b96864 in ObjectMonitor::wait (this=this@entry=0xfffca8006e80, millis=millis@entry=60000, interruptible=interruptible@entry=true, __the_thread__=__the_thread__@entry=0xfffccc3eecf0) at objectMonitor.cpp:1415
    #3 0x0000fffcd2e37e48 in ObjectSynchronizer::wait (obj=..., obj@entry=..., millis=millis@entry=60000, __the_thread__=__the_thread__@entry=0xfffccc3eecf0) at synchronizer.cpp:783
    #4 0x0000fffcd27feb50 in JVM_MonitorWait (env=<optimized out>, handle=<optimized out>, ms=60000) at jvm.cpp:681
    #5 0x0000fffcb5cb0bac in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 7 (LWP 23170):
    #0 0x0000fffcd389bff8 in pthread_cond_wait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed6c8 in os::PlatformMonitor::wait (this=0xfffccc02b8c0, millis=millis@entry=0) at os_posix.hpp:320
    #2 0x0000fffcd2b45558 in Monitor::wait (this=0xfffccc02b8b0, timeout=timeout@entry=0, as_suspend_equivalent=as_suspend_equivalent@entry=false) at mutex.cpp:238
    #3 0x0000fffcd2f7e7d8 in wait (timeout=0, as_suspend_equivalent=false, this=0xfffcb5c2e520, this=0xfffcb5c2e520) at mutexLocker.hpp:257
    #4 VMThread::execute (op=op@entry=0xfffcb5c2e6a0) at vmThread.cpp:580
    #5 0x0000fffcd259cb54 in Handshake::execute (thread_cl=thread_cl@entry=0xfffcb5c2e730) at handshake.cpp:333
    #6 0x0000fffcd2e30b54 in ObjectSynchronizer::deflate_idle_monitors_using_JT () at synchronizer.cpp:2371
    #7 0x0000fffcd2d36cdc in ServiceThread::service_thread_entry (jt=0xfffccc2c8c00, __the_thread__=0xfffccc2c8c00) at serviceThread.cpp:201
    #8 0x0000fffcd2e88ab8 in JavaThread::thread_main_inner (this=0xfffccc2c8c00) at thread.hpp:1292
    #9 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc2c8c00) at thread.cpp:393
    #10 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc2c8c00) at os_linux.cpp:790
    #11 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #12 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 6 (LWP 23172):
    #0 0x0000fffcd389c31c in pthread_cond_timedwait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed650 in os::PlatformMonitor::wait (this=0xfffccc02cf10, millis=<optimized out>, millis@entry=5000) at os_posix.hpp:320
    #2 0x0000fffcd2b45558 in Monitor::wait (this=0xfffccc02cf00, timeout=timeout@entry=5000, as_suspend_equivalent=as_suspend_equivalent@entry=false) at mutex.cpp:238
    #3 0x0000fffcd22b3900 in wait (timeout=5000, as_suspend_equivalent=false, this=0xfffcb582e720, this=0xfffcb582e720) at mutexLocker.hpp:257
    #4 CompileQueue::get (this=this@entry=0xfffccc2cae30) at compileBroker.cpp:447
    #5 0x0000fffcd22b831c in CompileBroker::compiler_thread_loop () at compileBroker.cpp:1870
    #6 0x0000fffcd2e88ab8 in JavaThread::thread_main_inner (this=0xfffccc2ccdc0) at thread.hpp:1292
    #7 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc2ccdc0) at thread.cpp:393
    #8 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc2ccdc0) at os_linux.cpp:790
    #9 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #10 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 5 (LWP 23169):
    #0 0x0000fffcd389e214 in do_futex_wait.constprop.1 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd389e2b4 in __new_sem_wait_slow.constprop.0 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #2 0x0000fffcd389e390 in sem_trywait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 4 (LWP 23166):
    #0 0x0000fffcd389fbc4 in nanosleep () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bea1d0 in naked_short_nanosleep (ns=1000000) at os_posix.cpp:695
    #2 os::naked_short_sleep (ms=ms@entry=1) at os_posix.cpp:701
    #3 0x0000fffcd259f498 in wait_raw (this=0xfffcd0a0e390, now=<optimized out>) at handshake.cpp:91
    #4 process (this=0xfffcd0a0e390) at handshake.cpp:146
    #5 VM_HandshakeAllThreads::doit (this=0xfffcb5c2e6a0) at handshake.cpp:287
    #6 0x0000fffcd2f4d698 in VM_Operation::evaluate (this=this@entry=0xfffcb5c2e6a0) at vmOperations.cpp:68
    #7 0x0000fffcd2f7ea30 in VMThread::evaluate_operation (this=this@entry=0xfffccc229fc0, op=0xfffcb5c2e6a0) at vmThread.cpp:358
    #8 0x0000fffcd2f7f960 in VMThread::loop (this=0xfffccc229fc0) at vmThread.cpp:506
    #9 0x0000fffcd2f7fa9c in VMThread::run (this=<optimized out>) at vmThread.cpp:250
    #10 0x0000fffcd2e902a8 in Thread::call_run (this=this@entry=0xfffccc229fc0) at thread.cpp:393
    #11 0x0000fffcd2bde2a0 in thread_native_entry (thread=0xfffccc229fc0) at os_linux.cpp:790
    #12 0x0000fffcd3897d40 in start_thread () from /scratch/uadmin/root/lib64/libpthread.so.0
    #13 0x0000fffcd37b2c60 in __lseek_nocancel () from /scratch/uadmin/root/lib64/libc.so.6
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 3 (LWP 23167):
    #0 0x0000fffcd389bff8 in pthread_cond_wait@@GLIBC_2.17 () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd2bed6c8 in os::PlatformMonitor::wait (this=0xfffccc02c140, millis=millis@entry=0) at os_posix.hpp:320
    #2 0x0000fffcd2b45558 in Monitor::wait (this=0xfffccc02c130, timeout=timeout@entry=0, as_suspend_equivalent=as_suspend_equivalent@entry=false) at mutex.cpp:238
    #3 0x0000fffcd281412c in wait (timeout=0, as_suspend_equivalent=false, this=0xfffcd07ee280, this=0xfffcd07ee280) at mutexLocker.hpp:256
    #4 JVM_WaitForReferencePendingList (env=<optimized out>) at jvm.cpp:3450
    #5 0x0000fffcb5cb0bac in ?? ()
    #6 0x00000000eaab3af8 in ?? ()

    Thread 2 (LWP 23163):
    #0 0x0000fffcd3898cf4 in pthread_join () from /scratch/uadmin/root/lib64/libpthread.so.0
    #1 0x0000fffcd38d7cfc in CallJavaMainInNewThread (stack_size=2097152, args=args@entry=0xffffcf42e370) at java_md.c:664
    #2 0x0000fffcd38d5344 in ContinueInNewThread (ifn=ifn@entry=0xffffcf42e480, threadStackSize=<optimized out>, argc=1, argv=0xaaab4fd60ff0, mode=mode@entry=1, what=what@entry=0xaaab4fd60b00 "TestJmapCore", ret=ret@entry=0) at java.c:2357
    #3 0x0000fffcd38d7db4 in JVMInit (ifn=ifn@entry=0xffffcf42e480, threadStackSize=<optimized out>, argc=<optimized out>, argv=<optimized out>, mode=mode@entry=1, what=what@entry=0xaaab4fd60b00 "TestJmapCore", ret=ret@entry=0) at java_md.c:689
    #4 0x0000fffcd38d5b3c in JLI_Launch (argc=<optimized out>, argv=<optimized out>, jargc=<optimized out>, jargv=<optimized out>, appclassc=0, appclassv=0x0, fullversion=<optimized out>, dotversion=<optimized out>, pname=<optimized out>, lname=<optimized out>, javaargs=<optimized out>, cpwildcard=<optimized out>, javaw=<optimized out>, ergo=<optimized out>) at java.c:343
    #5 0x0000aaab45540b50 in main ()

    Thread 1 (LWP 23165):
    #0 0x0000fffcd3705340 in raise () from /scratch/uadmin/root/lib64/libc.so.6
    #1 0x0000fffcd37069a4 in abort () from /scratch/uadmin/root/lib64/libc.so.6
    #2 0x0000fffcd2bd4868 in os::abort (dump_core=true, siginfo=<optimized out>, context=<optimized out>) at os_linux.cpp:1540
    #3 0x0000fffcd2f4b994 in VMError::report_and_die (id=id@entry=-536870912, message=message@entry=0xfffcd30d6168 "fatal error", detail_fmt=detail_fmt@entry=0xfffcd30d6350 "OutOfMemory encountered: %s", detail_args=..., thread=0xfffccc030240, pc=pc@entry=0x0, siginfo=siginfo@entry=0x0, context=0xfffcd361be60 <g_stored_assertion_context>, filename=filename@entry=0xfffcd30d61a0, lineno=lineno@entry=339, size=size@entry=0) at vmError.cpp:1635
    #4 0x0000fffcd2f4c38c in VMError::report_and_die (thread=<optimized out>, context=<optimized out>, filename=filename@entry=0xfffcd30d61a0, lineno=lineno@entry=339, message=message@entry=0xfffcd30d6168 "fatal error", detail_fmt=detail_fmt@entry=0xfffcd30d6350 "OutOfMemory encountered: %s", detail_args=<error reading variable: Cannot access memory at address 0x2>) at vmError.cpp:1340
    #5 0x0000fffcd2307b64 in report_fatal (file=file@entry=0xfffcd30d61a0, line=line@entry=339, detail_fmt=detail_fmt@entry=0xfffcd30d6350 "OutOfMemory encountered: %s") at thread.hpp:858
    #6 0x0000fffcd2307df4 in report_java_out_of_memory (message=message@entry=0xfffcd31a1630 "Java heap space") at debug.cpp:339
    #7 0x0000fffcd2a8b114 in MemAllocator::Allocation::check_out_of_memory (this=this@entry=0xfffcd19ee148) at memAllocator.cpp:126
    #8 0x0000fffcd2a8d454 in ~Allocation (this=0xfffcd19ee148, __in_chrg=<optimized out>) at memAllocator.cpp:83
    #9 MemAllocator::allocate (this=this@entry=0xfffcd19ee1f8) at memAllocator.cpp:362
    #10 0x0000fffcd2643ea8 in array_allocate (__the_thread__=0xfffccc030240, do_zero=true, length=1073741823, size=536870914, klass=<optimized out>, this=<optimized out>) at collectedHeap.inline.hpp:77
    #11 InstanceKlass::allocate_objArray (this=this@entry=0x100001080, n=n@entry=1, length=length@entry=1073741823, __the_thread__=__the_thread__@entry=0xfffccc030240) at instanceKlass.cpp:1389
    #12 0x0000fffcd2bb81e8 in oopFactory::new_objArray (klass=klass@entry=0x100001080, length=length@entry=1073741823, __the_thread__=__the_thread__@entry=0xfffccc030240) at instanceKlass.hpp:1074
    #13 0x0000fffcd267ddfc in InterpreterRuntime::anewarray (thread=0xfffccc030240, pool=<optimized out>, index=<optimized out>, size=1073741823) at interpreterRuntime.cpp:251
    #14 0x0000fffcb5cbf99c in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
05-08-2020

I grabbed the core file and tried running SA tools against it. There are a lot of issues. I think the jvm is in a state where SA just can't reliably debug it. I confirmed the UnknownOopException when dumping the stack (I used the clhsdb "dumpstack" command instead of pmap). Any command having to do with threads such as the clhsdb "threads", "jstack", and "pstack" commands will generate the following exception repeatedly. The top 3 frames are the same for each of these commands:

    sun.jvm.hotspot.oops.UnknownOopException
        at jdk.hotspot.agent/sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:193)
        at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.getThreadObj(JavaThread.java:350)
        at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.printThreadInfoOn(JavaThread.java:473)
        at jdk.hotspot.agent/sun.jvm.hotspot.tools.PStack.run(PStack.java:94)
        at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$29.doit(CommandProcessor.java:1129)
        at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2051)
        at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2021)
        at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:1901)
        at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:99)
        at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:40)
        at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:280)
        at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:483)

I turned on the following debugging in ObjectHeap.newOop():

    if (DEBUG) {
        System.err.println("Unknown oop at " + handle);
        System.err.println("Oop's klass is " + klass);
    }

And the output:

    Unknown oop at 0x00000000eaab60e8
    Oop's klass is null

So the oop being fetched from the JavaThread::_threadObj field has a null klass object, and this same exception happens for every thread when using commands like "threads" or "jstack".
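For anyone wanting to poke at the core outside of clhsdb, a minimal sketch of driving the same SA path programmatically (class and method names are taken from the stack traces above; everything else, including the module/export setup, is an assumption and this is illustrative rather than a supported workflow):

    // Hedged sketch: attach SA to the core file and try to resolve the oop address
    // reported above. Assumes the jdk.hotspot.agent module has been made accessible
    // (e.g. via --add-modules/--add-exports). ObjectHeap.newOop() is where
    // UnknownOopException is thrown when the oop's klass word resolves to null,
    // as it does for every JavaThread's _threadObj in this core.
    import sun.jvm.hotspot.HotSpotAgent;
    import sun.jvm.hotspot.debugger.Address;
    import sun.jvm.hotspot.debugger.OopHandle;
    import sun.jvm.hotspot.oops.ObjectHeap;
    import sun.jvm.hotspot.oops.Oop;
    import sun.jvm.hotspot.oops.UnknownOopException;
    import sun.jvm.hotspot.runtime.VM;

    public class ResolveOopFromCore {
        public static void main(String[] args) {
            HotSpotAgent agent = new HotSpotAgent();
            agent.attach(args[0], args[1]);   // <path to java executable> <path to core file>
            try {
                ObjectHeap heap = VM.getVM().getObjectHeap();
                Address addr = VM.getVM().getDebugger().parseAddress("0x00000000eaab60e8");
                OopHandle handle = addr.addOffsetToAsOopHandle(0);
                Oop oop = heap.newOop(handle);            // throws UnknownOopException on this core
                System.out.println(oop);
            } catch (UnknownOopException e) {
                System.err.println("Oop's klass is null, matching the clhsdb output above");
            } finally {
                agent.detach();
            }
        }
    }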
05-08-2020

This seems a lot like JDK-8244203, which was fixed. The stack trace is similar, with the key being the ClassLoaderData.classesDo() frame. I wonder if we are in the process of unloading classes, or the heap is in an inconsistent state for some other reason. It looks like the core file that was generated is still available. I'll try to attach to it and get some native and java stack traces. That might shed some light on the state of the JVM when the attach was done. Note the first exception is from SA. The 2nd exception is from the test, presumably because the first exception resulted in a bad hprof file.
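The truncated-dump side of the second exception is easy to see in isolation. A minimal, self-contained illustration (this is not the test library's HprofReader, just the DataInputStream behaviour it relies on when a dump ends mid-record):

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class TruncatedReadDemo {
        public static void main(String[] args) throws IOException {
            // Three bytes where readInt() needs four -- the situation the hprof reader
            // is in when the dump was cut short by the UnknownOopException in the writer.
            byte[] truncated = {0x00, 0x00, 0x01};
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(truncated))) {
                in.readInt();   // throws java.io.EOFException, as in the test failure
            }
        }
    }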
05-08-2020