JDK-8263901 : RunThese30M.java failed with SIGSEGV in Threads::owning_thread_from_monitor_owner
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17
  • Priority: P3
  • Status: Resolved
  • Resolution: Cannot Reproduce
  • OS: linux
  • CPU: x86_64
  • Submitted: 2021-03-20
  • Updated: 2021-06-15
  • Resolved: 2021-04-12
JDK 17: Resolved in 17
Related Reports
Relates :  
Relates :  
Description
The following test failed in the JDK 17 CI:

applications/runthese/RunThese30M.java

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007fdc80246340):  JavaThread "javasoft.sqe.tests.api.java.lang.management.ThreadInfo.ThreadInfo_addTests " daemon [_thread_in_vm, id=11510, stack(0x00007fdc42cfd000,0x00007fdc42dfe000)] _threads_hazard_ptr=0x00007fdc9c045e60

Stack: [0x00007fdc42cfd000,0x00007fdc42dfe000],  sp=0x00007fdc42dfb3a0,  free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd5dfa1]  Threads::owning_thread_from_monitor_owner(ThreadsList*, unsigned char*)+0xd1
V  [libjvm.so+0xd69d35]  ThreadSnapshot::initialize(ThreadsList*, JavaThread*)+0x255
V  [libjvm.so+0xd69f69]  ThreadDumpResult::add_thread_snapshot(JavaThread*)+0x69
V  [libjvm.so+0xb02415]  jmm_GetThreadInfo+0x3d5
j  sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0 java.management@17-ea
j  sun.management.ThreadImpl.getThreadInfo([JI)[Ljava/lang/management/ThreadInfo;+60 java.management@17-ea
j  sun.management.ThreadImpl.getThreadInfo(J)Ljava/lang/management/ThreadInfo;+11 java.management@17-ea
j  javasoft.sqe.tests.api.java.lang.management.ThreadInfo.ThreadInfo_addTests.ThreadInfo0030()Ljavasoft/sqe/javatest/Status;+58
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7d58a5]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x2a5
V  [libjvm.so+0xc42ed9]  invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone .constprop.0]+0x479
V  [libjvm.so+0xc43d16]  Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x106
V  [libjvm.so+0x8a18fa]  JVM_InvokeMethod+0x12a
J 2288  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (0 bytes) @ 0x00007fdf493e56d4 [0x00007fdf493e55c0+0x0000000000000114]
J 22817 c1 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (150 bytes) @ 0x00007fdf4385dc74 [0x00007fdf4385cfe0+0x0000000000000c94]
J 15380 c2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (65 bytes) @ 0x00007fdf49464cc4 [0x00007fdf49464c00+0x00000000000000c4]
j  javasoft.sqe.javatest.lib.MultiTest.invokeTestCase(Ljava/lang/reflect/Method;)Ljavasoft/sqe/javatest/Status;+8
j  javasoft.sqe.javatest.lib.MultiTest.run([Ljava/lang/String;Ljava/io/PrintWriter;Ljava/io/PrintWriter;)Ljavasoft/sqe/javatest/Status;+139
j  javasoft.sqe.javatest.lib.MultiTest.run([Ljava/lang/String;Ljava/io/PrintStream;Ljava/io/PrintStream;)Ljavasoft/sqe/javatest/Status;+40
j  javasoft.sqe.tests.api.java.lang.management.ThreadInfo.ThreadInfo_addTests.main([Ljava/lang/String;)V+16
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7d58a5]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x2a5
V  [libjvm.so+0xc42ed9]  invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone .constprop.0]+0x479
V  [libjvm.so+0xc43d16]  Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x106
V  [libjvm.so+0x8a18fa]  JVM_InvokeMethod+0x12a
J 2288  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (0 bytes) @ 0x00007fdf493e56d4 [0x00007fdf493e55c0+0x0000000000000114]
J 22817 c1 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (150 bytes) @ 0x00007fdf4385dc74 [0x00007fdf4385cfe0+0x0000000000000c94]
J 6989 c2 jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (10 bytes) @ 0x00007fdf4954fe8c [0x00007fdf4954fe20+0x000000000000006c]
J 7912 c1 applications.kitchensink.process.stress.modules.JckStressModule$TestRunner$1.run()V (127 bytes) @ 0x00007fdf42877c8c [0x00007fdf42877840+0x000000000000044c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7d58a5]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x2a5
V  [libjvm.so+0x7d711b]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1cb
V  [libjvm.so+0x894870]  thread_entry(JavaThread*, Thread*)+0x70
V  [libjvm.so+0xd5bb70]  JavaThread::thread_main_inner()+0xd0
V  [libjvm.so+0xd5f0ee]  Thread::call_run()+0xde
V  [libjvm.so+0xbbad27]  thread_native_entry(Thread*)+0xe7

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0 java.management@17-ea
j  sun.management.ThreadImpl.getThreadInfo([JI)[Ljava/lang/management/ThreadInfo;+60 java.management@17-ea
j  sun.management.ThreadImpl.getThreadInfo(J)Ljava/lang/management/ThreadInfo;+11 java.management@17-ea
j  javasoft.sqe.tests.api.java.lang.management.ThreadInfo.ThreadInfo_addTests.ThreadInfo0030()Ljavasoft/sqe/javatest/Status;+58
v  ~StubRoutines::call_stub
J 2288  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (0 bytes) @ 0x00007fdf493e565b [0x00007fdf493e55c0+0x000000000000009b]
J 22817 c1 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (150 bytes) @ 0x00007fdf4385dc74 [0x00007fdf4385cfe0+0x0000000000000c94]
J 15380 c2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (65 bytes) @ 0x00007fdf49464cc4 [0x00007fdf49464c00+0x00000000000000c4]
j  javasoft.sqe.javatest.lib.MultiTest.invokeTestCase(Ljava/lang/reflect/Method;)Ljavasoft/sqe/javatest/Status;+8
j  javasoft.sqe.javatest.lib.MultiTest.run([Ljava/lang/String;Ljava/io/PrintWriter;Ljava/io/PrintWriter;)Ljavasoft/sqe/javatest/Status;+139
j  javasoft.sqe.javatest.lib.MultiTest.run([Ljava/lang/String;Ljava/io/PrintStream;Ljava/io/PrintStream;)Ljavasoft/sqe/javatest/Status;+40
j  javasoft.sqe.tests.api.java.lang.management.ThreadInfo.ThreadInfo_addTests.main([Ljava/lang/String;)V+16
v  ~StubRoutines::call_stub
J 2288  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (0 bytes) @ 0x00007fdf493e565b [0x00007fdf493e55c0+0x000000000000009b]
J 22817 c1 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (150 bytes) @ 0x00007fdf4385dc74 [0x00007fdf4385cfe0+0x0000000000000c94]
J 6989 c2 jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@17-ea (10 bytes) @ 0x00007fdf4954fe8c [0x00007fdf4954fe20+0x000000000000006c]
J 7912 c1 applications.kitchensink.process.stress.modules.JckStressModule$TestRunner$1.run()V (127 bytes) @ 0x00007fdf42877c8c [0x00007fdf42877840+0x000000000000044c]
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000038
Comments
Closing as "Cannot Reproduce". Will reopen if the failure mode is spotted again.
12-04-2021

I set up a repo baselined at jdk-16+16 with one additional fix: JDK-8264123 "add ThreadsList.is_valid() support", which is the fix that more easily catches the dangling TLH bug. Please note that jdk-16+16 DOES NOT include the fix for JDK-8264393 "JDK-8258284 introduced dangling TLH race", which is the fix that I think solved this bug (JDK-8263901). I did a test run this week on my Dell T7600 with 25 runs of applications/runthese/RunThese30M.java in each of the {fastdebug, release, slowdebug} configs, for 75 runs in total. I used the JVM args from this bug's one test failure: "-XX:+CreateCoredumpOnCrash -XX:+UseZGC". I didn't see a single failure in more than 48 hours of testing. I ran applications/runthese/RunThese30M.java in parallel with my jdk-17+17 stress testing run, so there was no shortage of stress on RunThese30M.
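Only the title of JDK-8264123 appears in this report. As a rough illustration of why an is_valid() check would catch a dangling list more easily, here is a minimal Java sketch of the general "tag on create, clear on free, check on use" technique. It assumes a magic tag set when the list is built and cleared when it is released; the class, method, and tag value are hypothetical and are not HotSpot's code. Java has no dangling pointers, so release() only simulates a C++ delete.

```java
import java.util.List;

/** Hypothetical sketch of poison-on-free plus validate-on-use; not HotSpot's ThreadsList. */
public final class ThreadsListValidity {

    // Arbitrary tag value chosen for this sketch ('T', 'L', 'S', 'M').
    static final int THREADS_LIST_MAGIC = ('T' << 24) | ('L' << 16) | ('S' << 8) | 'M';

    static final class ThreadsList {
        private int magic = THREADS_LIST_MAGIC;
        private final List<String> threads;

        ThreadsList(List<String> threads) {
            this.threads = List.copyOf(threads);
        }

        boolean isValid() {
            return magic == THREADS_LIST_MAGIC;
        }

        // Simulates freeing the list: the tag is destroyed so stale users can be detected.
        void release() {
            magic = 0;
        }

        List<String> threads() {
            // Fail fast on use-after-release instead of silently reading stale data.
            if (!isValid()) {
                throw new IllegalStateException("dangling ThreadsList");
            }
            return threads;
        }
    }

    public static void main(String[] args) {
        ThreadsList list = new ThreadsList(List.of("main", "worker-1"));
        System.out.println(list.threads()); // [main, worker-1]
        list.release();
        try {
            list.threads();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // caught: dangling ThreadsList
        }
    }
}
```

In C++ a stale pointer to freed memory can be read without any immediate error; checking the tag on each use would turn that into an early, diagnosable failure instead of a later crash somewhere else.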
12-04-2021

I tried to download the core file and poke around in gdb, but the core file is truncated according to gdb:
$ gdb -f jdk-17/bin/java -c core.3851
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /scratch/dcubed/8263901/jdk-17/bin/java...Reading symbols from /scratch/dcubed/8263901/jdk-17/bin/java.debuginfo...done.
done.
BFD: Warning: /scratch/dcubed/8263901/core.3851 is truncated: expected core file size >= 35336388608, found: 2247229440.
[New LWP 11510]
<snip>
[New LWP 11472]
Failed to read a valid object file image from memory.
Core was generated by `/opt/mach5/mesos/work_dir/jib-master/install/jdk-17+14-1087/linux-x64.jdk/jdk-1'.
Program terminated with signal 6, Aborted.
#0  0x00007fdf60628387 in ?? ()
08-04-2021

I changed my mind about adding the new test as:
    test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadInfo/getLockOwnerName/getlockownername002/TestDescription.java
    test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadInfo/getLockOwnerName/getlockownername002.java
Instead, I'm adding the new test here:
    test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/getLockOwnerName/getLockOwnerName.java
    test/hotspot/jtreg/serviceability/monitoring/ThreadInfo/getLockOwnerName/libgetLockOwnerName.cpp
to model it after some recently ported and updated tests, since I need the wait4ContendedEnter() function in order to properly test ThreadInfo.getLockOwnerName().
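For readers who want to exercise the same Java entry point without the jtreg test library, here is a minimal standalone sketch. It is not the getLockOwnerName test described above: wait4ContendedEnter() lives in that test's native library, so this sketch assumes that polling Thread.getState() until the contender is BLOCKED is an adequate pure-Java approximation. The class name LockOwnerNameDemo is hypothetical; the management calls are the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical demo: ask the VM who owns a monitor that another thread is blocked on. */
public final class LockOwnerNameDemo {

    private static final Object LOCK = new Object();

    /** Returns ThreadInfo.getLockOwnerName() for a thread blocked entering LOCK. */
    public static String observeLockOwner() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Thread contender = new Thread(() -> {
            synchronized (LOCK) {
                // Reached only after the observing thread releases LOCK.
            }
        }, "contender");
        try {
            String ownerName;
            synchronized (LOCK) {
                contender.start();
                // Pure-Java stand-in for wait4ContendedEnter(): poll until the
                // contender is BLOCKED trying to enter LOCK's monitor.
                while (contender.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                // Goes through ThreadImpl.getThreadInfo(long), as in the crash stack.
                ThreadInfo info = mx.getThreadInfo(contender.getId());
                ownerName = info.getLockOwnerName();
            }
            contender.join();
            return ownerName;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Expected to print the name of the thread running main(), i.e. "main".
        System.out.println(observeLockOwner());
    }
}
```

Filling in the lock owner requires the VM to map the monitor's owner back to a JavaThread, which is the part of the path where this crash was reported. Looping observeLockOwner() from several threads would be one simple way to stress it.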
07-04-2021

Creating a new test to stress test the code path from the hs_err_pid file:
    test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadInfo/getLockOwnerName/getlockownername002/TestDescription.java
    test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadInfo/getLockOwnerName/getlockownername002.java
Verified that we're reaching the right place with a bit of debug code:
$ git diff
diff --git a/src/hotspot/share/runtime/thread.cpp b/src/hotspot/share/runtime/thread.cpp
index 2c453cda887..e471b141d70 100644
--- a/src/hotspot/share/runtime/thread.cpp
+++ b/src/hotspot/share/runtime/thread.cpp
@@ -4010,6 +4010,7 @@ JavaThread *Threads::owning_thread_from_monitor_owner(ThreadsList * t_list,
   // NULL owner means not locked so we can skip the search
   if (owner == NULL) return NULL;
 
+if (UseNewCode) fatal("XXX - in owning_thread_from_monitor_owner()");
   DO_JAVA_THREADS(t_list, p) {
     // first, see if owner is the address of a Java thread
     if (owner == (address)p) return p;
The resulting crash with the new test has the same call stack in the VM as the failure's hs_err_pid.
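The diff shows the start of Threads::owning_thread_from_monitor_owner(): a NULL owner returns early, then the ThreadsList snapshot t_list is walked looking for a thread whose address equals the owner. The sketch below is a simplified Java model of that lookup, not HotSpot code. Its second pass, which checks whether the owner points into a thread's stack (a stack-locked object), is recalled from the HotSpot sources of that era and does not appear in the diff, so treat it as an assumption. Addresses are modeled as long values, and ModelThread and owningThread are hypothetical names.

```java
import java.util.List;

/** Simplified model of a monitor-owner lookup; all names are hypothetical. */
public final class OwnerLookupModel {

    /** Stand-in for a JavaThread: its own address plus its stack range. */
    static final class ModelThread {
        final String name;
        final long address;   // models the JavaThread* value
        final long stackBase; // highest stack address; x86_64 stacks grow down
        final long stackSize;

        ModelThread(String name, long address, long stackBase, long stackSize) {
            this.name = name;
            this.address = address;
            this.stackBase = stackBase;
            this.stackSize = stackSize;
        }

        /** True if owner points into this thread's stack (a stack-locked object). */
        boolean isLockOwned(long owner) {
            return owner < stackBase && owner >= stackBase - stackSize;
        }
    }

    /** Returns the owning thread from the given snapshot, or null if none matches. */
    static ModelThread owningThread(List<ModelThread> snapshot, long owner) {
        if (owner == 0) {
            return null; // a null owner means "not locked", so skip the search
        }
        // First pass: is the owner the address of a thread?
        for (ModelThread t : snapshot) {
            if (t.address == owner) {
                return t;
            }
        }
        // Second pass (assumed, see lead-in): does the owner point into a thread's stack?
        for (ModelThread t : snapshot) {
            if (t.isLockOwned(owner)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<ModelThread> snapshot = List.of(
                new ModelThread("A", 0x1000L, 0x20000L, 0x1000L),
                new ModelThread("B", 0x2000L, 0x30000L, 0x1000L));
        System.out.println(owningThread(snapshot, 0x2000L).name);  // B: address match
        System.out.println(owningThread(snapshot, 0x1F800L).name); // A: inside A's stack
        System.out.println(owningThread(snapshot, 0L));            // null: not locked
    }
}
```

Every pass iterates over the snapshot handed in by the caller. In the C++ original that snapshot is only safe to walk while it is protected, so if the ThreadsList had already been freed, the walk would read freed memory, which is consistent with the dangling ThreadsList theory discussed in the later comments.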
06-04-2021

Starting to look at the code paths from the hs_err_pid file to remind myself what this code is doing and why... the M&M code's use of Thread-SMR is not the most straightforward bit of implementation that we did...
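Thread-SMR protects ThreadsList snapshots with hazard pointers; the crashing thread's header line shows its _threads_hazard_ptr. The sketch below is a generic hazard-pointer pattern in Java, not HotSpot's Thread-SMR implementation. All names are hypothetical, and "freeing" a snapshot is simulated by setting a flag. In these terms, a dangling ThreadsListHandle would roughly correspond to a reader that keeps using a snapshot after its hazard pointer no longer protects it, at which point reclaim() is free to release it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Generic hazard-pointer sketch; names are hypothetical, not HotSpot's Thread-SMR code. */
public final class HazardPointerSketch {

    /** Immutable thread-list snapshot; "freed" stands in for deleting the C++ object. */
    static final class Snapshot {
        final List<String> threads;
        volatile boolean freed;

        Snapshot(List<String> threads) {
            this.threads = List.copyOf(threads);
        }
    }

    private final AtomicReference<Snapshot> current;
    private final List<AtomicReference<Snapshot>> hazards = new ArrayList<>();
    private final List<Snapshot> retired = new ArrayList<>(); // guarded by "this"

    HazardPointerSketch(Snapshot initial, int readers) {
        current = new AtomicReference<>(initial);
        for (int i = 0; i < readers; i++) {
            hazards.add(new AtomicReference<>());
        }
    }

    /** Reader: publish a hazard pointer, then confirm the snapshot is still current. */
    Snapshot acquire(int reader) {
        AtomicReference<Snapshot> slot = hazards.get(reader);
        while (true) {
            Snapshot s = current.get();
            slot.set(s);
            if (current.get() == s) {
                // s was still current after the hazard was published, so any
                // reclaim() scan that could free s will see this hazard.
                return s;
            }
        }
    }

    /** Reader: drop protection; the snapshot must not be used after this. */
    void release(int reader) {
        hazards.get(reader).set(null);
    }

    /** Writer: publish a new snapshot and retire the old one. */
    synchronized void replace(Snapshot next) {
        retired.add(current.getAndSet(next));
        reclaim();
    }

    /** Frees every retired snapshot that no reader's hazard pointer references. */
    synchronized void reclaim() {
        retired.removeIf(old -> {
            for (AtomicReference<Snapshot> h : hazards) {
                if (h.get() == old) {
                    return false; // still protected, keep it retired
                }
            }
            old.freed = true;
            return true;
        });
    }

    public static void main(String[] args) {
        Snapshot v1 = new Snapshot(List.of("main"));
        HazardPointerSketch smr = new HazardPointerSketch(v1, 1);
        Snapshot seen = smr.acquire(0);
        smr.replace(new Snapshot(List.of("main", "worker")));
        System.out.println(seen.freed); // false: reader 0 still protects v1
        smr.release(0);
        smr.reclaim();
        System.out.println(seen.freed); // true: no hazard left, so v1 was reclaimed
    }
}
```

The re-check after publishing the hazard is the key design point: without it, a writer could swap and free the list between the reader's load and its hazard store.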
06-04-2021

Oh, Ok, it's yours then!
24-03-2021

I think this is related to the dangling ThreadsList bug that I'm currently hunting down.
24-03-2021

I was thinking of the change I made to convert threadObj to an OopHandle.
24-03-2021

Which changes [~mseledtsov]?
24-03-2021

This could be crashing because of Runtime changes that point from THREAD to threadObj
23-03-2021

ILW = HLM = P3 (Impact: High, Likelihood: Low, Workaround: Medium)
23-03-2021

This crash happened in the HotSpot code that implements the M&M (Monitoring and Management) GetThreadInfo() API, so I'm starting this bug off in hotspot/svc.
20-03-2021