JDK-8201592 : Fatal error due to illegal safepoint state when compiler thread is writing hs_err-file
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P3
  • Status: Closed
  • Resolution: Cannot Reproduce
  • Submitted: 2018-04-16
  • Updated: 2019-05-28
  • Resolved: 2018-10-12
JDK 12: Resolved
Related Reports
Relates :  
Relates :  
Description
#0  0x00007ffff71f41c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff71f5e2a in __GI_abort () at abort.c:89
#2  0x00007ffff5fd636d in os::abort (dump_core=true, siginfo=0x7fffbd401470, context=0x7fffbd401340) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/os/linux/os_linux.cpp:1413
#3  0x00007ffff62311cc in VMError::report_and_die (id=-536870912, message=0x7ffff64c5a59 "fatal error", detail_fmt=0x7ffff68d33ec "LEAF method calling lock?", detail_args=0x7fffbd400ae8, 
    thread=0x7ffff01b68e0, pc=0x0, siginfo=0x0, context=0x0, filename=0x7ffff68d2be0 "/home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp", lineno=966, size=0)
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:1505
#4  0x00007ffff623068a in VMError::report_and_die (thread=0x7ffff01b68e0, filename=0x7ffff68d2be0 "/home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp", lineno=966, 
    message=0x7ffff64c5a59 "fatal error", detail_fmt=0x7ffff68d33ec "LEAF method calling lock?", detail_args=0x7fffbd400ae8)
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:1245
#5  0x00007ffff58abe85 in report_fatal (file=0x7ffff68d2be0 "/home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp", line=966, detail_fmt=0x7ffff68d33ec "LEAF method calling lock?")
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/debug.cpp:229
#6  0x00007ffff61afa8d in Thread::check_for_valid_safepoint_state (this=0x7ffff01b68e0, potential_vm_operation=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp:966
#7  0x00007ffff61c0f45 in ThreadsSMRSupport::acquire_stable_list (self=0x7ffff01b68e0, is_ThreadsListSetter=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.cpp:558
#8  0x00007ffff61c0ba1 in ThreadsListHandle::ThreadsListHandle (this=0x7fffbd400c78, self=0x7ffff01b68e0) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.cpp:470
#9  0x00007ffff559d1c7 in JavaThreadIteratorWithHandle::JavaThreadIteratorWithHandle (this=0x7fffbd400c70) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.hpp:302
#10 0x00007ffff5fcdbed in os::print_location (st=0x7ffff71a1c40 <VMError::log>, x=-140735142018274, verbose=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/os.cpp:1113
#11 0x00007ffff5fe35f5 in os::print_register_info (st=0x7ffff71a1c40 <VMError::log>, context=0x7fffbd401340) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp:784
#12 0x00007ffff622f00d in VMError::report (st=0x7ffff71a1c40 <VMError::log>, _verbose=true) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:749
#13 0x00007ffff6230c7b in VMError::report_and_die (id=11, message=0x0, detail_fmt=0x7ffff690d241 "%s", detail_args=0x7fffbd401078, thread=0x7ffff01b68e0, 
    pc=0x7ffff53a0b44 <Node::in(unsigned int) const+140> "H\213", siginfo=0x7fffbd401470, context=0x7fffbd401340, filename=0x0, lineno=0, size=0)
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:1406
#14 0x00007ffff62304e3 in VMError::report_and_die (thread=0x7ffff01b68e0, sig=11, pc=0x7ffff53a0b44 <Node::in(unsigned int) const+140> "H\213", siginfo=0x7fffbd401470, context=0x7fffbd401340, 
    detail_fmt=0x7ffff690d241 "%s") at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:1220
#15 0x00007ffff6230538 in VMError::report_and_die (thread=0x7ffff01b68e0, sig=11, pc=0x7ffff53a0b44 <Node::in(unsigned int) const+140> "H\213", siginfo=0x7fffbd401470, context=0x7fffbd401340)
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:1226
#16 0x00007ffff5fe30e2 in JVM_handle_linux_signal (sig=11, info=0x7fffbd401470, ucVoid=0x7fffbd401340, abort_if_unrecognized=1)
    at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp:612
#17 0x00007ffff5fdcca1 in signalHandler (sig=11, info=0x7fffbd401470, uc=0x7fffbd401340) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/os/linux/os_linux.cpp:4395
#18 <signal handler called>
.... compiler crash here.
Comments
I forced a crash while in _thread_in_native, and the hs_err file was generated without issues. I think we can close this as no longer an issue.
10-10-2018

We are still running code that assumes the JavaThread is in VM state (_thread_in_vm), so I would guess there are more issues surrounding this. But I have not seen this specific error since JDK-8191798.
10-10-2018
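One way to read the concern that reporting code still assumes the JavaThread is in VM state: each reporting step could check its own precondition and print a placeholder instead of tripping an assertion. The sketch below is a hypothetical model only; `ReportStep`, `run_step` and the placeholder text are invented for illustration and are not HotSpot API.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical model only; not HotSpot code.
enum JavaThreadState { _thread_in_native, _thread_in_vm, _thread_in_Java };

struct ReportStep {
  std::string name;
  bool needs_vm_state;                       // e.g. the step walks the ThreadsList
  std::function<void(std::ostream&)> print;  // the actual reporting work
};

// Runs a step only if the current thread state satisfies its precondition;
// otherwise records that the step was skipped instead of asserting.
void run_step(const ReportStep& step, JavaThreadState state, std::ostream& out) {
  if (step.needs_vm_state && state != _thread_in_vm) {
    out << step.name << ": skipped (thread not in _thread_in_vm)\n";
    return;
  }
  step.print(out);
}
```

The trade-off is that a compiler thread crashing in native would lose that part of the report, but the rest of the hs_err file would still be written.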

So with JDK-8191798 being fixed is this no longer an issue?
10-10-2018

Here's the key part of the stack trace:
#5 0x00007ffff58abe85 in report_fatal (file=0x7ffff68d2be0 "/home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp", line=966, detail_fmt=0x7ffff68d33ec "LEAF method calling lock?") at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/debug.cpp:229
#6 0x00007ffff61afa8d in Thread::check_for_valid_safepoint_state (this=0x7ffff01b68e0, potential_vm_operation=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp:966
#7 0x00007ffff61c0f45 in ThreadsSMRSupport::acquire_stable_list (self=0x7ffff01b68e0, is_ThreadsListSetter=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.cpp:558
#8 0x00007ffff61c0ba1 in ThreadsListHandle::ThreadsListHandle (this=0x7fffbd400c78, self=0x7ffff01b68e0) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.cpp:470
#9 0x00007ffff559d1c7 in JavaThreadIteratorWithHandle::JavaThreadIteratorWithHandle (this=0x7fffbd400c70) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.hpp:302
#10 0x00007ffff5fcdbed in os::print_location (st=0x7ffff71a1c40 <VMError::log>, x=-140735142018274, verbose=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/os.cpp:1113
#11 0x00007ffff5fe35f5 in os::print_register_info (st=0x7ffff71a1c40 <VMError::log>, context=0x7fffbd401340) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp:784
#12 0x00007ffff622f00d in VMError::report (st=0x7ffff71a1c40 <VMError::log>, _verbose=true) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/vmError.cpp:749

The compiler code is calling VMError::report(), which wants to report as much useful info about the error as possible. os::print_location() is part of that useful info, but it uses JavaThreadIteratorWithHandle, which in turn uses a ThreadsListHandle. ThreadsListHandle is very paranoid about where it is used, so we blow up in Thread::check_for_valid_safepoint_state() because the ThreadsListHandle was used in a place that's not safe (as George points out).

Things to think about:
1) VMError::report() can be called from some pretty crazy places, so we have to be careful about what it calls.
2) ThreadsListHandle is very much meant to provide safe access to a ThreadsList, but when we're crashing, that's a little bit difficult to enforce.
3) The particular failing function/assertion, Thread::check_for_valid_safepoint_state(), is only needed because NestedThreadsLists are currently implemented using the Threads_lock, which is why we see:
report_fatal (file=0x7ffff68d2be0 "/home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/thread.cpp", line=966, detail_fmt=0x7ffff68d33ec "LEAF method calling lock?") at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/utilities/debug.cpp:229

That particular limitation of Thread-SMR is being tracked via:
JDK-8191798 redo nested ThreadsListHandle to drop Threads_lock
https://bugs.openjdk.java.net/browse/JDK-8191798
17-04-2018
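The failure path in the analysis of 17-04-2018 can be illustrated with a heavily simplified model. It assumes, as that comment implies, that the state check guards every list acquisition because a nested acquisition may need Threads_lock; a compiler thread in _thread_in_native then hits the fatal error before any list is acquired. All types are stand-ins whose names merely echo HotSpot's, and fatal() is modeled with an exception so the model can be exercised.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified model; names echo HotSpot's but this is not HotSpot code.
enum JavaThreadState { _thread_in_native, _thread_in_vm, _thread_in_Java };

// fatal() aborts the VM; the model throws instead so it can be tested.
struct FatalError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

struct ModelThread {
  JavaThreadState state = _thread_in_vm;
  bool has_hazard_ptr = false;  // true once a ThreadsListHandle is active
  int nested_count = 0;         // nested acquisitions (would take Threads_lock)
};

// Models Thread::check_for_valid_safepoint_state(): grabbing a lock that can
// block for a safepoint is only allowed from _thread_in_vm.
void check_for_valid_safepoint_state(const ModelThread& t) {
  if (t.state != _thread_in_vm) {
    throw FatalError("LEAF method calling lock?");
  }
}

// Models acquire_stable_list() while nested handles still use Threads_lock:
// the nested path might lock, so the state is checked before either path.
void acquire_stable_list(ModelThread& t) {
  check_for_valid_safepoint_state(t);
  if (!t.has_hazard_ptr) {
    t.has_hazard_ptr = true;  // fast path: publish a hazard pointer, no lock
  } else {
    ++t.nested_count;         // nested path: Threads_lock would be taken here
  }
}
```

In this model, dropping the lock from the nested path (the direction of JDK-8191798) is what would make the up-front check unnecessary.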

The following is not safe:
#9 0x00007ffff559d1c7 in JavaThreadIteratorWithHandle::JavaThreadIteratorWithHandle (this=0x7fffbd400c70) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/threadSMR.hpp:302
#10 0x00007ffff5fcdbed in os::print_location (st=0x7ffff71a1c40 <VMError::log>, x=-140735142018274, verbose=false) at /home/neliasso/repos/zgc/zgc2/open/src/hotspot/share/runtime/os.cpp:1113
17-04-2018

The signal handler has been working quite well for many years now. Let's consider whether the code that gets called further down the track is reasonable in its expectations about thread state. Remember that we have crashed and are now trying to report as much useful information as possible: the more the handler tries to do, the more likely it is to hit a secondary crash and fail to report anything useful.
17-04-2018
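The concern about secondary crashes is usually handled by splitting the report into independent steps, so a crash in one step does not lose the rest. HotSpot's VMError::report() is organized roughly in this spirit (its STEP macros record progress so that reporting can resume after a nested error); the sketch below is only a hypothetical model of the idea. `StepReporter`, `SecondaryCrash` and the skip message are invented, and a secondary crash is simulated with an exception rather than a re-entered signal handler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of resumable, step-wise error reporting (not HotSpot code).
// A secondary crash is simulated with an exception; a real VM would re-enter
// the reporter from its signal handler instead.
struct SecondaryCrash : std::runtime_error {
  using std::runtime_error::runtime_error;
};

using Step = std::function<void(std::string&)>;

class StepReporter {
 public:
  explicit StepReporter(std::vector<Step> steps) : steps_(std::move(steps)) {}

  // Attempts each remaining step once. The step index is advanced before
  // the step runs, so a re-entry after a crash resumes with the next step.
  void report(std::string& out) {
    while (next_step_ < steps_.size()) {
      std::size_t step = next_step_++;
      steps_[step](out);
    }
  }

  // Entry point: keeps re-entering report() until every step was attempted,
  // noting which step crashed instead of losing the rest of the report.
  void report_with_recovery(std::string& out) {
    for (;;) {
      try {
        report(out);
        return;
      } catch (const SecondaryCrash&) {
        out += "[step " + std::to_string(next_step_ - 1) + " crashed, skipped]\n";
      }
    }
  }

 private:
  std::vector<Step> steps_;
  std::size_t next_step_ = 0;
};
```

For example, with steps that print a header, crash in a stand-in for os::print_location(), and print registers, the output is the header, one skip note for step 1, and then the registers.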

The compiler JavaThread is in native when the error happens. Perhaps the signal handler should change the state to _thread_in_vm. The check that fires is:
{code}
if (is_Java_thread() && ((JavaThread*)this)->thread_state() != _thread_in_vm) {
  fatal("LEAF method calling lock?");
}
{code}
16-04-2018
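The suggestion that the signal handler switch the thread to _thread_in_vm can be sketched as a scoped guard that sets the state for the duration of error reporting and restores it afterwards. This is a hypothetical model: `ModelJavaThread` and `ErrorReportStateMark` are invented names. HotSpot's regular transition helpers (for example ThreadInVMfromNative) can block at a safepoint, which is risky on a crashing thread, so the model deliberately only rewrites the state field.

```cpp
#include <cassert>

// Hypothetical model, not HotSpot code.
enum JavaThreadState { _thread_in_native, _thread_in_vm, _thread_in_Java };

struct ModelJavaThread {
  JavaThreadState state = _thread_in_native;
};

// Models the check quoted above: locking is only valid from _thread_in_vm.
bool valid_safepoint_state(const ModelJavaThread& t) {
  return t.state == _thread_in_vm;
}

// Scoped guard: forces _thread_in_vm while reporting and restores the
// previous state on exit. It does not poll for safepoints.
class ErrorReportStateMark {
 public:
  explicit ErrorReportStateMark(ModelJavaThread& t) : t_(t), saved_(t.state) {
    t_.state = _thread_in_vm;
  }
  ~ErrorReportStateMark() { t_.state = saved_; }
  ErrorReportStateMark(const ErrorReportStateMark&) = delete;
  ErrorReportStateMark& operator=(const ErrorReportStateMark&) = delete;

 private:
  ModelJavaThread& t_;
  JavaThreadState saved_;
};
```

Note the limitation: forcing the state satisfies the assertion but does not by itself make taking Threads_lock from a crashing thread safe, which is the concern raised in the comment of 17-04-2018.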