Bug ID: JDK-8225703 crash_handler code makes safepoint polling threads look like they crashed

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 13,14

Priority: P4
Status: Closed
Resolution: Won't Fix

Submitted: 2019-06-13
Updated: 2020-08-19
Resolved: 2020-07-14

While debugging a crash in which one thread hit an assert, we were sidetracked trying to work out why two other threads also appeared to have failed. In the core file, those two threads had called report_and_die from crash_handler while executing in a Java frame:

   0x7f2af82d7794:    mov    0x108(%r15),%r10
   0x7f2af82d779b:    movabs $0x4637f38c0,%r8
=> 0x7f2af82d77a5:    test   %eax,(%r10)
r15 is the thread 
r15            0x7f2a706ff800    139820251740160
r10 is the thing we can't read
r10            0x7f2b0d231008    139822880722952
(gdb) x 0x7f2b0d231008 
0x7f2b0d231008:    Cannot access memory at address 0x7f2b0d231008
(gdb) print *(Thread*)0x7f2a706ff800
$2 = {<ThreadShadow> = {<CHeapObj<(MemoryType)2>> = {<No data fields>}, 
    _vptr.ThreadShadow = 0x7f2b0c4eacf8 <vtable for JavaThread+16>, _pending_exception = 0x0, _exception_file = 0x0, 
...
  _threads_do_token = 66652, _rcu_counter = 0, _polling_page = 0x7f2b0d231008, 

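The unreadable address in r10 matches the thread's _polling_page field, so this is the polling page address and the faulting instruction is a safepoint poll.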

I don't know why the signal handler for this thread is no longer JVM_handle_linux_signal, which would have done the right thing for the safepoint polling address (and for the other signals that code handles).  It shouldn't be crash_handler anyway.

ILW = MLM = P4

18-06-2019

What David wrote. Just adding that this has always been that way, to my knowledge - in its roots this is ancient Sun stuff. I think the assumption is that the moment you hit VMError::report_and_die(), your VM will soon be dead, and the error reporting thread is the only thread left that matters. If another thread hits an error or an assert - or, as in your case, a polling page segfault misfiled as a real crash - it will end up in VMError::report_and_die() too, produce a "Thread abc also had an error" message and then sleep, awaiting process end. I do not see a way around this, sorry. Secondary error handling is important. To fix this, one would have to merge the signal handlers - give the primary one the ability to function as a crash handler too - but that would be really complicated, and a bit pointless, since we are about to get aborted after error reporting anyway.
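
To illustrate the "first thread reports, everyone else waits" behaviour described in this comment, here is a minimal sketch in C. It is not HotSpot's code: the names enter_error_reporting, first_error_tid and the reporter_role enum are hypothetical, and using a compare-and-swap on a thread id is an assumption about how such a gate could be built. A thread that faulted on the polling page after the handlers were swapped would land in the OTHER_THREAD case.

```c
#include <stdatomic.h>

/* Hypothetical roles a thread can end up with when it enters error reporting. */
enum reporter_role {
    FIRST_REPORTER,      /* this thread owns error reporting */
    REENTRANT_REPORTER,  /* the owner hit a secondary error while reporting */
    OTHER_THREAD         /* a different thread arrived late; it should just wait */
};

/* -1 means no thread has started error reporting yet (assumed sentinel). */
static _Atomic long first_error_tid = -1;

/* Decide the caller's role; only the first caller wins the compare-and-swap. */
enum reporter_role enter_error_reporting(long tid) {
    long expected = -1;
    if (atomic_compare_exchange_strong(&first_error_tid, &expected, tid)) {
        return FIRST_REPORTER;
    }
    /* On failure, 'expected' now holds the id of the thread that owns reporting. */
    return expected == tid ? REENTRANT_REPORTER : OTHER_THREAD;
}
```

In this sketch a late-arriving thread cannot tell the gate why it arrived: a genuine second crash and a misrouted safepoint poll both look the same, which is the confusion this bug describes.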

13-06-2019

When we process a crash we call VMError::reset_signal_handlers to install crash_handler as the signal handler, so that secondary signals during error reporting are handled correctly. But signal handlers are per-process, not per-thread, so signals raised in other threads that have nothing to do with secondary error failures will also be processed by crash_handler. This seems wrong, as you point out.

13-06-2019

Relates :	JDK-8191101 - Show register content in hs-err file on assert
Relates :	JDK-8227275 - Within native OOM error handling, assertions may hang the process