JDK-8206922 : Show backtrace of all threads, not just the one that crashed
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2018-07-09
  • Updated: 2019-05-28
  • Resolved: 2019-01-29
Description
With a concurrent/multithreaded architecture, some crashes (bugs) can only be analyzed with a full backtrace of all threads, to see their interactions.

Currently the hs_err file shows only a backtrace of the thread that crashed, which often forces the user to reproduce the bug locally with a debugger attached in order to get the full picture of all threads' backtraces.

It would be very helpful if the hs_err file showed backtraces of all threads.

For an example of an investigation of a bug that would benefit from this enhancement, see JDK-8206471.
Comments
Runtime Triage: This is not on our current list of priorities. We will consider this feature if we receive additional customer requirements.
29-01-2019

[~gziemski] I think what you're looking for is already supported by external tools. Windows has "just in time debugging". When a JVM crashes, you can get a VisualStudio dialog that asks if you want to debug or not. On Linux, you can run your JVM within gdb, using a script that looks like this:

    $ cat foo.gdb
    r
    if $_thread == 0
      echo :child is no longer running, quitting\n
      quit
    end
    $ gdb -q -x foo.gdb --args /home/iklam/jdk/bld/sandbox/images/jdk/bin/java -version
    Reading symbols from /home/iklam/jdk/bld/sandbox/images/jdk/bin/java...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    java version "13-internal" 2019-09-17
    Java(TM) SE Runtime Environment (build 13-internal+0-adhoc.iklam.open)
    Java HotSpot(TM) 64-Bit Server VM (build 13-internal+0-adhoc.iklam.open, mixed mode, sharing)
    [Inferior 1 (process 31607) exited normally]
    :child is no longer running, quitting

If your JVM crashes, you will break into gdb. On MacOS, you can probably do something with the "lldb" debugger with a similar script.
21-01-2019

This seems like something that should be done via an external tool and -XX:OnError=....
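As a rough sketch of that idea, assuming Linux with gdb installed and a system that permits attaching to the crashing process (some ptrace restrictions may prevent it), the JVM could be launched so that gdb dumps every thread's backtrace on a fatal error. The function names and the JAVA_BIN variable are hypothetical placeholders; %p is the process id substituted by the JVM.

```shell
#!/bin/sh
# Sketch only: onerror_cmd, run_with_backtraces and JAVA_BIN are
# hypothetical names introduced for illustration.

# Print the command the JVM should run on a fatal error.
# %p is replaced by the JVM with the pid of the crashing process.
onerror_cmd() {
  printf '%s' "gdb -batch -ex 'thread apply all bt' -p %p"
}

# Launch java with the OnError hook; all arguments are passed through.
run_with_backtraces() {
  "${JAVA_BIN:-java}" -XX:OnError="$(onerror_cmd)" "$@"
}
```

Usage would be along the lines of `run_with_backtraces -version`. The caveat raised elsewhere in this thread still applies: by the time gdb attaches, the other threads may have moved away from the code of interest.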
20-01-2019

How do you propose to examine the stack of concurrently executing threads without some form of handshake, all of which has to be executed in the context of a signal handler whilst the state of the current thread and its ability to participate in any kind of handshake is completely unknown? You mention "thread groups" - not sure what you are referring to in a general native thread context. I vaguely recall hearing about such a thing in a Linux context. Is this to be a Linux-only proposal? Even with thread groups, can you signal the group (which will be every thread in the process) from within the signal-handling context of a thread that is already part of the group? Or do you plan to iterate over the threads (effectively a handshake via signal)? BTW we can get cores from paying customers too. hs-err files are primarily for webbugs reports.
17-01-2019

Not sure why you would crash: if you think you have found, e.g., a return pointer, you obviously must validate it before accessing it. Yes, if you catch the signal, the non-receiving threads continue. So you would need to signal the thread group, making the window much larger for moving away from the troublesome code. I agree that for an in-house crash a core is the way to go, but for a customer crash where we only get the hs_err file, as much info as possible is preferable. We should fix the 'core-pack' generator or set up /proc/[pid]/coredump_filter to include all regions, since right now I must find all the system libs to keep gdb happy. Yes, there is a risk of the hs_err file becoming insanely large. An alternative is to include the JFR events in the thread-local buffer for each thread (and add missing relevant events). This would at least give a hint about which parts of the code the thread has been touching recently.
17-01-2019

You are more likely to cause a secondary crash than produce anything useful. And when would you do this within error reporting? The other threads are still executing up to the point where the process terminates. We would need a crash to "suspend" all threads if we wanted to get a useful snapshot of the system. That said, the hs-err files could become huge and unmanageable in their own right.
16-01-2019

We can do a best effort by starting at the stack base and walking until we have no clue what we are looking at. It would produce false frames and also miss frames, but as long as we know it is unreliable, it's better to have it than no information. For Java stacks we can always print those that have a last Java frame. With a terminating signal all threads stop pretty quickly. But we have a problem with assert/guarantee, which thinks it's a good idea to do error reporting _before_ raising the signal. That gives all the other threads plenty of time to go places. If I have multiple threads involved, I always change the assert to just abort so that I can debug properly. I think we should always do error reporting from the signal handler and do a best effort on stack traces for all threads.
16-01-2019

We could attach a debugger when launching a test, then automate getting the stack trace at crash time, replicating the steps that a user takes to generate a report. This would be done only when the user expects a crash, and would be OFF by default.
16-07-2018

A debugger suspends all threads so it can grab the stack traces. The hs-err mechanism doesn't, and can't do that, so it really makes no sense to try and modify the hs_err mechanism. Further it is still the case that in general when the crash happens the other threads involved may no longer be anywhere near that code. Even if you attach a debugger automatically after the crash happens it may already be too late as the threads can have moved on. Ditto for requesting a stack dump via SA.
10-07-2018

The way JDK-8206471 was found was by running the test locally with a debugger attached, so that at the time of a crash it was possible to see the backtrace of all threads by manually doing something like "thread apply all bt" (t a a bt). If it's not possible to dump all threads' backtraces via the mechanism producing the hs_err file, then we should consider adding a new mechanism for attaching a debugger automatically and dumping the stack traces of the threads via the debugger (would SA be of help here?).
10-07-2018

Further, seeing all the stacks at the time of the crash would in all likelihood not shed any light on race conditions. The thread you raced with may already have left the code in question, leaving no trace of it on the stack.
09-07-2018

This is not practical. You have no idea what state other threads are in and can't necessarily walk their stacks without crashing. Nor can you bring the system to a safepoint such that you can walk their stacks because you are already crashing! Producing the hs_err file already relies on good luck to produce as meaningful a report as possible. This is what core dumps are for.
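As a sketch of the core-dump route, assuming Linux with gdb installed: the shell's core size limit is raised before launching the JVM, and the resulting core file is later opened in gdb to print every thread's stack. The function names, JAVA_BIN, and the example paths are hypothetical placeholders; where the core file ends up depends on the system's core pattern setting.

```shell
#!/bin/sh
# Sketch only: run_with_core, core_bt_cmd and JAVA_BIN are hypothetical
# names introduced for illustration.

# Raise the core size limit for this shell, then launch java.
# ulimit can fail if the hard limit is lower; that case is not handled here.
run_with_core() {
  ulimit -c unlimited
  "${JAVA_BIN:-java}" -XX:+CreateCoredumpOnCrash "$@"
}

# Print the gdb command that dumps all thread backtraces from a core file.
# $1 = java launcher that produced the core, $2 = path to the core file.
# Paths containing spaces are not quoted in this sketch.
core_bt_cmd() {
  printf "gdb -batch -ex 'thread apply all bt' %s %s" "$1" "$2"
}
```

After a crash, running the command printed by `core_bt_cmd /path/to/java /path/to/core` would show the native backtraces of all threads as they were when the core was written.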
09-07-2018