Bug ID: JDK-4515367 fatal error handler enhancements


Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 1.3.1_07, 1.3.1_09, 1.4.0, 1.4.1, 1.4.1_01, 1.4.2, 1.4.2_03, 1.4.2_05

Priority: P3
Status: Closed
Resolution: Fixed
OS: generic, linux, linux_redhat_3.0, solaris_8, solaris_9, windows_xp
CPU: generic, x86, sparc, itanium

Submitted: 2001-10-16
Updated: 2012-10-13
Resolved: 2003-08-19

Versions (Unresolved/Resolved/Fixed)

Fixed: 1.3.1_11 (Other), 1.4.2_09 (Other)

Related Reports

Duplicate :	JDK-4638005 - For most cases errorid string is not specific enough or complete/accurate
Duplicate :	JDK-5003862 - RFE: print native stacktrace upon crash in hs_err_pid.log
Duplicate :	JDK-4959777 - Remove "Please refer to release documentation for possible reason and solutions"
Duplicate :	JDK-4873538 - signal handler deadlocked waiting on a malloc lock
Duplicate :	JDK-4886930 - Intermittent JRE hang
Duplicate :	JDK-6244106 - IA64 Hang occurs at malloc/free in RHEL AS 3
Duplicate :	JDK-4316406 - Better handling of low resource conditions and errors
Relates :	JDK-6536662 - win2000 platform only - Could not reserve enough space for object heap to initialize VM
Relates :	JDK-4647546 - Hotspot is using single threaded ("MT-unsafe") Solaris functions
Relates :	JDK-4963998 - VM log is not flushed when the VM aborts
Relates :	JDK-4906990 - Infinite loop in error handler if TLS is not yet initialized.
Relates :	JDK-6243251 - Document serviceability fixes in 1.4.2_09 release notes
Relates :	JDK-6194668 - Add java runtime flag SuppressFatalErrorMessage to skip all error handling logic on fatal error.

Description

There are several issues in the fatal error handler that need to be resolved:

1. report_error(): if more than one fatal error occurs at the same time,
   we can't simply return or exit when error_level > 2; that may cause
   crashes. Instead, the thread should wait there for a few seconds to allow
   normal error dumping and shutdown to finish.

2. Move most of the error dumping logic out of the signal handler. Only
   async-signal-safe functions may be called in signal handlers, so it's
   impossible to have stable yet complex dumping logic there.

3. Avoid memory allocation. Library functions such as localtime() may call
   malloc(). That can hang the VM if the crash happens while the thread is
   holding the malloc lock (see 4485056).

4. The ErrorID is not very useful for crashes. It always points to os.cpp:xxx
   in os::report_fatal_error(). Since we know the name of the function nearest
   to the crash site, a better ErrorID for crashes would probably be
   "function_name:offset".

5. Print the offset within the library. A standalone utility (e.g. addr2line
   on Linux) can use it to retrieve line numbers.

6. Dump the native stack trace. The Java stack trace is not very useful in
   some cases. Walking the x86 stack is a problem if the frame pointer is
   omitted, but a best-effort printout is better than nothing.

7. Limit the screen dump to one screen (25 lines) if possible. This is
   especially important for Windows 9x/Me users, who do not have a scrollable
   DOS console. Items such as the list of loaded DLLs can go into the error file.

8. Include the JDK version in the error message as well.

9. Allow a user-specified error report site instead of always using
   "http://java.sun.com/cgi-bin/bugreport.cgi".

10. For item 6 above, it would be good if the native function names were
    demangled.

11. It would be good to have an option to do a full Java thread dump upon
    crashes (as in EVM). Currently, with -XX:ShowMessageBoxOnError turned on,
    we can still hit Ctrl+\ or Ctrl+Break after a crash to get a full Java
    thread dump, but only if the crash happens in native state, not in Java
    or VM state. An option to get a full Java thread dump no matter where the
    crash happens would be useful.

12. In 1.3.1, if a crash happens in compiled Java code, only the Java
    method name is shown, not its class name, and the Java stack trace of
    the crashing thread is not reliably shown. It would be good to always
    get the exact Java method, since an easy workaround for such a crash
    may be to exclude that method from compilation.
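Item 1 describes what should happen when a second fatal error arrives while the first is still being reported. A minimal C++ sketch, assuming each thread has a nonzero numeric id (the names `on_fatal_error`, `wait_for_reporter` and `ErrorAction` are hypothetical, not HotSpot's): the first thread to claim an atomic slot writes the report, a recursive error on that same thread is recognized, and any other thread parks instead of returning or exiting.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <time.h>

// Possible outcomes when a thread enters the (hypothetical) fatal error handler.
enum class ErrorAction { Report, Recursive, Wait };

// 0 means "no thread is reporting yet"; otherwise holds the reporting thread's id.
// std::atomic<long> must be lock-free for this to be usable from a signal handler.
static std::atomic<long> g_reporting_thread{0};

ErrorAction on_fatal_error(long self_id) {
  assert(self_id != 0);  // 0 is reserved for "nobody"
  long expected = 0;
  if (g_reporting_thread.compare_exchange_strong(expected, self_id)) {
    return ErrorAction::Report;     // first error: this thread writes the report
  }
  if (expected == self_id) {
    return ErrorAction::Recursive;  // the reporting thread crashed again
  }
  return ErrorAction::Wait;         // another thread is already reporting
}

// Instead of returning or exiting, a second failing thread parks here so the
// reporting thread can finish dumping and shut the process down.
void wait_for_reporter(unsigned seconds) {
  struct timespec remaining = {static_cast<time_t>(seconds), 0};
  // nanosleep() is async-signal-safe; restart it if a signal interrupts it.
  while (nanosleep(&remaining, &remaining) == -1 && errno == EINTR) {
  }
}
```

An atomic compare-and-swap is used rather than a mutex because a lock could already be held by the thread that crashed, which would lead to the same kind of hang described in item 3.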
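For item 2, one common way to keep the signal handler itself trivial is the "self-pipe" pattern: the handler only calls `write()`, which POSIX lists as async-signal-safe, and an ordinary thread does the heavy work. This is a sketch under that assumption with hypothetical names (`install_handler`, `wait_for_signal`). For a synchronous fault such as SIGSEGV the faulting thread cannot simply return, so in practice it would block after handing off the signal.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

static int g_pipe[2] = {-1, -1};  // [0] = read end, [1] = write end

extern "C" void on_signal(int sig) {
  // Only async-signal-safe calls here: write() is on the POSIX list, printf() is not.
  unsigned char b = static_cast<unsigned char>(sig);
  ssize_t ignored = write(g_pipe[1], &b, 1);
  (void)ignored;
}

bool install_handler(int sig) {
  if (pipe(g_pipe) != 0) return false;
  struct sigaction sa = {};
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, nullptr) == 0;
}

// Runs on an ordinary thread, where stdio, malloc and locks are allowed again.
int wait_for_signal() {
  assert(g_pipe[0] != -1);  // install_handler() must have run first
  unsigned char b = 0;
  if (read(g_pipe[0], &b, 1) != 1) return -1;
  return b;
}
```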
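Item 3 rules out `malloc()`, and by extension stdio and `localtime()`, once a crash has happened. A sketch of the usual workaround, assuming output goes straight to a file descriptor: format numbers into a caller-supplied buffer and print the timestamp as raw epoch seconds from `time()`, which POSIX lists as async-signal-safe. `format_unsigned` and `write_timestamp` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <unistd.h>

// Formats v in the given base into buf without malloc or stdio.
// Returns the number of digits written, or 0 if buf is too small.
size_t format_unsigned(uint64_t v, unsigned base, char* buf, size_t len) {
  assert(base >= 2 && base <= 16);  // debug-only check of a programming error
  char tmp[64];                     // enough for 64 binary digits
  size_t n = 0;
  do {
    tmp[n++] = "0123456789abcdef"[v % base];
    v /= base;
  } while (v != 0);
  if (n + 1 > len) return 0;        // no room for the digits plus a NUL
  for (size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];  // reverse into place
  buf[n] = '\0';
  return n;
}

// Emits the current time as raw epoch seconds instead of calling localtime().
void write_timestamp(int fd) {
  char buf[32];
  size_t n = format_unsigned(static_cast<uint64_t>(time(nullptr)), 10, buf, sizeof buf);
  ssize_t ignored = write(fd, buf, n);
  (void)ignored;
}
```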
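For item 4, assuming the nearest symbol name and its start address are already known (on Linux and Solaris, `dladdr()` can supply them through `Dl_info::dli_sname` and `dli_saddr`), building a "function_name:offset" id needs no allocation. `make_crash_id` is a hypothetical name, and printing the offset in hex is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes "function_name:0x<offset>" into buf without allocating.
// Returns false if buf is too small.
bool make_crash_id(const char* func_name, uintptr_t func_start, uintptr_t pc,
                   char* buf, size_t len) {
  assert(pc >= func_start);  // pc must lie at or after the symbol's start
  size_t n = 0;
  for (const char* p = func_name; *p != '\0'; ++p) {
    if (n + 1 >= len) return false;
    buf[n++] = *p;
  }
  // Collect hex digits of the offset, least significant first.
  char hex[2 * sizeof(uintptr_t)];
  size_t h = 0;
  uintptr_t off = pc - func_start;
  do {
    hex[h++] = "0123456789abcdef"[off & 0xf];
    off >>= 4;
  } while (off != 0);
  if (n + 3 + h >= len) return false;  // room for ":0x", the digits and a NUL
  buf[n++] = ':';
  buf[n++] = '0';
  buf[n++] = 'x';
  while (h > 0) buf[n++] = hex[--h];  // emit most significant digit first
  buf[n] = '\0';
  return true;
}
```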
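Item 5 is similar, but the offset is relative to the library's load address. A sketch, assuming the library path and load base are known (for example from `dladdr()`'s `Dl_info::dli_fname` and `dli_fbase`); `format_lib_offset` is hypothetical and uses `snprintf()` only for brevity, where a malloc-free formatter like the one sketched for item 3 would be safer inside a handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Formats "libname+0x<offset>", keeping only the file name part of lib_path.
int format_lib_offset(const char* lib_path, uintptr_t base, uintptr_t pc,
                      char* buf, size_t len) {
  assert(pc >= base);  // pc must be inside the library's mapping
  const char* name = lib_path;
  for (const char* p = lib_path; *p != '\0'; ++p) {
    if (*p == '/') name = p + 1;  // strip directories
  }
  return snprintf(buf, len, "%s+0x%llx", name,
                  static_cast<unsigned long long>(pc - base));
}
```

The resulting offset can then be resolved offline, e.g. `addr2line -e libjvm.so 0x12345`, which typically works for shared libraries whose first segment is linked at address 0.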
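For item 6, when code is compiled with frame pointers, x86 frames form a linked list: the saved caller frame pointer sits at the frame pointer and the return address right above it. The sketch below walks such a chain best-effort, stopping at the first frame that lies outside the stack bounds, is misaligned, or does not move toward older frames. The `Frame` struct and `walk_frames` are hypothetical; a real handler would start from the frame pointer saved in the signal context and would have to obtain the thread's stack bounds itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Frame-pointer-linked frame: saved caller fp at [fp], return address above it.
struct Frame {
  const Frame* link;  // caller's frame pointer
  uintptr_t ret_pc;   // return address into the caller
};

// Best-effort walk over [stack_lo, stack_hi); never follows an implausible pointer.
size_t walk_frames(const Frame* fp, uintptr_t stack_lo, uintptr_t stack_hi,
                   uintptr_t* pcs, size_t max_frames) {
  assert(stack_lo < stack_hi);
  size_t n = 0;
  while (fp != nullptr && n < max_frames) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(fp);
    if (addr < stack_lo || addr + sizeof(Frame) > stack_hi) break;  // off the stack
    if (addr % alignof(Frame) != 0) break;                           // misaligned
    pcs[n++] = fp->ret_pc;
    // The stack grows down, so each caller's frame must be at a higher address;
    // this also stops a corrupted, cyclic chain from looping forever.
    if (fp->link != nullptr && reinterpret_cast<uintptr_t>(fp->link) <= addr) break;
    fp = fp->link;
  }
  return n;
}
```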
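Item 7 can be handled by a sink that always writes to the error file but stops echoing to the console after a fixed number of lines, using the last console line to point at the file. A sketch with hypothetical names (`ReportSink`, `report_line`), assuming both destinations are plain file descriptors:

```cpp
#include <cassert>
#include <cstring>
#include <unistd.h>

struct ReportSink {
  int screen_fd;
  int file_fd;
  int max_screen_lines;  // e.g. 25 for a non-scrolling 80x25 console
  int screen_lines = 0;
  bool truncated = false;
};

static void write_all(int fd, const char* s, size_t n) {
  while (n > 0) {
    ssize_t w = write(fd, s, n);
    if (w <= 0) return;  // give up quietly: we are already handling a crash
    s += w;
    n -= static_cast<size_t>(w);
  }
}

// Every line goes to the file; only the first (max_screen_lines - 1) lines
// reach the console, followed by a single pointer to the file.
void report_line(ReportSink& sink, const char* line) {
  assert(sink.max_screen_lines >= 1);
  size_t n = strlen(line);
  write_all(sink.file_fd, line, n);
  write_all(sink.file_fd, "\n", 1);
  if (sink.screen_lines < sink.max_screen_lines - 1) {
    write_all(sink.screen_fd, line, n);
    write_all(sink.screen_fd, "\n", 1);
    sink.screen_lines++;
  } else if (!sink.truncated) {
    static const char kMore[] = "# (see the error file for the full report)\n";
    write_all(sink.screen_fd, kMore, sizeof kMore - 1);
    sink.screen_lines++;
    sink.truncated = true;
  }
}
```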
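For item 10, the Itanium C++ ABI used by GCC provides `abi::__cxa_demangle` in `<cxxabi.h>`. It allocates its result with `malloc()`, so under item 3 this step belongs outside the signal handler, or after the rest of the report has been written. `demangle_into` is a hypothetical wrapper that copies the result into a fixed buffer and falls back to the raw name if demangling fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>

const char* demangle_into(const char* mangled, char* buf, size_t len) {
  assert(len > 0);
  int status = 0;
  // Passing nullptr asks __cxa_demangle to malloc() a buffer for the result.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  const char* src = (status == 0 && out != nullptr) ? out : mangled;
  size_t i = 0;
  for (; src[i] != '\0' && i + 1 < len; ++i) buf[i] = src[i];  // truncating copy
  buf[i] = '\0';
  free(out);  // free(nullptr) is a no-op
  return buf;
}
```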
    
======================================================================

13. In addition, for unhandled synchronous signals on Solaris we should try
    to report the faulting pc/npc (%eip for IA32), %sp and the faulting linear
    address. The faulting linear address can be found in the siginfo_t.
    The trapno and si_code might be useful as well.

14. In discussions with the webbug team, their top request was not to change
    the error ID between releases for the same fatal error. Currently we use
    the file name plus line number to encode the error ID, which means that
    whenever the source file changes, the error ID can change.

15. It would be useful to include information about the system configuration,
    such as the number of CPUs, available memory, environment variables, etc.
    According to the webbug team, users are much better at copying and pasting
    an error dump than at figuring out their configuration themselves.
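For item 13, the faulting address and si_code come from the `siginfo_t` passed to a handler installed with `SA_SIGINFO`, while pc and sp must be read from the platform-specific `ucontext_t`, so this sketch takes them as parameters. `describe_fault` is hypothetical, decodes only a few `si_code` values, omits the trap number because it is not portable, and uses `snprintf()` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <signal.h>

int describe_fault(const siginfo_t* info, uintptr_t pc, uintptr_t sp,
                   char* buf, size_t len) {
  assert(info != nullptr);
  const char* code = "(not decoded)";
  if (info->si_signo == SIGSEGV) {
    if (info->si_code == SEGV_MAPERR) code = "SEGV_MAPERR (address not mapped)";
    else if (info->si_code == SEGV_ACCERR) code = "SEGV_ACCERR (invalid permissions)";
  } else if (info->si_signo == SIGBUS && info->si_code == BUS_ADRALN) {
    code = "BUS_ADRALN (invalid address alignment)";
  }
  // si_addr holds the faulting address for SIGSEGV, SIGBUS, SIGILL and SIGFPE.
  return snprintf(buf, len, "signal %d, si_code=%d %s, pc=0x%llx, sp=0x%llx, si_addr=%p",
                  info->si_signo, info->si_code, code,
                  static_cast<unsigned long long>(pc),
                  static_cast<unsigned long long>(sp), info->si_addr);
}
```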
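Item 14 asks for an error ID that survives unrelated edits to the source file. One possible approach, assumed here rather than taken from this report, is to hash stable inputs such as the enclosing function's name and the text of the failed condition, leaving out `__FILE__` and `__LINE__`. The sketch uses 32-bit FNV-1a; `stable_error_id` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string, continuing from state h.
static uint32_t fnv1a(const char* s, uint32_t h = 2166136261u) {
  for (; *s != '\0'; ++s) {
    h ^= static_cast<unsigned char>(*s);
    h *= 16777619u;
  }
  return h;
}

// Derived from the function name and condition text only, so moving the
// check to another line (or editing other code in the file) keeps the same ID.
uint32_t stable_error_id(const char* function, const char* condition) {
  assert(function != nullptr && condition != nullptr);
  uint32_t h = fnv1a(function);
  h = fnv1a("|", h);  // separator so ("ab", "c") and ("a", "bc") differ
  return fnv1a(condition, h);
}
```

Renaming the function or rewording the condition would still change the ID, which is arguably the right behavior since it is then a different check.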
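Item 15 can be covered with standard calls: `sysconf()` with `_SC_NPROCESSORS_ONLN`, `_SC_PHYS_PAGES` and `_SC_PAGESIZE` is available on Linux and Solaris. The sketch prints those plus a short, assumed whitelist of environment variables; the struct, function names and whitelist are hypothetical. It uses stdio and `getenv()`, so it is meant for code that runs outside the signal handler.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

struct SystemInfo {
  long cpus_online;
  long long phys_mem_bytes;  // -1 if unknown
};

SystemInfo query_system_info() {
  SystemInfo si;
  si.cpus_online = sysconf(_SC_NPROCESSORS_ONLN);
  long pages = sysconf(_SC_PHYS_PAGES);
  long page_size = sysconf(_SC_PAGESIZE);
  si.phys_mem_bytes = (pages > 0 && page_size > 0)
                          ? static_cast<long long>(pages) * page_size
                          : -1;
  return si;
}

// Prints the configuration plus only the environment variables likely to matter.
void print_system_info(FILE* out) {
  assert(out != nullptr);
  static const char* const kVars[] = {"JAVA_HOME", "CLASSPATH", "PATH",
                                      "LD_LIBRARY_PATH", "SHELL"};
  SystemInfo si = query_system_info();
  fprintf(out, "CPUs online: %ld\n", si.cpus_online);
  fprintf(out, "Physical memory: %lld bytes\n", si.phys_mem_bytes);
  for (const char* name : kVars) {
    const char* value = getenv(name);
    if (value != nullptr) fprintf(out, "%s=%s\n", name, value);
  }
}
```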

Comments

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.3.1_11, generic, tiger
FIXED IN: 1.3.1_11, tiger
INTEGRATED IN: 1.3.1_11, tiger, tiger-b15, tiger-b16, tiger-b18

14-06-2004

EVALUATION
Name: jd12896 Date: 02/04/2002
This feature has been added for the Tiger release.
======================================================================
Three main areas to improve:
1. Make error handling more robust (safe to use from signal handlers,
   tolerant of a corrupted stack/heap, etc.).
2. Improve the error ID to identify a problem more accurately (ideally the
   error ID for the same guarantee failure should not change between
   releases; crashes also need a better ID).
3. Include more information in the error report (native stack trace,
   register values, number of CPUs, etc.).
###@###.### 2003-01-24
======================================================================
I've made quite a number of changes to make the error handler more robust.
The error handler now also saves a detailed error log in the hs_err file.
Please see the comments section for the list of changes in 1.5.
###@###.### 2003-09-26

26-09-2003
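The evaluation's first goal, a handler that tolerates a corrupted stack or heap, implies the reporter must survive crashing inside itself. A sketch of one way to do that, assuming a nested fault re-enters the same reporting function: each step is recorded as attempted before it runs, so re-entry resumes after the step that failed instead of retrying it forever. `report`, `ReportStep` and `g_next_step` are hypothetical names, and here a nested crash is only simulated; a real implementation would re-enter from the signal handler.

```cpp
#include <cassert>
#include <cstdio>

using ReportStep = void (*)(FILE* out);

static int g_next_step = 0;  // survives re-entry of the error handler

// Runs the remaining report steps. If a step crashes and the handler calls
// report() again, the crashed step is skipped because it was already counted.
void report(FILE* out, const ReportStep* steps, int count) {
  assert(steps != nullptr || count == 0);
  while (g_next_step < count) {
    int current = g_next_step++;  // mark the step as attempted first
    steps[current](out);
  }
}
```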