Bug ID: JDK-8026324 hs_err improvement: Add summary section to hs

JDK-8026324 : hs_err improvement: Add summary section to hs_err file

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 7-pool,8

Priority: P4
Status: Resolved
Resolution: Fixed

Submitted: 2013-10-11
Updated: 2018-03-05
Resolved: 2015-07-22

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 9
9 b77Fixed

Related Reports

Relates :	JDK-8026333 - hs_err improvement: Print GC Strategy
Relates :	JDK-8194642 - Improve OOM error reporting for JDK8
Relates :	JDK-8085865 - hs_err improvement: Printing /proc/cpuinfo makes too long hs_err files
Relates :	JDK-8026336 - hs_err improvement: Print compilation mode, server, client or tiered

Description

hs_err files are currently divided into the sections "THREAD", "PROCESS" and "SYSTEM". The problem with this is that a lot of important information is hidden far down in the file. The VM Arguments are below a long list of Dynamic libraries, the OS version is printed in the last section, etc. Not only is it hard to get a quick overview of an hs_err file, but in many cases we won't even get the full hs_err file from customers or the webbugs system. It may have been truncated, or someone just copies parts of it.

Therefore, we should add a SUMMARY section at the top of the hs_err file. The SUMMARY section should mostly include one-liners, and should include everything needed to get a quick overview of the crash.

Suggested SUMMARY:

---------------  S U M M A R Y  ---------------

time: Fri Oct 11 14:28:57 2013
elapsed time: 43 seconds

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=0 (SEGV0), si_addr=0x000002a50000624f

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.0-b48) for linux-amd64 JRE (1.8.0-ea-b106), built on Sep  4 2013 19:11:21 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

OS:Red Hat Enterprise Linux Server release 5.3 (Tikanga)

uname:Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05 EST 2009 x86_64
libc:glibc 2.5 NPTL 2.5 
rlimit: STACK 10240k, CORE 0k, NPROC 38911, NOFILE 1024, AS infinity
load average:0.02 0.04 0.04

CPU:total 4 (2 cores per cpu, 1 threads per core) family 6 model 15 stepping 6, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, tsc
Memory: 4k page, physical 4047940k(639792k free), swap 4096564k(4096448k free)

VM Arguments: -Xmx32m
java_command: MyClass 1234
java_class_path (initial): .
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=/localhome/java/jdk8
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
SHELL=/bin/bash
DISPLAY=localhost:10.0

VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None

Heap:
 PSYoungGen      total 18944K, used 983K [0x00000000eb600000, 0x00000000ecb00000, 0x0000000100000000)
  eden space 16384K, 6% used [0x00000000eb600000,0x00000000eb6f5cc8,0x00000000ec600000)
  from space 2560K, 0% used [0x00000000ec880000,0x00000000ec880000,0x00000000ecb00000)
  to   space 2560K, 0% used [0x00000000ec600000,0x00000000ec600000,0x00000000ec880000)
 ParOldGen       total 41984K, used 0K [0x00000000c2200000, 0x00000000c4b00000, 0x00000000eb600000)
  object space 41984K, 0% used [0x00000000c2200000,0x00000000c2200000,0x00000000c4b00000)
 Metaspace total 4486K, used 2755K, reserved 132096K
  data space     4100K, used 2466K, reserved 1024K
  class space    386K, used 289K, reserved 131072K

---------------

This summary will give a lot of the information needed to quickly categorize a crash. Did it happen during start-up or after several days? Is it a supported platform? Did we run with any strange VM arguments? Is the heap close to full?

Note, that there are many other suggestions for information to be added to the hs_err file. Some of these should definitely go to the summary, like if we run on a hypervisor or not.

EDITED: There were some discussions about if we should duplicate the information in the summary. For example, the OS name should clearly be in the summary but it also makes sense to have it in the more detailed SYSTEM section. In that case, this information should be printed twice, but in many cases, just having the information in the SUMMARY section is enough. (Original suggestion was that everything should be both in the SUMMARY and in the detailed sections below).

Comments

Updated proposal: --------------- S U M M A R Y ------------ Command Line: -XX:+UseSharedSpaces Kaboom Host: mymachinename, Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz, 4 cores, 7G, Ubuntu 14.04.1 LTS Time: Tue Jul 14 15:13:18 2015 EDT elapsed time: 0 seconds (0d 0h 0m 0s) --------------- T H R E A D --------------- Leaving a blank line between command line and Host because sometimes the command lines are long and you'd lose the host information.
14-07-2015
So I've been reading lots of hs_err files and the part that's printed to standard out and is at the top of the hs_err_pid file is a summary: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fd59cbba696, pid=1426, tid=0x00007fd59d731700 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-debug-cphillim_2015_06_12_06_53-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-debug-cphillim_2015_06_12_06_53-b00-coleen mixed mode linux-amd64 compressed oops) # Problematic frame: # C [libc.so.6+0x8c696] memset+0xd6 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %P" (or dumping to /home/cphillim/work/core.1426) # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # It has the error printed, the jdk and Hotspot versions. It even says whether sharing is on and compressed oops. I don't think this top set of lines should be changed. Different people need different information from the hs_err file so putting everything in the summary would make it, well, not a summary anymore. The main requests have been for this information: --------------- S U M M A R Y ------------- Ergonomic flags: -XX:InitialHeapSize=67108864 -XX:MaxHeapSize=1073741824 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UseParallelGC Compiler flags: -server -Xmixed -XX:+UseTieredCompilation VM Arguments: java_command: Kaboom java_class_path (initial): . Launcher Type: SUN_STANDARD CLASSPATH= OS:Linux philli 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 time: Fri Jun 12 13:08:12 2015 timezone: EDT elapsed time: 0 seconds (0d 0h 0m 0s) ---------- end which is a smaller set than the proposed set above. I've been trying to shorten some information in the hs_err_pid file so that all of the information is useful and easy to find. So I propose initially the new summary have: Ergo flags: cover GC and heap sizes. Compiler flags: it's somewhat obvious from the ## output above. We may want to print more compiler flag settings like TieredCompilation VM command line: maybe reformatted so you can cut/paste the line environment settings for only CLASSPATH and LD_LIBRARY_PATH OS information: I chose uname because it's one line. On windows, it's a nicer line with the OS and version. I could make the uname line nice. Time and elapsed time: Move this up from the end of the file
11-07-2015
I attached a couple of sample files.
07-07-2015
Modified proposal above. For Summary, instead of Options: put the Command Line, eg: Command Line: -XX:+UseSerialGC -XX:-UseCompressedOops Kaboom Additionally: 3. Move native and Java stack up in the hs_err_pid, closer to the beginning of the T H R E A D section. --------------- T H R E A D --------------- Current thread (0x00007f1e84010000): JavaThread "main" [_thread_in_vm, id=2044, stack(0x00007f1e8bc 31000,0x00007f1e8bd32000)] Stack: [0x00007f1e8bc31000,0x00007f1e8bd32000], sp=0x00007f1e8bd304c8, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x8c696] memset+0xd6 V [libjvm.so+0x78496e] Copy::fill_to_bytes(void, unsigned long, unsigned char)+0x3b V [libjvm.so+0x8526e2] Copy::fill_to_memory_atomic(void, unsigned long, unsigned char)+0x168 V [libjvm.so+0x107ed09] Unsafe_SetMemory+0x22d 4. Move "Register to Memory Mapping" above the printing Registers, Top of Stack and Instructions section 5. Move current compiler task to top of T H R E A D section, if a crash in the compiler.
07-07-2015
Here is a new proposal. 1. Add special ergonomic global setting flags into the version string printed out in the beginning of the hs_err_file preamble: # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-debug-jdk9.hs_err_0624_1709, mixed mode, tiered , compressed oops, parallel gc, linux-amd64) This adds "tiered" compilation if -XX:+UseTieredCompilation, and GC mode. There are already additional strings added if UseSharedSpaces "sharing" and "compressed oops" and the platform string. When you do java -version you get the version string, compilation mode (eg "mixed mode"), "sharing" added if sharing, but you only get "compressed oops" and the platform string in the hs_err file. There was vigorous debate not to add "compressed oops" to the version string so I assume there would be the same for the additional information we add to the hs_err version string. I still think it's the place to add it because we add other things and it's compact. 2. Add summary file with the information that we generally add to the bugs that a. Java options that ran b. <machine name in debug>, the cpu, # processors, amount of memory, operating system version c. date/time/elapsed time for the bug eg. --------------- S U M M A R Y ------------ Options: -XX:+UseParallelGC Host: my-machine-name, Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz, 4 cores, 7G, Ubuntu 14.04.1 LTS time: Thu Jul 2 15:21:01 2015 timezone: EDT elapsed time: 0 seconds (0d 0h 0m 0s)
02-07-2015
I want to work on this bug. I think the summary should include information about which GC is being used, which compiler mode (other bugs exist for this), the crashing location and the name of the OS. The other information is brief enough until you get to the DLL's loaded, the NTAMS for G1 and the printing of /proc/cpuinfo.
04-06-2015
Please send out a proposal before implementing.
01-06-2015
Yes, please! https://bugs.openjdk.java.net/browse/JDK-8080127#comment-13646497 Except I'd like to see the stack trace and assertion message.
26-05-2015
Comment from John Rose on the mailing list: Some of the configuration information you mention may be present at the top of the hotspot.log file, before the big <tty> element. It might be fruitful to ensure that such preamble information is always captured at startup, and dumped into the log file. I agree that any static information should be captured and saved during startup, it would take up very little space.
17-02-2014
I edited the description with: "There were some discussions about if we should duplicate the information in the summary. For example, the OS name should clearly be in the summary but it also makes sense to have it in the more detailed SYSTEM section. In that case, this information should be printed twice, but in many cases, just having the information in the SUMMARY section is enough.(Original suggestion was that everything should be both in the SUMMARY and in the detailed sections below).."
17-02-2014
Regarding leaving lines at the top, we could do that, but I would like to see if this is actually an issue first. The data we add to the summary will be very safe, I see no reason why we should fail to print it. Also, if we do crash again further down in the hs_err file, I would prefer if we had already written the summary. I don't think this will be a big problem to implement, we can re-visit this if we run into issues.
18-10-2013
Regarding having a second summary file, I am a bit worried about that approach, it could lead to added confusion. Also, I believe a lot of the time we will get the output from the standard output log, so either we print the summary there and miss the details, or we print the details and still don��t get a summary for many incoming bugs. However, we don��t have to duplicate all the information. If we put JVM arguments in the summary, we don��t need to print it further down. That way, we don��t get as much duplication. Some things will probably need to be duplicated though. It is important to print the OS version, but that should really be in the SYSTEM section as well.
18-10-2013
We could reserve a bunch of space characters at the beginning of the file. Later, we can seek back to the beginning and write a summary. We can do this because we're writing to a file, not to stdout. If the summary is sufficiently succinct, leaving 10 lines of 80 space characters at the top of the file should be enough, and it won't look too disastrous if we somehow fail to fill in the summary.
17-10-2013
I got a question on the reasoning why this summary is needed, thought I should add the main reasons below: 1) The most important reason is that there are a lot of other features coming in to catch common user errors, give better hints to problems etc (see bugs tagged with hs_err_improvements). One goal is to make it easier for users and customers to find the issue themselves. However, right now, a lot of important information is "hidden" in big masses of text that most users won't be interested in. A reasonably skilled customer/user could be expected to read a summary and the stack trace, so therefore we want a dedicated place for important information. We could add as much information as we like to the hs_err file, but if it is hidden in the middle, most "regular" users won't find it 2) When doing reviews of a lot of incoming Java Incidents, having the most critical information readily available at the top is very helpful. It is just a convenience, but when doing 50 bugs in a row, having a summary is great. Again, there are other features coming in to improve this work, it will be helpful to have it at the top 3) It is common to not get the full hs_err data. Either some tool truncates it, or the filer just pastes parts of it in a bug update. By having a summary at the top, this will give more information.
17-10-2013
I would like to suggest to use two log files instead of just only one. I would like to suggest to separate the actual error log (technically detailed) from a user friendly, human readable text file. The 2nd file could store the summary. That approach will help to avoid having redundancies in one file, it will solve the conflict of interests to have both a summary and a robust error reporting, and it will give us the possibility to write additional suggestions to the 2nd file that we would like to tell the user about a crash.
16-10-2013
A problem is that hs_err files sometimes gets truncated when they are filed, or the filer just pastes the top part. I agree that the the crash and the stack trace is probably what dev is most interested in, but for support and sustaining thinks like VM args and OS version is just as important. I don't know if anything in the suggestion would be risky to move. Time, siginfo, vm_info, OS info, VM and environmemnt arguments, they all sound rather safe. If anything, it would be the heap information that could be risky. But of course, when we implement this, we should look at all calls we make and make sure that none of them are "risky". I would think that most of them are simply printing data that is already gathered. As you say though, we can't do all data gathering first. Some of the data we gather is definitely risky and may stop the rest of the hs_err file from being generated. I see that as an argument for a summary at the top though. As long as we don't do anything risky there, it will make sure that the most basic information support and sustaining needs is available, even if we crash later.
14-10-2013
The description here sounds like it needs the information for the hs_err file to all be collected then the summary constructed before the details. But error reporting can't work like that - we never know how far we will get before the error reporting itself might encounter a fatal problem. Some of the information, such as VM args, can be easily moved higher up. Personally as a developer what I see in the top sections of the hs_err log is what I am interested in - the crash and the stack trace etc.
14-10-2013
We could add the summary section to the bottom of the hs err log. We could also add a short hint to the top: "For a summary of this crash report please have a look at the bottom of this file" or something like that.
14-10-2013