JDK-8211846 : A new switch to control verbosity of hs-err files
  • Type: CSR
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 12
  • Submitted: 2018-10-08
  • Updated: 2019-09-12
  • Resolved: 2018-11-27
Related Reports
CSR :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Summary
-------

A new switch controls hs-err file verbosity.

Problem
-------

SAP added a lot of functionality to the crash reporter, and plans on continuing this work. However, we met some resistance and keep having reoccurring discussions. The reason is that there are conflicting goals for the error reporter:

One side prefers to keep crash reports brief and succinct, possibly at the expense of completeness, since support usually works with cores or debuggable repro cases.

The other side (we) would like to have the crash reports as complete as possible, to better support cases where these files are our sole information source.

Part of this discussion is whether to include possibly security relevant information (as one example, see discussion linked above about a proposal to add current user id). Again there are two usage scenarios: in cases where crash reports are getting posted verbatim in forums and bug reports one should be discreet. Where error reports are handled through secure channels and all parties are covered by support contracts, this discretion is not needed.

(For completeness sake, note that crash reports may always contain security relevant information, since they include register content and memory portions dumped.)

See also prior discussion: http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-October/034505.html

Solution
--------

A switch could control inclusion of information into the hs-err file which is either considered too extensive, too security relevant, or both. 

Name proposal: "-XX:+-ExtensiveErrorReports", 

Which information in particular is controlled can then be decided on a case by case base when reviewing hs-err file additions.

The switch should be false by default, to keep error reports small in the stock product OpenJDK. It should be true in !product.

The switch should not affect jcmd VM.info.


Specification
-------------

    --- a/src/hotspot/share/runtime/globals.hpp     Mon Oct 08 13:25:39 2018 +0800
    +++ b/src/hotspot/share/runtime/globals.hpp     Mon Oct 08 12:03:39 2018 +0200
    @@ -2604,7 +2604,12 @@
               "Start flight recording with options"))                           \
                                                                                 \
       experimental(bool, UseFastUnorderedTimeStamps, false,                     \
    -          "Use platform unstable time where supported for timestamps only")
    +          "Use platform unstable time where supported for timestamps only") \
    +
    +  product(bool, ExtensiveErrorReports, PRODUCT_ONLY(false) NOT_PRODUCT(true), \
    +                 "Error reports include extensive information")               \
    +
    
     #define VM_FLAGS(develop,                                                   \
                      develop_pd,                                                \



Notes
-----

The error output controlled by this flag will be intermingled with the existing, unconditional output. However, setting this switch to true should not endanger existing error reporting steps (so, should not degrade the error reporting as a whole). 
Therefore, only full steps shall be protected by this switch - since every error reporting step runs under its own secondary error protection, that means that even if an extended step crashes, the subsequent steps - extended or not - will continue to work.

Code example:

      STEP("printing type of error")
      ....
      STEP("Cool extended feature producing lengthy output")
        if (ExtensiveErrorReports) {
    	... bla
        }

      STEP("printing exception/signal name")

 
But not like this:

      STEP("printing type of error")
      
          switch(static_cast<unsigned int>(_id)) {
             case OOM_MALLOC_ERROR:
             case OOM_MMAP_ERROR:
             if (ExtensiveErrorReports) {
    	           ... bla
             }
    
      STEP("printing exception/signal name")



Comments
Thanks @dholmes, I added a backport (https://bugs.openjdk.java.net/browse/JDK-8214979) and a CSR for that backport (https://bugs.openjdk.java.net/browse/JDK-8214980) for jdk11.
06-12-2018

[~stuefe] It seems you need to "Create Backport" explicitly for the main issue - JDK-8211845 - and then from the backport "Create CSR" and basically repeat in that CSR what you have in this one.
06-12-2018

[~stuefe] I'll make enquiries and try to get that clarified.
06-12-2018

Hi David, thanks for explaining. I think http://openjdk.java.net/projects/jdk-updates/approval.html could be fleshed out a bit. "If a change requires CSR approval, that approval must be obtained prior to making the push approval request." I took that as the CSR itself has to be marked with an downport request.
06-12-2018

Backport request should be on the main issue not the CSR issue.
05-12-2018

Hi, I see that the jdk11u-fix-request has been removed from the CSR without comment. Has the backport been rejected?
05-12-2018

Fix Request: The switch is needed in jdk11u to allow backporting JDK-8211326 "add OS user related information to hs_err file".
04-12-2018

Release note drafted: JDK-8214160 [~stuefe] can you please review the release note and add a comment indicating you have reviewed it. Thanks.
21-11-2018

Retroactively approving. Please draft the release note as previously discussed.
21-11-2018

This patch has been already pushed, accidentally, since I got confused by the CSR process. Sorry!
20-11-2018

[~stuefe] Thanks for the updates - looks good. Yes I can assist with release notes and other doc update.
06-11-2018

@dholmes, thanks for the review. I updated the text to address your concerns. I will now produce the patch and post it for RFR. I am not sure though about how to modify the release notes, but we can cross that bridge when the patch is in.
06-11-2018

The full specification line would be: + product(bool, ExtensiveErrorReports, PRODUCT_ONLY(false) NOT_PRODUCT(true), but otherwise this proposal seems okay and I've added myself as a reviewer. This will need to be documented in the release notes and as part of the "java" command reference. Additional tasks will be needed for that. I'm not sure exactly how we will document this. It's unclear to me (from the CSR) whether the extended info is produced inter-mingled with the core info, or after it? If inter-mingled and the extended info leads to secondary failures we may have to re-evaluate the "true in non-product" if it actually reduces our ability to get good error logs.
05-11-2018

[~stuefe], the concept of this change looks reasonable. Please have one or more HotSpot engineers review this CSR before it is finalized for the second phase of CSR review. David Holmes is the CSR member in the HotSpot area. Moving to Provisional.
05-11-2018