JDK-8337517 : Redacted Heap Dumps
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: svc
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2024-07-30
  • Updated: 2024-08-02
Description
Add a command-line option to jcmd and a runtime flag, -XX:+HeapDumpRedacted, that generate a heap dump with primitive values zeroed (redacted). This is useful when debugging and analyzing heap dumps of programs whose objects may contain confidential or personally identifiable information. Object size and connectivity information is sufficient in the vast majority of cases where heap dumps are used for debugging.
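
To make "primitive values zeroed" concrete, here is a minimal sketch in Java. It is not the HotSpot HeapDumper code and not the HPROF record layout: the `Kind` enum, the one-slot-per-field model, and both method names are hypothetical, chosen only to show that references (connectivity) and lengths (size) survive while primitive contents do not.

```java
import java.util.Arrays;

final class RedactionSketch {
    /** Hypothetical field kinds; a real heap dump uses its own type tags. */
    enum Kind { REFERENCE, PRIMITIVE }

    /**
     * Returns a copy of the field slots with every primitive slot zeroed.
     * Reference slots (object IDs) are kept, so the object graph is unchanged.
     */
    static long[] redactSlots(Kind[] kinds, long[] slots) {
        long[] out = slots.clone();
        for (int i = 0; i < out.length; i++) {
            if (kinds[i] == Kind.PRIMITIVE) {
                out[i] = 0L;
            }
        }
        return out;
    }

    /** Returns a zero-filled array of the same length, so the size is preserved. */
    static byte[] redactPrimitiveArray(byte[] contents) {
        return new byte[contents.length];
    }

    public static void main(String[] args) {
        Kind[] kinds = { Kind.REFERENCE, Kind.PRIMITIVE, Kind.PRIMITIVE };
        long[] slots = { 0x1000L, 42L, 7L }; // object ID, then two primitive values
        System.out.println(Arrays.toString(redactSlots(kinds, slots)));
        System.out.println(redactPrimitiveArray("secret".getBytes()).length);
    }
}
```

In this model a tool reading the redacted data can still compute retained sizes and follow references, which is the use case the description targets.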
Comments
For JDK-8219721 we reverted the change that added an extra arg because it wasn't needed. The extra cmd option was simply passed within the existing protocol arg, which was then parsed to get the separate options (IIRC). The general problem of changing the number of args, and why it wasn't working, was covered in this lengthy thread: https://mail.openjdk.org/pipermail/serviceability-dev/2019-February/027240.html and continued at https://mail.openjdk.org/pipermail/serviceability-dev/2019-March/027315.html The crux of the issue seemed to be: "There's more to be done in this area as there is obviously a misunderstanding about the "args" expected in the "packet" versus the 'args' for any particular command."
01-08-2024
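
The idea of passing an extra option inside an existing protocol argument, then parsing it out on the receiving side, can be sketched as follows. This is an illustration only, not the attach protocol or the JDK-8219721 code: the comma-separated `key=value` encoding and the option names `live`, `gz`, and `redact` are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class PackedArgSketch {
    /** Packs options into one string, e.g. "live,gz=1" (hypothetical encoding). */
    static String pack(Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> e : options.entrySet()) {
            // Flags with no value are written as a bare key.
            joined.add(e.getValue().isEmpty() ? e.getKey() : e.getKey() + "=" + e.getValue());
        }
        return joined.toString();
    }

    /** Splits the single argument back into separate options. */
    static Map<String, String> unpack(String arg) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String token : arg.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // tolerate an empty argument or stray commas
            }
            int eq = t.indexOf('=');
            if (eq < 0) {
                options.put(t, "");
            } else {
                options.put(t.substring(0, eq), t.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("live", "");
        options.put("gz", "1");
        options.put("redact", ""); // hypothetical new option
        String packed = pack(options);
        System.out.println(packed);
        System.out.println(unpack(packed));
    }
}
```

Because the number of arguments in the packet stays fixed, a scheme like this avoids the mismatch described in the mailing-list thread. Whether an older receiver ignores or rejects an unknown key would depend on how its parser is written.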

Yes, I removed the jmap feature for now. The redact option was originally the last argument, but I verified, based on your comment, that older versions of jcmd and jmap would freeze when run against a JVM expecting 4 args. It seems that for JDK-8219721, -histo didn't need more than 3 args, so the fix was to revert back to 3. However, -heap already has 3 args in use, so there is no room left for redact. Allowing more arguments while maintaining compatibility would probably be a large API change (see JDK-8219896).
01-08-2024

[~hlin] I noticed you dropped jmap tool support for this. Is that because of the need for an extra option, which creates a backwards compatibility issue? I think JDK-8219721 may have addressed this issue, but you should check with [~dholmes] to make sure. I had some recollection of previous changes to add another argument being partly undone, but based on the diff for JDK-8219721 it looks like David might have gotten it working. I think it just requires that the new argument come last, but I think it also means that a new version of the tool can attempt to pass the argument, and if run against an older JVM it will be ignored (until changes are backported and deployed).
01-08-2024

Oh, and this storage sanitization has another dimension. If the JVM crashes for some reason mid-dump, we would need to deal with the partial dump files. That does not sound complicated with heap sanitization tools that run right after the dump: after all, if the `.hprof` is corrupted, we can just claim failure and delete the original dump file regardless. But things like two-phase heap dumps (JDK-8306441), which write out the chunks of the dump into temporary files and then merge them, could have interesting implications for this story. E.g. if we crash mid-chunk-dump, there is no `.hprof` file, and we would need to look around for the chunks to delete. You'd think they also end with `.hprof`, but they do not :) Not writing sensitive stuff in those heap dump chunks side-steps a lot of this mess.
01-08-2024
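
For illustration, the post-crash cleanup described above might look like the sketch below. The comment does not say how the chunk files are named, so the rule used here (any regular file in the same directory whose name starts with the dump file's name) is purely an assumption; a real tool would need the actual naming scheme.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PartialDumpCleanupSketch {
    /**
     * Deletes the dump file, if present, plus any sibling file whose name
     * starts with the dump file's name (hypothetical chunk naming).
     * Returns the paths that were deleted.
     */
    static List<Path> deleteLeftovers(Path dumpFile) throws IOException {
        Path dir = dumpFile.toAbsolutePath().getParent();
        String baseName = dumpFile.getFileName().toString();

        // Collect candidates first, then delete, so the directory listing
        // is not modified while it is being iterated.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.startsWith(baseName) && Files.isRegularFile(entry)) {
                    candidates.add(entry);
                }
            }
        }
        for (Path candidate : candidates) {
            Files.delete(candidate);
        }
        return candidates;
    }
}
```

A cleanup step like this is only needed when sensitive bytes can reach the disk at all, which is the point the comment makes in favor of redacting inside the JVM.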

The alternative is to use separate heap dump sanitization tools. That alternative is not mutually exclusive with the suggested JVM improvement; rather, the JVM change is a useful addition to a defense-in-depth strategy. Heavily fortified environments have several layers of defense on the exfiltration path: dump sanitization -> dump checkers -> dump use audit. Having the JVM not dump the confidential data to begin with adds another protective layer to this scheme, which covers the case where any of the subsequent layers fail. Plus, it gives an edge in security posture: nothing sensitive hits the file storage, which makes post-dump storage sanitization questions less pressing.
01-08-2024

(copying things from the PR, so they are not buried there) Notes from the field, looking through real-world heap dumps: while most of the time the confidential data is in primitive arrays (key material, cipher buffers, string contents), primitive fields carry identifiable data as well, e.g. numeric account/transaction IDs. Even doubles/floats often contain data; think financial data or even (pants heavily) LLM weights. A good approach is to strip everything that is not needed to follow up on heap occupancy problems, as this is overwhelmingly the major use case. I think the approach of "strip everything, but the shape of the object graph and the shape of the objects" is a very reasonable thing to do. This is what zeroing out all primitive fields and primitive array contents achieves, IMO.
01-08-2024

> I read the closing of JDK-8078234 as it simply not being a high enough priority at the time.
Which implies it was not considered necessary and/or worthwhile. It sounds to me that what you are looking for is more of a summary heap dump - with the classes and instances and links between them, but no actual "data" as such from any object. (Edit: I'd missed the link to the PR when I wrote that.)
> The change doesn't seem particularly intrusive to me,
Right. I had assumed general-purpose oop printing/dumping logic was involved, but it seems specific to the HeapDumper, with no effects elsewhere.
01-08-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/20409 Date: 2024-07-31 19:10:41 +0000
31-07-2024

I read the closing of JDK-8078234 as it simply not being a high enough priority at the time. For our organization (and I would think other support organizations), this is a high priority. We don't want 3rd party customers sharing their unsecured heap dumps with us. We would prefer to make it as easy as possible for them to redact confidential information from the heap dumps. In many cases, we must explain to the users exactly how to create a heap dump in the first place. The change doesn't seem particularly intrusive to me, but we can discuss that in the forthcoming pull request.
31-07-2024

JDK-8078234 was closed many years ago as not being considered necessary or worthwhile. This seems likely to be quite an intrusive change, and it relies on the application deployer, or the tool user, knowing that the flag is needed. Applications that are concerned about plain-text confidential or personally identifiable information should perhaps take steps not to keep it in plain text within Java object fields. The ability to redact a heap dump seems like "false security" to me.
31-07-2024