Bug ID: JDK-8261491 NMT: Reduce memory footprint of reporting

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 17

Priority: P4
Status: Closed
Resolution: Incomplete

Submitted: 2021-02-10
Updated: 2023-10-03
Resolved: 2023-10-03

JDK 22
22Resolved

When encountering call sites which do allocation, NMT records the callstack together with other information (counters, memory flags) in a record (class `AllocationSite`). It then stores these records in a hash table (`MallocSiteTable`).

Then, when generating reports, one or two baselines are generated, basically snapshots of the state of these records. These baselines contain copies of the original records, call stack and all. The records may also need to be sorted, which is done by adding them to a temporary SortedLinkedList, again by value.

This makes generating reports somewhat expensive: not massively so, but enough to impose an artificial threshold on the number of records to baseline. That threshold introduces subtle reporting errors, currently under discussion in [1][2]. In that bug I argue for removing the threshold for simplicity and correctness reasons.

Since the reason that threshold exists is the memory footprint of baselining, let's reduce the cost of baselining.

The call stacks we keep in the records are semantically immortal and of course immutable. When tracing is active we keep those call stacks forever. Separating `class AllocationSite` into two classes, one containing the immutable, immortal parts and the other containing the mutable parts (counters), would reduce the size of the copied records considerably. Let's keep each immutable call stack only once.
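The split proposed above could look roughly like the following. This is a minimal sketch, not the HotSpot code: `SiteKey`, `SiteCounters`, `CombinedSite`, `record_malloc`, `snapshot` and the frame count of 4 are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; these are NOT the HotSpot definitions.
constexpr int kMaxFrames = 4;

// Immutable, immortal part: stored once per call site and never copied.
struct SiteKey {
  const void* frames[kMaxFrames];  // stands in for the call stack
  uint8_t flag;                    // stands in for the memory flag
};

// Mutable part: counters plus a pointer to the shared immutable part.
struct SiteCounters {
  const SiteKey* key;
  size_t count;
  size_t bytes;
};

// For comparison: a combined record that carries the stack by value,
// which is the shape the issue describes for today's records.
struct CombinedSite {
  SiteKey key;
  size_t count;
  size_t bytes;
};

// Account one allocation of `size` bytes against a site.
inline void record_malloc(SiteCounters& s, size_t size) {
  assert(s.key != nullptr);
  s.count++;
  s.bytes += size;
}

// A baseline entry copies only the counters and the pointer, not the stack.
inline SiteCounters snapshot(const SiteCounters& live) {
  return live;
}
```

With this layout a snapshot stays valid only as long as the `SiteKey` it points to is never freed, which is exactly the immortality assumption made in the text.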

Unfortunately, there is a technical reason that call stacks may be deleted: in the rare case that multiple threads try to add the same call stack simultaneously, only one wins. The loser then needs to delete its copy of the call stack again. So the immutable part would best be refcounted from the mutable part.
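The race described above can be illustrated with a lock-free insert into a single hash bucket. This is only a sketch under assumptions: `StackNode` and `lookup_or_add` are hypothetical, a 64-bit hash stands in for the call stack, and it shows the loser discarding its node rather than the refcounting scheme itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical node; `stack_hash` stands in for a full call stack.
struct StackNode {
  uint64_t stack_hash;
  std::atomic<StackNode*> next;
  explicit StackNode(uint64_t h) : stack_hash(h), next(nullptr) {}
};

// Returns the canonical node for `stack_hash` in the chain starting at `head`.
// Nodes are only ever appended, never unlinked.
inline StackNode* lookup_or_add(std::atomic<StackNode*>& head, uint64_t stack_hash) {
  StackNode* fresh = nullptr;            // our candidate, allocated lazily
  std::atomic<StackNode*>* link = &head;
  for (;;) {
    StackNode* cur = link->load(std::memory_order_acquire);
    if (cur == nullptr) {
      if (fresh == nullptr) fresh = new StackNode(stack_hash);
      if (link->compare_exchange_strong(cur, fresh,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
        return fresh;                    // we won: our node is now published
      }
      // We lost: `cur` now holds the node another thread published here.
    }
    assert(cur != nullptr);
    if (cur->stack_hash == stack_hash) {
      delete fresh;                      // loser throws its copy away (no-op if null)
      return cur;
    }
    link = &cur->next;                   // different stack: keep walking
  }
}
```

The `delete fresh` branch is the case the text refers to: a call stack object that was created but never became reachable from the table.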

[1] https://bugs.openjdk.java.net/browse/JDK-8261238
[2] https://github.com/openjdk/jdk/pull/2428

Reclosing as "Incomplete" instead of Resolving as ...
03-10-2023
[~stuefe], if we have no other alternative for this issue, can we close it?
19-09-2023
[~stuefe], what about the dynamic list of SimpleThreadStackSite (STSS), which also inherits from AllocationSite? Should we keep these mortal sites as well? Part 2 of a baseline makes MallocSite from this list. If we don't keep these sites, how could we reference them?

class NewMallocSite { // no inheritance from AllocationSite
  uint32_t _index_to_site;
  MemoryCounter _c;
  ...
}

In `bool ThreadStackTracker::walk_simple_thread_stack_site(MallocSiteWalker* walker)`, we have to make a NewMallocSite from the STSS so that it can be added to the baseline. What would the _index_to_site be then?
24-08-2023
[~azafari] I had a patch once, but apparently lost it. What I had in mind:

Separate AllocationSite from MallocSite. AllocationSite contains stack and MemFlags; these are constant and never go away (since you never throw away an allocation site, they are immortal).

Then, let MallocSite reference AllocationSite. Copying MallocSite now just copies the AllocationSite pointer. Since AllocationSite is immortal, the pointer will always be valid regardless of how often we copy it around.

Care must be taken when allocating AllocationSite:
- If one uses os::malloc, we must avoid circularities. NMT does this by using a special stack. See how MST entries are allocated (MallocSiteTable::new_entry()).
- Alternatively, one could just use raw malloc but lose the ability to track AllocationSite as part of mtNMT.
- Another alternative would be to keep AllocationSite in an array, either preallocated (which implies a maximum) or as ReservedSpace that we only commit on demand. The latter would save memory, but we'd still have a maximum number of sites. But now, since all AllocationSite objects live in an array, we could address them via index, as 32-bit or even 16-bit, and thus save another 4 to 6 bytes per MallocSite.
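The array alternative from the last bullet could be sketched as follows. Everything here is hypothetical (`Site`, `SiteArray`, `PtrSite`, `IdxSite`, the 32-bit counter); the sketch uses a preallocated array with a fixed maximum, is not thread-safe, and does not model ReservedSpace or on-demand commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical immutable site; not the HotSpot AllocationSite.
struct Site {
  const void* frames[4];
  uint8_t flag;
};

// Preallocated, append-only array of sites with a fixed maximum.
template <size_t Capacity>
class SiteArray {
  Site _sites[Capacity];
  uint32_t _used = 0;
 public:
  static constexpr uint32_t kInvalid = UINT32_MAX;

  // Stores a copy of `s` and returns its index, or kInvalid when full.
  uint32_t add(const Site& s) {
    if (_used == Capacity) return kInvalid;
    _sites[_used] = s;
    return _used++;
  }

  const Site& at(uint32_t idx) const {
    assert(idx < _used);
    return _sites[idx];
  }
};

// Mutable record that references its site by pointer...
struct PtrSite {
  const Site* site;
  uint32_t count;
  size_t bytes;
};

// ...and the same record referencing it by 32-bit index. Whether this is
// really smaller depends on how the remaining fields pack and align.
struct IdxSite {
  uint32_t site_index;
  uint32_t count;
  size_t bytes;
};
```

Because entries are never removed or moved, an index handed out by `add` stays valid for the lifetime of the array, just like the immortal pointer in the first variant.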
24-08-2023
MallocSite records are saved in two places: 1) the static MallocSiteTable, 2) the dynamic list in the ThreadStackTracker. At baseline, MallocSite items from both lists are copied to a SortedLinkedList which will be sorted based on a combination of `size`, `call-stack` and `NMT flag` of the allocations.

static MallocSiteTable, always grows
   Entry1 --> Entry2 --> Entry3
   stack
   counter
   \------------------------------------------/
                 baseline, part 1

dynamic Thread Stack Tracker, both grows and shrinks
   Entry1 --> Entry2 --> Entry3
   stack
   addr+size
   \----------------------------------/
             baseline, part 2

Since the entries in the dynamic list can be removed, we cannot reference them in the baseline and we have to save a copy of them. Also, because of the sorting, we need to keep the connection between every call stack and its counter information.

[~stuefe], what do you think?
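The two-part baseline described in this comment can be sketched as a copy-and-sort over both sources. This is an illustration under assumptions: `SiteRecord` and `take_baseline` are hypothetical, an integer id stands in for the call stack, standard containers replace SortedLinkedList, and the sort key is size only rather than the full combination of size, call stack and flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical record; `stack_id` stands in for a call stack.
struct SiteRecord {
  int stack_id;
  size_t size;
};

// Part 1 comes from a table that only grows, part 2 from a list that can
// shrink later, so both are copied by value into the snapshot.
inline std::vector<SiteRecord> take_baseline(const std::vector<SiteRecord>& table,
                                             const std::list<SiteRecord>& thread_stacks) {
  std::vector<SiteRecord> snap;
  snap.reserve(table.size() + thread_stacks.size());
  snap.insert(snap.end(), table.begin(), table.end());                  // part 1
  snap.insert(snap.end(), thread_stacks.begin(), thread_stacks.end());  // part 2
  assert(snap.size() == table.size() + thread_stacks.size());
  // Largest sites first; stack and size travel together in each record.
  std::stable_sort(snap.begin(), snap.end(),
                   [](const SiteRecord& a, const SiteRecord& b) { return a.size > b.size; });
  return snap;
}
```

Copying keeps the snapshot usable even after entries disappear from the dynamic list, which is the constraint raised here against referencing those entries.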
24-08-2023

Duplicate :	JDK-8227072 - NMT detailed report should work in native OOM situations.
Relates :	JDK-8261238 - NMT should not limit baselining by size threshold