JDK-8261334 : NMT: tuning statistic shows incorrect hash distribution
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11,15,16,17
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-02-08
  • Updated: 2021-06-30
  • Resolved: 2021-02-08
  • JDK 11: 11.0.11 (Fixed)
  • JDK 16: 16.0.1 (Fixed)
  • JDK 17: 17 b09 (Fixed)
Description
Tuning statistics for the malloc site hash map in NMT use a MallocSiteWalker to walk the malloc sites.

There is a bug in the report code which causes the displayed hash distribution statistics to be wildly off:

```
Hash distribution:
  1    entry: 179
  2  entries:  79
  3  entries:  66
  4  entries:  72
  5  entries:  98
  6  entries:  75
  7  entries:  55
  8  entries:  43
  9  entries:  22
 10 entries:   16
 11 entries:    5
 12 entries:    6

```

This is the bucket chain length histogram. Note that the values sum to 716, which exceeds the table width of 511, the total number of buckets. Depending on the hash function used, the statistics may be off by considerably more.
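The plausibility check above can be reproduced with a few lines of C++. This is only an illustrative sketch: the counts are copied from the histogram, and the names `kReportedChains`, `kTableWidth`, and `reported_chain_count` are made up for this example and are not part of HotSpot.

```cpp
// Chain counts copied from the histogram above, for chain lengths 1..12.
constexpr int kReportedChains[] = {179, 79, 66, 72, 98, 75, 55, 43, 22, 16, 5, 6};

// Table width (number of buckets) as stated in this report.
constexpr int kTableWidth = 511;

// Total number of bucket chains the report claims to have seen.
constexpr int reported_chain_count() {
  int sum = 0;
  for (int count : kReportedChains) {
    sum += count;
  }
  return sum;
}

// A correct histogram can never list more chains than there are buckets.
constexpr bool histogram_is_plausible() {
  return reported_chain_count() <= kTableWidth;
}
```

Here `reported_chain_count()` evaluates to 716, so `histogram_is_plausible()` is false.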

The problem is caused by a bug in the walker code where the bucket index is calculated by manually mod'ing `NativeCallStack::hash()` with table size:

https://github.com/openjdk/jdk/blob/d0a8f2f737cdfc1ae742d47f2dd4f2bbc03f4398/src/hotspot/share/services/memTracker.cpp#L263

That is wrong because the hash is defined as a signed int: when the hash code is negative, the manual modulo yields a different index than the regular hashcode-to-index calculation done in the table itself:

https://github.com/openjdk/jdk/blob/d0a8f2f737cdfc1ae742d47f2dd4f2bbc03f4398/src/hotspot/share/services/mallocSiteTable.hpp#L243

As a result, the walker class concludes that it has entered a different bucket chain and closes off the previous one. In the end, we therefore appear to have more chains than the table size would allow for.
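The mismatch between the two index calculations can be sketched in isolation. The functions below are hypothetical stand-ins, not the HotSpot code: `walker_index` mimics a modulo on a signed hash, and `table_index` assumes the table treats the hash as unsigned before the modulo. `kTableSize` uses the 511 quoted in this report.

```cpp
// Hypothetical stand-in for the table size; 511 is the width quoted above.
constexpr int kTableSize = 511;

// Walker-style calculation as described above: modulo on a signed int.
// In C++ the result of % takes the sign of the dividend, so a negative
// hash produces a negative "index".
constexpr int walker_index(int hash) {
  return hash % kTableSize;
}

// Table-style calculation (assumption): convert the hash to unsigned
// before the modulo, so the result is always in [0, kTableSize).
constexpr int table_index(int hash) {
  return static_cast<int>(static_cast<unsigned int>(hash) % kTableSize);
}
```

For non-negative hashes both functions agree. For a negative hash such as -1, `walker_index` returns -1 while `table_index` returns a valid bucket index, so the walker and the table disagree about which bucket the entry lives in.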

This is an old bug, present almost since the start of NMT (JDK-8046598). Note that it makes the statistics look better than they actually are, since it reports a long chain as multiple short chains.
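To illustrate how one long chain turns into several short ones, here is a simplified model of the walker. It is a sketch under stated assumptions, not the actual StatisticsWalker: it only counts how many chains would be reported, all names are hypothetical, and the sample hashes assume a 32-bit int so that all four values fall into the same real bucket.

```cpp
// Hypothetical table size; 511 is the width quoted in this report.
constexpr int kBuckets = 511;

// Four hashes that (with 32-bit int) all map to bucket 31 when the hash
// is treated as unsigned, but alternate between positive and negative.
constexpr int kSameBucketHashes[] = {31, -1, 542, -512};

// Simplified walker: entries arrive chain by chain, and a new chain is
// counted whenever the computed bucket index differs from the previous one.
constexpr int count_reported_chains(const int* hashes, int n, bool use_unsigned) {
  int chains = 0;
  int current = 0;
  bool have_current = false;
  for (int i = 0; i < n; ++i) {
    const int idx = use_unsigned
        ? static_cast<int>(static_cast<unsigned int>(hashes[i]) % kBuckets)
        : hashes[i] % kBuckets;  // signed modulo, may be negative
    if (!have_current || idx != current) {
      ++chains;  // walker believes a new bucket chain has started
      current = idx;
      have_current = true;
    }
  }
  return chains;
}
```

With the signed modulo the computed indices alternate between 31 and -1, so `count_reported_chains(kSameBucketHashes, 4, false)` reports four chains of length one. With the unsigned calculation the same entries are counted as a single chain of length four.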


Comments
Fix Request (16u) Backporting this low-risk one line fix prevents this bug from occurring in JDK-16u. The original bug fix patch applied cleanly. After applying the patch to a JDK-16u repo, the fix was regression tested by running Mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and running tiers 3-5 on Linux x64.
22-02-2021

Fix Request (11u) This fixes the day 1 problem with hashcode calculation in NMT. Patch applies cleanly to 11u, passes tier{1,2,3}. This also matches 11.0.12-oracle.
15-02-2021

Changeset: 20d7713c
Author: Thomas Stuefe <stuefe@openjdk.org>
Date: 2021-02-08 18:46:18 +0000
URL: https://git.openjdk.java.net/jdk/commit/20d7713c
08-02-2021