JDK-8199067 : [REDO] NMT: Enhance thread stack tracking
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2018-03-05
  • Updated: 2021-01-22
  • Resolved: 2018-05-03
JDK 11
11 b13: Fixed
Sub Tasks
JDK-8199133 :  
Description
These three tests fail with the nmt stack tracing change:
FAILED: runtime/appcds/sharedStrings/FlagCombo.java
FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
FAILED: runtime/NMT/PrintNMTStatistics.java

#  Internal Error (/scratch/cphillim/hg/11metadata-purge/open/src/hotspot/share/services/virtualMemoryTracker.cpp:108), pid=4744, tid=4745
#  assert(contain_region(addr, size)) failed: Not contain this region

Command Line: -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics 

Host: xxxxxxxx, Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 72 cores, 503G, Oracle Linux Server release 7.0
Time: Fri Mar  2 15:58:00 2018 EST elapsed time: 0 seconds (0d 0h 0m 0s)

Current thread (0x00007f692c037000):  JavaThread "DestroyJavaVM" [_thread_in_vm, id=4745, stack(0x00007f69369f1000,0x00007f6936af2000)]

Stack: [0x00007f69369f1000,0x00007f6936af2000],  sp=0x00007f6936aefdf0,  free space=1019k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x180ae62]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x162
V  [libjvm.so+0x180bd5f]  VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xafdded]  report_vm_error(char const*, int, char const*, char const*, ...)+0xdd
V  [libjvm.so+0x17f072b]  ReservedMemoryRegion::add_committed_region(unsigned char*, unsigned long, NativeCallStack const&)+0x6b
V  [libjvm.so+0x17f1076]  VirtualMemoryTracker::walk_virtual_memory(VirtualMemoryWalker*) [clone .constprop.34]+0xf6
V  [libjvm.so+0x17f1170]  VirtualMemorySummary::snapshot(VirtualMemorySnapshot*)+0x20
V  [libjvm.so+0x13320e0]  MemBaseline::baseline(bool)+0x250
V  [libjvm.so+0x1341ad2]  MemTracker::report(bool, outputStream*)+0x1b2
V  [libjvm.so+0xea2d97]  print_statistics()+0x527
V  [libjvm.so+0xea4790]  before_exit(JavaThread*)+0x500
V  [libjvm.so+0x175d165]  Threads::destroy_vm()+0x135
V  [libjvm.so+0xfc7a02]  jni_DestroyJavaVM+0x182
C  [libjli.so+0x3bbb]  JavaMain+0x26b

Comments
I wrongfully blamed Thread-SMR. Some kinds of threads, e.g. the Watcher thread and ConcurrentGCThread, *do* exit without notifying NMT (without deleting the thread object), which causes a lot of grief for the final reporting.
06-03-2018

I think we should back out the change and redo it.
06-03-2018

I jumped to a conclusion too fast: I missed the "delete this" in ThreadsSMRSupport::smr_delete(). However, the holes in the stack are real, and they throw the binary search off. Maybe this is how Linux works(?); at the least, we cannot use binary search here.
==== 0x00002b36a5a54000 - 0x00002b36a5b54000
[0x00002b36a5b52000 - 0x00002b36a5b53000] mapped/committed
[0x00002b36a5b51000 - 0x00002b36a5b52000] mapped/committed
[0x00002b36a5b50000 - 0x00002b36a5b51000] not mapped/committed
[0x00002b36a5b4f000 - 0x00002b36a5b50000] mapped/committed
[0x00002b36a5b4e000 - 0x00002b36a5b4f000] not mapped/committed
[0x00002b36a5b4d000 - 0x00002b36a5b4e000] not mapped/committed
[0x00002b36a5b4c000 - 0x00002b36a5b4d000] not mapped/committed
[0x00002b36a5b4b000 - 0x00002b36a5b4c000] not mapped/committed
[0x00002b36a5b4a000 - 0x00002b36a5b4b000] not mapped/committed
[0x00002b36a5b49000 - 0x00002b36a5b4a000] not mapped/committed
06-03-2018

> Apparently, recent threadSMR support, made it possible that NMT
> has a thread stack record, but the real thread already exited.
This is the exact opposite of what Thread-SMR is supposed to do for you. We need to take a look at the use of ThreadListHandles by NMT to see what might be going on here.
06-03-2018

Apparently, recent threadSMR support made it possible that NMT has a thread stack record, but the real thread already exited. That means we may be tracking completely irrelevant memory, as the memory may have been reused by the kernel. Worse, the memory range can have holes in it, which completely defeats the binary search algorithm used in the Linux implementation, since that algorithm assumes the mapped/committed range is contiguous.
06-03-2018
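
To make the binary search concern concrete, here is a minimal sketch of why holes break a search that assumes a contiguous committed range. It is not the HotSpot code: the function names (reported_layout, committed_pages_binary_search, committed_pages_linear, committed_runs) and the vector-of-flags page model are hypothetical, chosen only for illustration. The page flags are copied from the ten pages listed in the dump in this thread, highest address first.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: pages[i] is true when the i-th listed page is
// mapped/committed. Index 0 is the highest-addressed page in the dump.
std::vector<bool> reported_layout() {
    // 0x...b52000, b51000, b50000 (hole), b4f000, then b4e000 down to b49000.
    return {true, true, false, true, false, false, false, false, false, false};
}

// Binary search for the first uncommitted page. This is only correct when
// the flags look like "true ... true, false ... false" (no holes).
std::size_t committed_pages_binary_search(const std::vector<bool>& pages) {
    std::size_t lo = 0;
    std::size_t hi = pages.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (pages[mid]) {
            lo = mid + 1;   // assume everything up to mid is committed
        } else {
            hi = mid;       // assume everything from mid on is uncommitted
        }
    }
    assert(lo <= pages.size());
    return lo;
}

// Linear scan: counts every committed page, holes or not.
std::size_t committed_pages_linear(const std::vector<bool>& pages) {
    std::size_t count = 0;
    for (bool committed : pages) {
        if (committed) {
            ++count;
        }
    }
    return count;
}

// Linear scan that reports each committed run as (start index, page count).
std::vector<std::pair<std::size_t, std::size_t>>
committed_runs(const std::vector<bool>& pages) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < pages.size()) {
        if (!pages[i]) {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < pages.size() && pages[i]) {
            ++i;
        }
        runs.emplace_back(start, i - start);
    }
    return runs;
}
```

On the layout from the dump, the binary search stops at the first hole and reports 2 committed pages, while the linear scan finds 3 committed pages in two separate runs. A per-page walk costs more, but it does not depend on the range being contiguous.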

My mistake. So this only fails on machines with lots of processors. If we don't have such machines executing the nightlies then ...
06-03-2018

The tests are not excluded from the nightly. They just don't fail.
06-03-2018

If the above tests are excluded from nightly testing, then it seems likely that a "regression test" exercising the same area would have to be excluded as well. Why are these tests excluded? Do they take too long or are they unreliable?
05-03-2018

If possible, please add a regression test for this failure. It was not found by the nightly testing.
05-03-2018