JDK-8202772 : NMT thread stack tracking causes crashes on AIX
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: aix
  • Submitted: 2018-05-08
  • Updated: 2021-01-22
  • Resolved: 2018-06-13
JDK 11: 11 b18 (Fixed)
Related Reports
Relates :  
Relates :  
Relates :  
Description
On AIX, we see:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/priv/d031900/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.cpp:516), pid=24641784, tid=5141
#  assert(committed_size > 0 && is_aligned(committed_size, os::vm_page_size())) failed: Must be
#
# JRE version: OpenJDK Runtime Environment (11.0) (fastdebug build 11-internal+0-adhoc.d031900.source)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 11-internal+0-adhoc.d031900.source, mixed mode, tiered, compressed oops, g1 gc, aix-ppc64)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Stack: [0x0000000117010000,0x000000011721d888],  sp=0x000000011721bcc0,  free space=2095k
No context given, using current context.
------ current frame:
iar:  0x0900000144fb9f18 libjvm.so::AixNativeCallstack::print_callstack_for_context(outputStream*,const ucontext_t*,bool,char*,unsigned long)+0x918  (C++ saves_cr saves_lr stores_bc gpr_saved:16 fixedparms:5 )
lr:   0x0000000000000000 (unknown module)::(unknown function)+?
sp:   0x000000011721a290 (base - 0x35F8)
rtoc: 0x09001000a059c518
|---stackaddr----|   |----lrsave------|:   <function name>
0x000000011721a9c0 - 0x0900000141c0d980 libjvm.so::os::platform_print_native_stack(outputStream*,void*,char*,int)+0x20  (C++ saves_lr stores_bc fixedparms:4 )
0x000000011721aa30 - 0x090000014234a6c4 libjvm.so::VMError::report(outputStream*,bool)+0x1a44  (C++ saves_lr stores_bc gpr_saved:12 fixedparms:2 )
0x000000011721bad0 - 0x090000014234d304 libjvm.so::VMError::report_and_die(int,const char*,const char*,char*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0x1e4  (C++ saves_cr saves_lr stores_bc gpr_saved:16 fixedparms:8 )
0x000000011721bcd0 - 0x090000014234f168 libjvm.so::VMError::report_and_die(Thread*,void*,const char*,int,const char*,const char*,char*)+0x48  (C++ saves_lr stores_bc fixedparms:7 )
0x000000011721bd60 - 0x0900000141a0f4f0 libjvm.so::report_vm_error(const char*,int,const char*,const char*,...)+0xf0  (C++ saves_lr stores_bc gpr_saved:4 fixedparms:8 parmsonstk:1)
0x000000011721bdf0 - 0x0900000141e05400 libjvm.so::RegionIterator::next_committed(unsigned char*&,unsigned long&)+0x100  (C++ saves_lr stores_bc gpr_saved:7 fixedparms:3 )
0x000000011721bea0 - 0x0900000141e01504 libjvm.so::SnapshotThreadStackWalker::do_allocation_site(const ReservedMemoryRegion*)+0x104  (C++ saves_lr stores_bc gpr_saved:4 fixedparms:2 )
0x000000011721bfa0 - 0x0900000141dfb1d8 libjvm.so::VirtualMemorySummary::snapshot(VirtualMemorySnapshot*)+0x98  (C++ saves_lr stores_bc gpr_saved:3 fixedparms:1 )
0x000000011721c040 - 0x0900000141e0f1d8 libjvm.so::MemBaseline::baseline(bool)+0xa78  (C++ saves_lr stores_bc gpr_saved:11 fixedparms:2 )
0x000000011721c180 - 0x0900000144ea67a4 libjvm.so::NMTDCmd::execute(DCmdSource,Thread*)+0xbc4  (C++ saves_lr stores_bc gpr_saved:11 fixedparms:3 )
0x000000011721d120 - 0x0900000142372144 libjvm.so::DCmd::parse_and_execute(DCmdSource,outputStream*,const char*,char,Thread*)+0xae4  (C++ saves_cr saves_lr stores_bc gpr_saved:18 fixedparms:5 )

----

Reproduce with:

java -XX:NativeMemoryTracking=summary -XX:+PrintNMTStatistics

or

gtestLauncher -jdk:./images/jdk/ --gtest_filter=CommittedVirtualMemoryTracker.test_committed_virtualmemory_region_test_vm

Both cases assert on AIX.

--------------

The problem is that NMT assumes stack boundaries to be page aligned. This is the case on most OSes, but it does not necessarily have to be, and on AIX it is not. POSIX certainly does not require pthread stack boundaries to be page aligned.

On AIX, stack boundaries are not aligned to page size. For the stack end, this does not matter: when retrieving the stack dimensions from the OS, we just align the stack end up to the next page boundary, where we will then place the thread stack guard pages. That is fine; the fact that the real pthread stack is actually a bit larger does not really matter much.

However, for the stack base the matter is different. The reported stack base is also not page aligned, and we cannot simply act as if it were.
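
To illustrate the alignment arithmetic described above, here is a minimal sketch. It is not the HotSpot code: the helper names (align_down_to, align_up_to, inner_page_range) are hypothetical, and the page size is passed in explicitly because the page size actually in effect on AIX is an assumption here. The idea is to shrink a stack range to the largest page-aligned range that lies fully inside it, so that a size computed from it is always a multiple of the page size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not the HotSpot functions.
inline bool is_power_of_2(std::uintptr_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Round addr down to a multiple of page (page must be a power of two).
inline std::uintptr_t align_down_to(std::uintptr_t addr, std::uintptr_t page) {
  assert(is_power_of_2(page));
  return addr & ~(page - 1);
}

// Round addr up to a multiple of page (page must be a power of two).
inline std::uintptr_t align_up_to(std::uintptr_t addr, std::uintptr_t page) {
  assert(is_power_of_2(page));
  return (addr + page - 1) & ~(page - 1);
}

struct PageRange {
  std::uintptr_t low;   // inclusive, page aligned
  std::uintptr_t high;  // exclusive, page aligned
};

// Largest page-aligned range fully contained in [low, high).
// The low end is aligned up and the high end is aligned down, so the
// result never reaches outside the range the OS reported.
inline PageRange inner_page_range(std::uintptr_t low, std::uintptr_t high,
                                  std::uintptr_t page) {
  PageRange r{align_up_to(low, page), align_down_to(high, page)};
  if (r.high < r.low) {
    r.high = r.low;  // range smaller than one page: report it as empty
  }
  return r;
}
```

Applied to the stack reported in the crash log, [0x0000000117010000, 0x000000011721d888), with an assumed 4K page size, the low end is already aligned and the high end rounds down to 0x000000011721d000, giving a size of 0x20d000 bytes, which is a whole number of pages.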


Comments
On AIX, we have two problems here:
- For one, as explained above, NMT assumes stack boundaries to be page aligned.
- Two, the way mincore() is used to read residency of pages needs to be adapted, since on AIX os::vm_page_size() is not necessarily the page size used by mincore(), which is quite dangerous.

Since JDK-8204552 was added to deal with the first point, it makes sense to wait until that item is finished. Until then, I will disable thread stack recognition in NMT for AIX. We will revisit this topic once we have time and JDK-8204552 has been done.
12-06-2018
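
A small sketch of why a page size mismatch with mincore() can be dangerous, assuming the usual mincore() contract of one status byte per page. The helper name residency_vector_length is hypothetical, and the 4K and 64K page sizes are assumptions chosen for illustration, not values taken from this report. If the caller sizes the residency vector with one page size while mincore() counts pages with a smaller one, the vector is too short for what gets written into it.

```cpp
#include <cstddef>

// Hypothetical helper: number of status bytes needed to describe
// len bytes of memory when residency is reported per page_size bytes.
inline std::size_t residency_vector_length(std::size_t len,
                                           std::size_t page_size) {
  return (len + page_size - 1) / page_size;  // round up to whole pages
}
```

For a range of 0x20d000 bytes, a vector sized for assumed 64K pages has 33 entries, while the same range counted in assumed 4K pages needs 525 entries.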

This may also be the intermittent problem that is linked above.
08-05-2018