JDK-8241004 : NMT tests fail on unaligned thread size with debug build
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 15
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: other
  • CPU: x86
  • Submitted: 2020-03-13
  • Updated: 2022-02-15
  • Resolved: 2020-06-04
Fix versions:
  • JDK 11: 11.0.15 (Fixed)
  • JDK 15: 15 b26 (Fixed)
Description
Steps to reproduce:
- Build OpenJDK from the portola repository with the --enable-debug option
- Run NMT test:
> jtreg -v:all test/hotspot/jtreg/runtime/NMT/PrintNMTStatistics.java

stdout:
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/os_linux.cpp:3394
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (root/mount/repos/branch-portola-dev/portola-dev/src/hotspot/os/linux/os_linux.cpp:3394), pid=5429, tid=5430
#  assert(is_aligned(size, page_sz)) failed: Size must be page aligned
#
# JRE version: OpenJDK Runtime Environment (15.0) (fastdebug build 15-internal+0-adhoc..portola-dev)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 15-internal+0-adhoc..portola-dev, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x13e8622]  os::committed_in_range(unsigned char*, unsigned long, unsigned char*&, unsigned long&)+0x532
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /root/mount/repos/branch-portola-dev/portola-dev/JTwork/scratch/core.5429)
#
# An error report file with more information is saved as:
# /root/mount/repos/branch-portola-dev/portola-dev/JTwork/scratch/hs_err_pid5429.log


The issue can also be reproduced by running:
> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:NativeMemoryTracking=detail -version
Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/812 Date: 2022-02-08 23:51:51 +0000
10-02-2022

Fix request [11u] This patch helps the effort to add macOS/AArch64 support to 11u by fixing hotspot runtime/NMT test failures on this platform. Small change, clean backport. Testing: affected tests and tier1.
09-02-2022

URL: https://hg.openjdk.java.net/jdk/client/rev/50d10091c645 User: psadhukhan Date: 2020-06-09 11:38:10 +0000
09-06-2020

I'm going to propose this change: http://cr.openjdk.java.net/~bulasevich/8241004/webrev.00
07-05-2020

As noted, the specification is that the stack_size is the minimum allocated for the thread. If a system needs to add things to a thread's stack, e.g. guard pages or TLS storage areas, then the requested stack size should be expanded to accommodate the additional system requirements so that the user gets the amount of usable stack that they requested. The fact that glibc steals guard pages from the requested stack is a long-standing bug. One would expect that any such addition would in fact be page aligned, simply because it's reasonable for the implementation to always allocate a whole number of pages. But there is no requirement for that to be the case, and with large pages it would arguably be a bad thing to do. Bottom line is that the assertion is overly strict and needs to be relaxed. However, we need to know that NMT is not assuming aligned stacks when doing its own bookkeeping. We may find that NMT is not working as expected on musl.
01-04-2020
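
To illustrate the point about the assertion being overly strict, here is a minimal sketch in C of a strict alignment check next to a relaxed approach that rounds the size up to a page boundary first. The function names is_aligned_to and align_up_to are hypothetical stand-ins for illustration, not HotSpot's own helpers, and the 4 KiB page size used in the usage note is an assumption. This is not the change that was eventually pushed.

```c
#include <stddef.h>

/* Hypothetical helper: true when size is a multiple of alignment.
 * alignment must be a power of two. */
int is_aligned_to(size_t size, size_t alignment) {
    return (size & (alignment - 1)) == 0;
}

/* Hypothetical helper: round size up to the next multiple of alignment.
 * alignment must be a power of two. */
size_t align_up_to(size_t size, size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With an assumed page size of 0x1000, is_aligned_to(0x100AC0, 0x1000) is false for the musl stack size reported in this issue, which is why the strict assert fires, while align_up_to(0x100AC0, 0x1000) yields 0x101000, a size that a page-granular walk could accept.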

The root cause of the problem is that the musl library always adds libc.tls_size bytes to the value of the stack_size attribute and thus makes it unaligned. I would not agree with this behavior of musl, but it doesn't violate the spec (see below), so I suggest removing the asserts that check the alignment of the stack size.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_attr_setstacksize.html
The stacksize attribute shall define the *minimum* stack size (in bytes) allocated for the created threads stack.
01-04-2020

There's a lot of detail missing from the stack trace in terms of how we get to os::committed_in_range from VirtualMemorySummary::snapshot, but presumably we are in this code:
  void VirtualMemoryTracker::snapshot_thread_stacks() {
    SnapshotThreadStackWalker walker;
    walk_virtual_memory(&walker);
  }
It is not clear to me why we must be dealing with a page-aligned size in this context, or whether this is actually constrained to the main thread created by the launcher. It seems to me that any/all stack sizes the VM requests will be modified by pthread_create as outlined. Bottom line: I think this may be an NMT bug rather than a musl issue.
16-03-2020

The crash happens on the debug build because of the assertion in the os::committed_in_range() method:
  assert(is_aligned(size, page_sz), "Size must be page aligned");
https://hg.openjdk.java.net/portola/portola/file/7ff60204a181/src/hotspot/os/linux/os_linux.cpp#l3394
Some debugging on my Alpine Linux in Docker shows that running
> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:NativeMemoryTracking=detail -version
goes to the CallJavaMainInNewThread() method:
https://hg.openjdk.java.net/portola/portola/file/7ff60204a181/src/java.base/unix/native/libjli/java_md_solinux.c#l774
#0 CallJavaMainInNewThread (stack_size=1048576, args=0x7ffd13768b70) at /root/mount/repos/branch-portola-dev/portola-dev/src/java.base/unix/native/libjli/java_md_solinux.c:779
#1 0x00007f776e91d60d in ContinueInNewThread (ifn=ifn@entry=0x7ffd13768c90, threadStackSize=<optimized out>, argc=<optimized out>, argv=0x7f776e9c1d28, mode=mode@entry=0, what=what@entry=0x0, ret=0) at /root/mount/repos/branch-portola-dev/portola-dev/src/java.base/share/native/libjli/java.c:2361
#2 0x00007f776e920aed in JVMInit (ifn=ifn@entry=0x7ffd13768c90, threadStackSize=<optimized out>, argc=<optimized out>, argv=<optimized out>, mode=mode@entry=0, what=what@entry=0x0, ret=<optimized out>) at /root/mount/repos/branch-portola-dev/portola-dev/src/java.base/unix/native/libjli/java_md_solinux.c:826
#3 0x00007f776e91edfd in JLI_Launch (argc=<optimized out>, argv=<optimized out>, jargc=<optimized out>, jargv=<optimized out>, appclassc=0, appclassv=0x0, fullversion=0x55ec08b4f058 "15-internal+0-adhoc..portola-dev", dotversion=0x55ec08b4f04a "0.0", pname=0x55ec08b4f045 "java", lname=0x55ec08b4f03d "openjdk", javaargs=0 '\000', cpwildcard=1 '\001', javaw=0 '\000', ergo=0) at /root/mount/repos/branch-portola-dev/portola-dev/src/java.base/share/native/libjli/java.c:344
#4 0x000055ec08b4e213 in main ()
where it first calls:
  pthread_attr_setstacksize(&attr, stack_size);
with stack_size equal to 0x100000, and then calls:
  pthread_create(&tid, &attr, ThreadJavaMain, args)
pthread_create() in the musl libc library sets stack_size to 0x100AC0:
https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n302
It calculates the stack size as:
  new->stack_size = stack - stack_limit
                  = (tsd - libc.tls_size) - (map + guard)
                  = (map + size - __pthread_tsd_size - libc.tls_size) - (map + guard)
                  = size - __pthread_tsd_size - libc.tls_size - guard
                  = 0x101000 - 0x400 - 0x140 - 0
                  = 0x100AC0
13-03-2020
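
The stack size derivation in the 13-03-2020 comment can be checked with a small C sketch. It only reproduces the arithmetic with the values quoted in that comment. The struct and function names are hypothetical, chosen to mirror the terms of the formula, and this is not musl's actual code.

```c
#include <stddef.h>

/* Hypothetical container for the terms of the formula in the comment. */
struct stack_layout {
    size_t size;      /* total size of the mapping */
    size_t tsd_size;  /* __pthread_tsd_size */
    size_t tls_size;  /* libc.tls_size */
    size_t guard;     /* guard size */
};

/* stack_size = size - __pthread_tsd_size - libc.tls_size - guard */
size_t derived_stack_size(struct stack_layout l) {
    return l.size - l.tsd_size - l.tls_size - l.guard;
}

/* Bytes by which stack_size exceeds the previous multiple of page_sz.
 * A non-zero result means the size is not page aligned. */
size_t misalignment(size_t stack_size, size_t page_sz) {
    return stack_size % page_sz;
}
```

Calling derived_stack_size with {0x101000, 0x400, 0x140, 0} gives 0x100AC0, matching the comment. Assuming a 4 KiB page (0x1000, not stated in the report), misalignment returns 0xAC0, so the page-alignment assert cannot hold for this value.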