JDK-8295357 : Kitchensink8H.java failed with SIGSEGV in MallocTracker::record_free
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 20
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2022-10-14
  • Updated: 2022-12-07
  • Resolved: 2022-12-07
JDK 20
20 Resolved
Related Reports
Duplicate :  
Relates :  
Description
The following test failed in the JDK20 CI:

applications/kitchensink/Kitchensink8H.java

Here's a snippet from the log file:

[2022-10-14T09:54:35.607376Z] Gathering output for process 34285
[2022-10-14T09:54:37.996252Z] Waiting for completion for process 34285
[2022-10-14T09:54:37.997303Z] Waiting for completion finished for process 34285
Output and diagnostic info for process 34285 was saved into 'pid-34285-output.log'
[stress.process.out] #
[stress.process.out] # A fatal error has been detected by the Java Runtime Environment:
[stress.process.out] #
[stress.process.out] #  SIGSEGV (0xb) at pc=0x00000001031abcb0, pid=34273, tid=12547
[stress.process.out] #
[stress.process.out] # JRE version: Java(TM) SE Runtime Environment (20.0+19) (fastdebug build 20-ea+19-1363)
[stress.process.out] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+19-1363, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
[stress.process.out] # Problematic frame:
[stress.process.out] # V  [libjvm.dylib+0xc8fcb0]  MallocTracker::record_free(void*)+0x18c
[stress.process.out] #
[stress.process.out] # Core dump will be written. Default location: core.34273
[stress.process.out] #
[stress.process.out] # JFR recording file will be written. Location: /System/Volumes/Data/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S80283/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/ffeefe78-25e7-4d47-aa22-3134a81690df/runs/28bd1877-bd61-42bc-a125-ae9ad34b0b6c/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink8H_java/scratch/0/hs_err_pid34273.jfr
[stress.process.out] #
[stress.process.out] Unsupported internal testing APIs have been used.
[stress.process.out] 
[stress.process.out] # An error report file with more information is saved as:
[stress.process.out] # /System/Volumes/Data/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S80283/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/ffeefe78-25e7-4d47-aa22-3134a81690df/runs/28bd1877-bd61-42bc-a125-ae9ad34b0b6c/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink8H_java/scratch/0/hs_err_pid34273.log
[stress.process.out] #
[stress.process.out] # If you would like to submit a bug report, please visit:
[stress.process.out] #   https://bugreport.java.com/bugreport/crash.jsp
[stress.process.out] #
[2022-10-14T09:57:59.031653Z] Gathering output for process 34302
[2022-10-14T09:57:59.732424Z] Waiting for completion for process 34302
[2022-10-14T09:57:59.732540Z] Waiting for completion finished for process 34302
Output and diagnostic info for process 34302 was saved into 'pid-34302-output.log'


Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x0000000134006340):  WorkerThread "GC Thread#0" [stack: 0x000000016f820000,0x000000016fa23000] [id=12547]

Stack: [0x000000016f820000,0x000000016fa23000],  sp=0x000000016fa22cb0,  free space=2059k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0xc8fcb0]  MallocTracker::record_free(void*)+0x18c
V  [libjvm.dylib+0xda3850]  os::free(void*)+0xf8
V  [libjvm.dylib+0x835488]  InstanceKlass::get_jmethod_id(methodHandle const&)+0x2e0
V  [libjvm.dylib+0xd004b4]  Method::jmethod_id()+0x74
V  [libjvm.dylib+0xd50574]  nmethod::unlink()+0x114
V  [libjvm.dylib+0xdc934c]  CodeCacheUnloadingTask::work(unsigned int)+0x70
V  [libjvm.dylib+0x71b4e4]  G1ParallelCleaningTask::work(unsigned int)+0x74
V  [libjvm.dylib+0x10e3808]  WorkerThread::run()+0x94
V  [libjvm.dylib+0xfe96b0]  Thread::call_run()+0x220
V  [libjvm.dylib+0xda88d0]  thread_native_entry(Thread*)+0x160
C  [libsystem_pthread.dylib+0x74ec]  _pthread_start+0x94


siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000979245543410
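
One quick way to inspect a suspicious fault address such as the si_addr above is to view its bytes as characters. The following is a minimal sketch, not part of HotSpot; the helper name `ascii_view` is made up for illustration. It prints the eight bytes of a 64-bit value, most significant first, and replaces anything outside printable ASCII with a dot.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a HotSpot function): renders the 8 bytes of a
// 64-bit value, most significant byte first. Bytes outside the printable
// ASCII range 0x20..0x7E are shown as '.'.
inline std::string ascii_view(uint64_t value) {
  std::string out;
  for (int shift = 56; shift >= 0; shift -= 8) {
    const unsigned char byte =
        static_cast<unsigned char>((value >> shift) & 0xFF);
    const bool printable = (byte >= 0x20 && byte <= 0x7E);
    out.push_back(printable ? static_cast<char>(byte) : '.');
  }
  return out;
}
```

For 0x0000979245543410 this yields "....ET4.": the bytes 0x45, 0x54 and 0x34 are the printable characters 'E', 'T' and '4', while the rest are not. On a little-endian machine the same bytes sit in memory in the reverse order.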


The crashing thread is WorkerThread "GC Thread#0", GC is doing
a CodeCacheUnloadingTask so we've called nmethod::unlink() and
the "interesting" crashing frames are:

V  [libjvm.dylib+0xc8fcb0]  MallocTracker::record_free(void*)+0x18c
V  [libjvm.dylib+0xda3850]  os::free(void*)+0xf8
V  [libjvm.dylib+0x835488]  InstanceKlass::get_jmethod_id(methodHandle const&)+0x2e0
V  [libjvm.dylib+0xd004b4]  Method::jmethod_id()+0x74

and these functions are often investigated by Runtime... So this bug
could start in GC, Compiler or Runtime for initial triage... choices...

I think I'll start this bug off in Runtime since the SIGSEGV is in
MallocTracker::record_free and maybe there's something that
should be hardened in that function to get more information when
passed bad info (rather than a SIGSEGV)...

Another interesting point is that I think we've only recently added
macosx-aarch64 machines to Tier8 so this might not be due to a
recent change...
Comments
From the stack trace, this looks obviously like a duplicate of JDK-8296955.
07-12-2022

I found some possibly related bugs in our realloc() wrapper. Would be nice to get eyes on the patch. https://github.com/openjdk/jdk/pull/10857
27-10-2022

There are two possibilities.
1) a memory overwriter. The NMT malloc header looks (on 64-bit) like this: `<64-bit size> <32-bit MST marker> <flags> ... <canary>`
In the second crash, we know we just survived "MallocHeader::check_block_integrity". That means that the 16-bit canary is valid. It also means that the pointer points probably to a valid allocation (the chance that the 16-bit canary would be valid otherwise is small). But parts of the header before could have been overwritten. We assert when looking for the MST bucket associated with the allocation. That one is retrieved via, ultimately, the MST marker. If the 32-bit MST marker was overwritten with garbage, it could explain the error.
2) something wrong with the MST. But I think this is less likely than (1).
I think we can make the malloc header more fault tolerant without enlarging it. Then it should be possible to catch cases where the start of the header was overwritten. Also, MallocHeader::check_block_integrity should include, in its integrity check, validity of the MST marker.
Would be nice to see the malloc header (the 16 bytes preceding the pointer handed to os::free) for both crashes.
Side note: in the first crash, the second half of the crash address looks weirdly "ascii-ish": 0x0000979245543410.
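The scenario in this comment, a valid canary next to an overwritten MST marker, can be illustrated with a small sketch. This is not the HotSpot implementation. The struct `SketchHeader`, the canary value, the marker encoding (bucket index in the upper 16 bits, position in the lower 16 bits) and the `chain_lengths` stand-in for the malloc site table are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 16-byte header modelled on the layout quoted in the comment:
// <64-bit size> <32-bit MST marker> <flags> ... <canary>
struct SketchHeader {
  uint64_t size;        // size of the user allocation
  uint32_t mst_marker;  // ASSUMED encoding: (bucket index << 16) | position
  uint8_t  flags;       // memory type flags
  uint8_t  unused;      // padding
  uint16_t canary;      // last two bytes before the user pointer
};
static_assert(sizeof(SketchHeader) == 16, "sketch expects a 16-byte header");

constexpr uint16_t kCanaryLive = 0xE99E;  // ASSUMED value, illustration only

// Canary-only check: passes even if the start of the header was overwritten.
inline bool canary_ok(const SketchHeader& h) {
  return h.canary == kCanaryLive;
}

// Marker check: the marker must resolve to an existing table entry.
// chain_lengths[i] is the number of entries in bucket i (a stand-in for the
// real malloc site table, which this sketch does not model).
inline bool marker_ok(const SketchHeader& h,
                      const std::vector<uint16_t>& chain_lengths) {
  const uint32_t bucket = h.mst_marker >> 16;
  const uint32_t pos = h.mst_marker & 0xFFFFu;
  return bucket < chain_lengths.size() &&
         pos < static_cast<uint32_t>(chain_lengths[bucket]);
}

// Hardened check: canary plus marker validity, so a clobbered marker is
// reported as corruption instead of being used for a table lookup.
inline bool hardened_ok(const SketchHeader& h,
                        const std::vector<uint16_t>& chain_lengths) {
  return canary_ok(h) && marker_ok(h, chain_lengths);
}
```

With a four-bucket table, a header whose marker was overwritten with 0xDEADBEEF still passes `canary_ok` but fails `hardened_ok`, which is the kind of case the proposed integrity check is meant to catch before the lookup.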
24-10-2022

Here's a log file snippet from the jdk-20+21-1506-tier8 sighting:
applications/kitchensink/Kitchensink.java
----------System.out:(27/3179)----------
/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-20+21-1506/src.full/closed/test/hotspot/jtreg/applications/kitchensink/applications/kitchensink/resources/kitchensink.default.properties
/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-20+21-1506/src.full/closed/test/hotspot/jtreg/applications/kitchensink/resources/kitchensink.default.properties
/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-20+21-1506/src.full/closed/test/hotspot/jtreg/applications/kitchensink/applications/kitchensink/resources/kitchensink.default.properties
/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-20+21-1506/src.full/closed/test/hotspot/jtreg/applications/kitchensink/resources/kitchensink.default.properties
[stress.process.out] For random generator using seed: -6618216326608271388
[stress.process.out] To re-run test with same seed value please add "-Djdk.test.lib.random.seed=-6618216326608271388" to command line.
[stress.process.out] Stress process main method is started.
[stress.process.out] #
[stress.process.out] # A fatal error has been detected by the Java Runtime Environment:
[stress.process.out] #
[stress.process.out] #  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S79657/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/eacfd17e-dcce-4179-a772-3df6057c495d/runs/9d016681-0080-4fba-a615-ad5eb7ea024d/workspace/open/src/hotspot/share/services/mallocSiteTable.cpp:174), pid=98325, tid=34051
[stress.process.out] #  assert(head != __null) failed: Invalid position index
[stress.process.out] #
[stress.process.out] # JRE version: Java(TM) SE Runtime Environment (20.0+21) (fastdebug build 20-ea+21-1506)
[stress.process.out] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+21-1506, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
[stress.process.out] # Core dump will be written. Default location: core.98325
[stress.process.out] #
[stress.process.out] # JFR recording file will be written. Location: /System/Volumes/Data/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S80257/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/535e5f5e-3bf9-47f9-8816-c751bcc75b23/runs/dd7708e3-dc57-4d0e-88df-e33eb54f4e81/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink_java/scratch/0/hs_err_pid98325.jfr
[stress.process.out] #
[stress.process.out] Unsupported internal testing APIs have been used.
[stress.process.out]
[stress.process.out] # An error report file with more information is saved as:
[stress.process.out] # /System/Volumes/Data/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S80257/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/535e5f5e-3bf9-47f9-8816-c751bcc75b23/runs/dd7708e3-dc57-4d0e-88df-e33eb54f4e81/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink_java/scratch/0/hs_err_pid98325.log
[stress.process.out] #
[stress.process.out] # If you would like to submit a bug report, please visit:
[stress.process.out] #   https://bugreport.java.com/bugreport/crash.jsp
[stress.process.out] #
----------System.err:(126/16836)----------
Here's the crashing thread's stack:
---------------  T H R E A D  ---------------
Current thread (0x0000000121612940):  WorkerThread "GC Thread#5" [stack: 0x000000017356c000,0x000000017376f000] [id=34051]
Stack: [0x000000017356c000,0x000000017376f000],  sp=0x000000017376ec10,  free space=2059k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x1094d8c]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x5c8  (mallocSiteTable.cpp:174)
V  [libjvm.dylib+0x10954c8]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, char*)+0x40
V  [libjvm.dylib+0x56da60]  report_vm_error(char const*, int, char const*, char const*, ...)+0x80
V  [libjvm.dylib+0xc9224c]  MallocSiteTable::malloc_site(unsigned int)+0xa4
V  [libjvm.dylib+0xc93454]  MallocTracker::record_free(void*)+0xd0
V  [libjvm.dylib+0xda707c]  os::free(void*)+0xf8
V  [libjvm.dylib+0x83913c]  InstanceKlass::get_jmethod_id(methodHandle const&)+0x2e0
V  [libjvm.dylib+0xd03c70]  Method::jmethod_id()+0x74
V  [libjvm.dylib+0xd53d3c]  nmethod::unlink()+0x114
V  [libjvm.dylib+0xdccb94]  CodeCacheUnloadingTask::work(unsigned int)+0x70
V  [libjvm.dylib+0x720004]  G1ParallelCleaningTask::work(unsigned int)+0x74
V  [libjvm.dylib+0x10e7d4c]  WorkerThread::run()+0x94
V  [libjvm.dylib+0xfecec4]  Thread::call_run()+0x220
V  [libjvm.dylib+0xdac0fc]  thread_native_entry(Thread*)+0x160
C  [libsystem_pthread.dylib+0x726c]  _pthread_start+0x94
24-10-2022

ILW = HLM = P3
18-10-2022