JDK-8283651 : nsk/jvmti/SuspendThread/suspendthrd003 may leak native memory
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 19
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-03-24
  • Updated: 2022-05-10
  • Resolved: 2022-05-09
JDK 19
19 (master): Fixed
Related Reports
Description
Copied from a note in JDK-8282704:

I've attached the latest RSS monitor results with the fix for this
bug (JDK-8282704) and JDK-8282314 in place on my M1 MacMini:

do_all_check_stress_results.jdk-19+13_redo_1.log

My analysis scripts no longer complain about possible leaks.

However, I do still see something suspicious in the max_value values
for StressWrapper_suspendthrd003. Here's all the max_value values
from the most recent run:

$ grep max_value !$
grep max_value do_all_check_stress_results.jdk-19+13_redo_1.log
max_value=72128
max_value=179728
max_value=178400
max_value=142480
max_value=218560
max_value=208368
max_value=43600
max_value=131280
max_value=171328
max_value=202512
max_value=209456
max_value=180880
max_value=65776
max_value=180688
max_value=178720
max_value=69104
max_value=175888
max_value=179760
max_value=63856
max_value=173008
max_value=177680
max_value=125360
max_value=166176
max_value=171936
max_value=157424
max_value=213184
max_value=216320
max_value=62688
max_value=170752
max_value=174416
max_value=142448
max_value=193328
max_value=186352
max_value=56096
max_value=166816
max_value=172960
max_value=568272
max_value=3120272
max_value=1692416

The last three values are for StressWrapper_suspendthrd003 and they
are much higher than any of the other StressWrappers... Since we don't
see an OOME anymore, I have to guess that this is possibly some
native memory leak.

Here are the full entries for StressWrapper_suspendthrd003:

StressWrapper_suspendthrd003 release:
n_values=240
max_value=568272
min_value=55584
max_increase=8496
min_increase=7488
Value did not increase 1 times.
Value decreased 67 times.
Value increased 171 times

StressWrapper_suspendthrd003 fastdebug:
n_values=240
max_value=3120272
min_value=179136
max_increase=354256
min_increase=44256
Value decreased 76 times.
Value increased 163 times

StressWrapper_suspendthrd003 slowdebug:
n_values=240
max_value=1692416
min_value=182048
max_increase=170768
min_increase=15872
Value decreased 73 times.
Value increased 166 times

I haven't used NMT for a long time so I'll likely investigate
this possible issue with another bug... 
Comments
Changeset: 61450bb0 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2022-05-09 22:03:55 +0000 URL: https://git.openjdk.java.net/jdk/commit/61450bb061ecda9700ddbd387a1f0659ebd1cced
09-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8609 Date: 2022-05-09 20:15:21 +0000
09-05-2022

The proposed fix looks like it eliminates the leak:

$ wc -l do_mem_monitor.detail.log.save.00/*.total_committed.log && tail -1 do_mem_monitor.detail.log.save.00/*.total_committed.log
59 do_mem_monitor.detail.log.save.00/60201.total_committed.log
1231145KB
$ wc -l do_mem_monitor.detail.log.save.01/*.total_committed.log && tail -1 do_mem_monitor.detail.log.save.01/*.total_committed.log
59 do_mem_monitor.detail.log.save.01/25576.total_committed.log
1239532KB
$ wc -l do_mem_monitor.detail.log.save.02/*.total_committed.log && tail -1 do_mem_monitor.detail.log.save.02/*.total_committed.log
58 do_mem_monitor.detail.log.save.02/9870.total_committed.log
244752KB

The first run reached a total committed size of 1.17GB. The second run reached a total committed size of 1.18GB. The third run (with the fix) reached a total committed size of 0.23GB, so almost 1GB better.
07-05-2022

I'm going to experiment with this possible fix:

$ git diff
diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_tools.cpp b/test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_tools.cpp
index 2412bd63040..b75f237c6a7 100644
--- a/test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_tools.cpp
+++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_tools.cpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2003, 2021, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2003, 2022, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -450,6 +450,12 @@ jthread nsk_jvmti_threadByName(const char name[]) {
         if (info.name != NULL && strcmp(name, info.name) == 0) {
             foundThread = threads[i];
+        }
+        if (!NSK_JVMTI_VERIFY(jvmti_env->Deallocate((unsigned char*)info.name))) {
+            nsk_jvmti_setFailStatus();
+            return NULL;
+        }
+        if (foundThread != NULL) {
             break;
         }
     }
07-05-2022

test/hotspot/jtreg/vmTestbase/nsk/jvmti/SuspendThread/suspendthrd003/suspendthrd003.cpp

static void JNICALL
agentProc(jvmtiEnv* jvmti, JNIEnv* jni, void* arg) {
<snip>
    NSK_DISPLAY1("Find thread: %s\n", THREAD_NAME);
    if (!NSK_VERIFY((testedThread = nsk_jvmti_threadByName(THREAD_NAME)) != NULL))
        return;
    NSK_DISPLAY1("  ... found thread: %p\n", (void*)testedThread);

so agentProc() in suspendthrd003.cpp calls nsk_jvmti_threadByName(), which is mentioned by the NMT detail output in the previous comment.

test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_tools.cpp

/** Find alive thread by name. */
jthread nsk_jvmti_threadByName(const char name[]) {
    jthread* threads = NULL;
    jint count = 0;
    jthread foundThread = NULL;
    int i;

    if (!NSK_VERIFY(name != NULL)) {
        nsk_jvmti_setFailStatus();
        return NULL;
    }

    if (!NSK_JVMTI_VERIFY(jvmti_env->GetAllThreads(&count, &threads))) {
        nsk_jvmti_setFailStatus();
        return NULL;
    }

    for (i = 0; i < count; i++) {
        jvmtiThreadInfo info;

        if (!NSK_JVMTI_VERIFY(jvmti_env->GetThreadInfo(threads[i], &info))) {
            nsk_jvmti_setFailStatus();
            break;
        }
        if (info.name != NULL && strcmp(name, info.name) == 0) {
            foundThread = threads[i];
            break;
        }
    }

so the local "jvmtiThreadInfo info" is inside the for-loop, which calls JVM/TI GetThreadInfo(), and the 'name' field is used to match that we've found the right thread. On a match, we break out of the loop and don't use the 'info' structure anymore. However, the JVM/TI spec says this about the info_ptr param:

info_ptr    jvmtiThreadInfo*    On return, filled with information describing the specified thread.
Agent passes a pointer to a jvmtiThreadInfo. On return, the jvmtiThreadInfo has been set. The pointer returned in the field name of jvmtiThreadInfo is a newly allocated array. The array should be freed with Deallocate. The object returned in the field thread_group of jvmtiThreadInfo is a JNI local reference and must be managed. The object returned in the field context_class_loader of jvmtiThreadInfo is a JNI local reference and must be managed.

So the nsk_jvmti_threadByName() function is missing the code to clean up the memory referred to by field 'name'.
06-05-2022

I used NMT "summary" observations to verify that memory still appears to be leaking in StressWrapper_suspendthrd003 with jdk-19+21 bits. Not an unexpected result since I didn't find anything that might have fixed the memory leak as a side effect.

I did an hour run with NMT "detail" observations and one thing in the "Total" section sticks out for me:

Total: reserved=2588485KB, committed=1231145KB
       malloc: 1011825KB #33787843
       mmap:   reserved=1576660KB, committed=219320KB
<snip>
-                  Internal (reserved=461839KB, committed=461839KB)
                            (malloc=461807KB #33748161)
                            (mmap: reserved=32KB, committed=32KB)

That's 450MB of "Internal" memory... And the first thing in the "Details:" section also stands out:

[0x0000000104de9efd] JvmtiEnvBase::jvmtiMalloc(long)+0x1d
[0x0000000104de0882] JvmtiEnv::GetThreadInfo(_jobject*, jvmtiThreadInfo*)+0x832
[0x0000000104d8d33c] jvmti_GetThreadInfo(_jvmtiEnv*, _jobject*, jvmtiThreadInfo*)+0xec
[0x0000000103128713] nsk_jvmti_threadByName+0x163
                     (malloc=461176KB type=Internal #33733611)

I gotta take a look at the JVM/TI GetThreadInfo() spec and how the test is using JVM/TI GetThreadInfo...
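For reference, observations like the ones above come from the standard JDK Native Memory Tracking tooling: enable NMT on the JVM under test and query it with jcmd (the main class and pid below are placeholders):

```
# Enable NMT on the JVM under test ("summary" is cheaper; "detail"
# also records allocation call sites like the stack shown above).
java -XX:NativeMemoryTracking=detail SomeTest

# From another terminal, query the running JVM by pid.
jcmd <pid> VM.native_memory summary
jcmd <pid> VM.native_memory detail

# Or take a baseline and later diff against it to spot growth.
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory detail.diff
```

A steadily growing malloc count attributed to one call site across diffs, as seen here for nsk_jvmti_threadByName, is the classic signature of a native leak.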
06-05-2022

The debug code that I created for:

JDK-8284632 runtime/Thread/StopAtExit.java possibly leaking memory again

gave me some ideas, so I added counter-based debugging for the ThreadSelfSuspensionHandshake objects and ran StressWrapper_suspendthrd003 in {release, fastdebug, slowdebug} configs using a jdk-19+20 baseline:

$ grep "vm_exit: dcubed_" build/macosx-x86_64-normal-server-*/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr
build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_alloc_count=270717041
build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_release_count=270717041
build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_alloc_count=628242287
build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_release_count=628242287
build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_alloc_count=34297247
build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr:vm_exit: dcubed_self_suspend_release_count=34297247

These runs executed for ~101 minutes each and there's no sign of leakage of ThreadSelfSuspensionHandshake objects in any config.
05-05-2022