JDK-8282704 : runtime/Thread/StopAtExit.java may leak memory
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 19
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-03-05
  • Updated: 2022-10-07
  • Resolved: 2022-03-19
JDK 19
19 b15  Fixed
Description
In my recent stress testing of jdk-19 bits, I've run into
StressWrapper_StopAtExit possibly leaking memory.

I added an RSS monitor for the StressWrappers to my stress
kit setup and captured RSS data in my jdk-19+10 stress testing.
I wrote a script to extract the RSS data and another script to do
some initial analysis on that data.
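
For context, a minimal sketch (purely illustrative, in Java; the actual extraction and analysis scripts are not shown in this bug) of the kind of per-series summary that the reports below contain, assuming each series is a list of RSS samples in KB and that the "possible leak" warning fires when the value increased on (nearly) every sample:

import java.util.List;

// Illustrative only: summarizes a series of RSS samples the way the
// reports below do; not the actual stress kit analysis script.
class RssSummary {
    static void summarize(String label, List<Long> samples) {
        long max = Long.MIN_VALUE, min = Long.MAX_VALUE;
        long maxInc = Long.MIN_VALUE, minInc = Long.MAX_VALUE;
        int increased = 0;
        for (int i = 0; i < samples.size(); i++) {
            long v = samples.get(i);
            max = Math.max(max, v);
            min = Math.min(min, v);
            if (i > 0) {
                long delta = v - samples.get(i - 1);
                if (delta > 0) {
                    increased++;
                    maxInc = Math.max(maxInc, delta);
                    minInc = Math.min(minInc, delta);
                }
            }
        }
        System.out.println(label + ":");
        System.out.println("n_values=" + samples.size());
        System.out.println("max_value=" + max);
        System.out.println("min_value=" + min);
        System.out.println("max_increase=" + maxInc);
        System.out.println("min_increase=" + minInc);
        System.out.println("Value increased " + increased + " times");
        // Heuristic (assumption): flag a possible leak when RSS grew on
        // essentially every sample in the series.
        if (increased >= samples.size() - 1) {
            System.out.println("WARNING: there might be a leak.");
        }
    }
}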

In do_all_check_stress_results.jdk-19+10_1.log:

StressWrapper_StopAtExit release:
n_values=101
max_value=13102752
min_value=80516
max_increase=542264
min_increase=780
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 release:
n_values=102
max_value=260024
min_value=43676
max_increase=17628
min_increase=8436
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=101
max_value=431132
min_value=193884
max_increase=18416
min_increase=6908
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=74
max_value=1019140
min_value=203728
max_increase=29192
min_increase=13156
Value increased 73 times
WARNING: there might be a leak.


In do_all_check_stress_results.jdk-19+10_2.log:

StressWrapper_suspendthrd003 release:
n_values=102
max_value=326184
min_value=45896
max_increase=18260
min_increase=11096
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=102
max_value=478212
min_value=193176
max_increase=18756
min_increase=5872
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=74
max_value=1029096
min_value=198076
max_increase=27360
min_increase=13056
Value increased 73 times
WARNING: there might be a leak.


In do_all_check_stress_results.jdk-19+10_3.log:

StressWrapper_StopAtExit release:
n_values=101
max_value=13230152
min_value=78128
max_increase=532040
min_increase=1764
Value increased 100 times
WARNING: there might be a leak.


StressWrapper_suspendthrd003 release:
n_values=102
max_value=173640
min_value=42908
max_increase=10428
min_increase=10428
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=101
max_value=479636
min_value=193480
max_increase=18764
min_increase=5864
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=75
max_value=1032804
min_value=204916
max_increase=29068
min_increase=12572
Value increased 74 times
WARNING: there might be a leak.

The StressWrapper_suspendthrd003 leak is being investigated via:

    JDK-8282314 nsk/jvmti/SuspendThread/suspendthrd003 may leak memory

StressWrapper_StopAtExit only reported a possible leak in
the 'release' bits config in 2 of the 3 runs, so I'm not as
sure that this is a real sighting. However, I need to investigate,
so I've filed this bug for that purpose.
Comments
Changeset: 3f923b82
Author: Daniel D. Daugherty <dcubed@openjdk.org>
Date: 2022-03-19 13:43:06 +0000
URL: https://git.openjdk.java.net/jdk/commit/3f923b82c31325504430b50dee262fd460004e7b
19-03-2022

I've attached the latest RSS monitor results with the fix for this bug (JDK-8282704) and JDK-8282314 in place on my M1 MacMini: do_all_check_stress_results.jdk-19+13_redo_1.log

My analysis scripts no longer complain about possible leaks. However, I do still see something suspicious in the max_value values for StressWrapper_suspendthrd003. Here's all the max_value values from the most recent run:

$ grep max_value !$
grep max_value do_all_check_stress_results.jdk-19+13_redo_1.log
max_value=72128
max_value=179728
max_value=178400
max_value=142480
max_value=218560
max_value=208368
max_value=43600
max_value=131280
max_value=171328
max_value=202512
max_value=209456
max_value=180880
max_value=65776
max_value=180688
max_value=178720
max_value=69104
max_value=175888
max_value=179760
max_value=63856
max_value=173008
max_value=177680
max_value=125360
max_value=166176
max_value=171936
max_value=157424
max_value=213184
max_value=216320
max_value=62688
max_value=170752
max_value=174416
max_value=142448
max_value=193328
max_value=186352
max_value=56096
max_value=166816
max_value=172960
max_value=568272
max_value=3120272
max_value=1692416

The last three values are for StressWrapper_suspendthrd003 and they are much higher than any of the other StressWrappers... Since we don't see an OOME anymore, I have to guess that this is possibly some native memory leak. Here's the full entries for StressWrapper_suspendthrd003:

StressWrapper_suspendthrd003 release:
n_values=240
max_value=568272
min_value=55584
max_increase=8496
min_increase=7488
Value did not increase 1 times.
Value decreased 67 times.
Value increased 171 times

StressWrapper_suspendthrd003 fastdebug:
n_values=240
max_value=3120272
min_value=179136
max_increase=354256
min_increase=44256
Value decreased 76 times.
Value increased 163 times

StressWrapper_suspendthrd003 slowdebug:
n_values=240
max_value=1692416
min_value=182048
max_increase=170768
min_increase=15872
Value decreased 73 times.
Value increased 166 times

I haven't used NMT for a long time so I'll likely investigate this possible issue with another bug...
17-03-2022

@dholmes-ora and [~alanb] - I would still prefer to fix this test. I maintain my stress kit for many releases, so having an approved and fixed version of this test would be good to have, not just until you (David) fix this bug in JDK 19 (JDK-8282952), but also for potential backports to earlier releases. And for earlier versions of my stress kit, the fixed test will be included in the stress kit itself until (and if) the fix is backported. Thanks!

If Loom does integrate in JDK 19 and Thread.stop() is degraded to throw UOE, then this test will be removed and I'll create a JVM/TI version of the test for the stress kit.
17-03-2022

In addition to David's comment, there are many changes in the loom repo that would require rewriting this test: the private Thread::threadTerminated method is removed, the bookkeeping associated with thread groups is removed, and several of the terminally deprecated methods in ThreadGroup have been degraded. Furthermore, I hope to degrade Thread.stop in an upcoming release so that it throws UOE unconditionally.
17-03-2022

Fixing JDK-8282952 will also fix this test issue.
17-03-2022

A pull request was submitted for review.
URL: https://git.openjdk.java.net/jdk/pull/7819
Date: 2022-03-15 14:56:29 +0000
15-03-2022

$ unzip -l jdk-19+13_linux.8282704.zip
Archive:  jdk-19+13_linux.8282704.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    83418  2022-03-11 05:35   jdk-19+13_1/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
   102988  2022-03-12 05:10   jdk-19+13_2/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
   107003  2022-03-13 05:06   jdk-19+13_3/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
---------                     -------
   293409                     3 files
13-03-2022

I was able to reproduce the failure on my MBP13, but not with my current default of 6093 seconds/101 minutes. I bumped the value to 12000 seconds/200 minutes and saw an OOME after 152 minutes:

Test Config: macosx-x86_64-normal-server-release
INFO: TIMEOUT_FACTOR=4
Done testing
Test Run macosx-x86_64-normal-server-release time: 152.26 minutes.
TEST                                                           TOTAL  PASS  FAIL  ERROR
jtreg:open/test/hotspot/jtreg/StressWrapper_StopAtExit.java >>     1     0     1      0 <<

1 failure(s) found in log=do_java_test.macosx-x86_64-normal-server-release.log
TEST: StressWrapper_StopAtExit.java
LOG: build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_StopAtExit_java/StressWrapper_StopAtExit.jtr
Saving build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_StopAtExit_java/StressWrapper_StopAtExit.jtr as /work/shared/bug_hunt/8282704_for_jdk19.git/test_failures.2022-03-09-111835/StressWrapper_StopAtExit.jtr.release

The fastdebug config did not have an OOME in 200 minutes and slowdebug just started.

Update: the slowdebug config did not have an OOME in 200 minutes either.
10-03-2022

Not sure that is possible, Dan. Any ThreadGroup you create is referenced by its parent, and the problem is that the threads are not removing themselves as they should, and you can't manually do that.
09-03-2022

Also, InterruptAtExit can't suffer from the same problem.
09-03-2022

So the problem is that the code path that we're stressing is racy with respect to ThreadGroups... However, we don't really care about the ThreadGroup for the purposes of this test. I'm going to verify that I can repro the OOM on my MBP13, and then I'm going to check out using a throwaway ThreadGroup that I can clean up myself...
08-03-2022
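
For illustration only, a rough sketch of the "throwaway ThreadGroup" idea mentioned in the comment above; the names are made up, ThreadGroup.destroy() is terminally deprecated, and, as the 09-03-2022 reply above notes, this may not actually help since any ThreadGroup you create is still referenced by its parent:

// Illustrative sketch, not the actual test change: run the stressed thread
// in a dedicated ThreadGroup so its group bookkeeping is separate from the
// MainWrapper's MainThreadGroup.
public class ThrowawayGroupSketch {
    public static void main(String[] args) throws InterruptedException {
        ThreadGroup scratch = new ThreadGroup("StopAtExit-scratch");
        Thread t = new Thread(scratch, () -> {
            // body of the stressed thread would go here
        }, "StopAtExit-worker");
        t.start();
        t.join();
        // Clean up the group ourselves; destroy() only succeeds once the
        // group is empty, which is exactly what the race described in the
        // comment below can prevent.
        if (!scratch.isDestroyed() && scratch.activeCount() == 0) {
            scratch.destroy();
        }
    }
}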

Just to fill in the details a little bit more:

src/hotspot/share/runtime/thread.cpp:

void JavaThread::exit(bool destroy_vm, ExitType exit_type) {
<snip>
  // Call Thread.exit(). We try 3 times in case we got another Thread.stop during
  // the execution of the method. If that is not enough, then we don't really care. Thread.stop
  // is deprecated anyhow.
  if (!is_Compiler_thread()) {
    int count = 3;
    while (java_lang_Thread::threadGroup(threadObj()) != NULL && (count-- > 0)) {
      EXCEPTION_MARK;
      JavaValue result(T_VOID);
      Klass* thread_klass = vmClasses::Thread_klass();
      JavaCalls::call_virtual(&result,
                              threadObj, thread_klass,
                              vmSymbols::exit_method_name(),
                              vmSymbols::void_method_signature(),
                              THREAD);
      CLEAR_PENDING_EXCEPTION;
    }
  }

src/java.base/share/classes/java/lang/Thread.java:

    private void exit() {
<snip>
        if (group != null) {
=>          group.threadTerminated(this);
            group = null;
        }

src/java.base/share/classes/java/lang/ThreadGroup.java:

    void threadTerminated(Thread t) {
        synchronized (this) {
=>          remove(t);

            if (nthreads == 0) {
                notifyAll();
            }
            if (daemon && (nthreads == 0) && (nUnstartedThreads == 0) &&
                (ngroups == 0))
            {
                destroy();
            }
        }
    }

So if our stressful calls to "thread.stop()" land in this code path, it's possible that the ThreadGroup.remove() call won't happen...
08-03-2022

[~dholmes] - We're on the same page since I had the same idea while I was poking around the code. I may not be able to do anything about this particular leak. That probably also explains why InterruptAtExit sometimes appears to have a leak. Stress tests can be a pain to get right... sigh...
08-03-2022

> Why are all the StopAtExit threads still present in the MainWrapper's MainThreadGroup?

Presumably because the async exception disrupts execution of the code that would remove them.

Update: note that we retry calling Thread.exit three times but we also hit the thread with stop() requests in a tight loop.
08-03-2022
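
For reference, a hedged sketch (not the actual StopAtExit.java source; the class and thread names are made up) of the stress pattern being described above: the main thread keeps posting asynchronous ThreadDeath via Thread.stop() while the target thread is exiting, so even the three Thread.exit() retries in the VM can be disrupted.

// Sketch of the stress pattern described above, not the actual test code:
// keep posting async ThreadDeath at a thread that is already exiting.
public class StopAtExitSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread victim = new Thread(() -> {
            // returns immediately; the interesting window is thread exit
        }, "victim");
        victim.start();
        // Tight loop: each stop() installs a pending async exception, which
        // can repeatedly disrupt the Thread.exit() -> threadTerminated()
        // bookkeeping even though the VM retries it three times.
        while (victim.isAlive()) {
            victim.stop();
        }
        victim.join();
    }
}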

VisualVM 2.1.2 to the rescue! Added screen shots to show where VisualVM says the memory is pinned. Why are all the StopAtExit threads still present in the MainWrapper's MainThreadGroup?
07-03-2022

$ unzip -l jdk-19+12_linux.8282704.zip
Archive:  jdk-19+12_linux.8282704.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    91348  2022-03-04 05:58   jdk-19+12_1/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
   106919  2022-03-05 05:22   jdk-19+12_2/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
   110944  2022-03-06 03:37   jdk-19+12_3/failures.linux-x86_64/StressWrapper_StopAtExit.jtr.release
---------                     -------
   309211                     3 files
06-03-2022