JDK-8282314 : nsk/jvmti/SuspendThread/suspendthrd003 may leak memory
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 19
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-02-23
  • Updated: 2022-03-24
  • Resolved: 2022-03-10
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 19
19 b14Fixed
Related Reports
Relates :  
Description
In my recent stress testing of jdk-19 bits, I've run into
StressWrapper_suspendthrd003 failing in slowdebug configs
due to "OutOfMemoryError: Java heap space". The first
occurrence was in my jdk-19+8 stress testing and the failures
reproduced in jdk-19+9 and jdk-19+10 stress testing.

I added an RSS monitor for the StressWrappers to my stress
kit setup and captured RSS data in my jdk-19+10 stress testing.
I wrote a script to extract the RSS data and another script to do
some initial analysis on that data.

In do_all_check_stress_results.jdk-19+10_1.log:

StressWrapper_StopAtExit release:
n_values=101
max_value=13102752
min_value=80516
max_increase=542264
min_increase=780
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 release:
n_values=102
max_value=260024
min_value=43676
max_increase=17628
min_increase=8436
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=101
max_value=431132
min_value=193884
max_increase=18416
min_increase=6908
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=74
max_value=1019140
min_value=203728
max_increase=29192
min_increase=13156
Value increased 73 times
WARNING: there might be a leak.


In do_all_check_stress_results.jdk-19+10_2.log:

StressWrapper_suspendthrd003 release:
n_values=102
max_value=326184
min_value=45896
max_increase=18260
min_increase=11096
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=102
max_value=478212
min_value=193176
max_increase=18756
min_increase=5872
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=74
max_value=1029096
min_value=198076
max_increase=27360
min_increase=13056
Value increased 73 times
WARNING: there might be a leak.


do_all_check_stress_results.jdk-19+10_3.log:

StressWrapper_StopAtExit release:
n_values=101
max_value=13230152
min_value=78128
max_increase=532040
min_increase=1764
Value increased 100 times
WARNING: there might be a leak.


StressWrapper_suspendthrd003 release:
n_values=102
max_value=173640
min_value=42908
max_increase=10428
min_increase=10428
Value increased 101 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 fastdebug:
n_values=101
max_value=479636
min_value=193480
max_increase=18764
min_increase=5864
Value increased 100 times
WARNING: there might be a leak.

StressWrapper_suspendthrd003 slowdebug:
n_values=75
max_value=1032804
min_value=204916
max_increase=29068
min_increase=12572
Value increased 74 times
WARNING: there might be a leak.

Since the possible leak reproduced in all three configs
in all three runs of StressWrapper_suspendthrd003, I'm
going to focus on that wrapper first.

StressWrapper_StopAtExit did report a possible leak in
the 'release' bits config in 2 of the 3 runs so I'm not as
sure that is a real sighting.
Comments
Changeset: bb7ee5a0 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2022-03-10 19:13:55 +0000 URL: https://git.openjdk.java.net/jdk/commit/bb7ee5a04ae21a9f9dc6c59a990f7e571e832f0d
10-03-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/7764 Date: 2022-03-09 20:41:37 +0000
09-03-2022

So the nsk/jvmti/SuspendThread/suspendthrd003.java test generates four logging lines per loop: $ grep 'log.display' test/hotspot/jtreg/vmTestbase/nsk/jvmti/SuspendThread/suspendthrd003.java log.display("Starting tested thread"); log.display("Sync: thread started"); log.display("Finishing tested thread"); log.display("Sync: thread finished"); In a recent run of the fixed bits on my MBP13, I saw these counts: macosx-x86_64-normal-server-fastdebug: Executed 1672356 loops in 6093 seconds. macosx-x86_64-normal-server-release: Executed 260780 loops in 6093 seconds. macosx-x86_64-normal-server-slowdebug: Executed 7153680 loops in 6093 seconds. It's very interesting that we get fewer loops in 'release' bits and more loops in 'fastdebug' bits and even more in 'slowdebug' bits. With four messages per loop, in theory we would run into the 8,192,000 ceiling with 2048000 loops. I'm going to have to guess that without the fix in place, we're able to run fewer loops of the test across all three configs which is why I only see the OOME in the slowdebug config. I just checked my Ubuntu 20.04 server and don't save the logs for the passing runs so I don't have loop data at my finger tips for the case where the fix is not in place (jdk-19+11 and earlier). My jdk-19+12 stress testing included the fix so here are those loop counts: linux-x86_64-normal-server-fastdebug: Executed 663035 loops in 6093 seconds. linux-x86_64-normal-server-release: Executed 1877078 loops in 6093 seconds. linux-x86_64-normal-server-slowdebug: Executed 2523107 loops in 6093 seconds. And that data does tell the tale! Only the slowdebug config is above 2048000 even with the fix in place. You might be wondering why my MBP13 was able to execute more loop per config... I was only running the one stress test on the machine. On the Ubuntu 20.04 machine, I'm running many tests in parallel so the StressWrapper_suspendthrd003 runs do not have the machine to themselves like on the MBP13. The fix that I'm testing provides a new method to clear the log at the top of each loop. For the suspendthrd003 test, only the log messages for the current loop are relevant if the current loop fails.
08-03-2022

VisualVM 2.1.2 to the rescue! java_pid42669.prof.jpg Shows that we threw an OutOfMemoryError in the MainThread with a heapsize of 67,170,102 bytes which is interesting since we run this stress test with '-Xmx128m'. We have a java.lang.Object[] array that contains 8,192,000 elements and occupies 65,536,024 bytes so that's the first thing to look at. java_pid42669.prof.log_details.jpg Shows that the 'log' variable in suspendthrd003 is pinning down a Vector logBuffer that contains 8,192,000 elements. Ouch!
01-03-2022

Modified StressWrapper_suspendthrd003.java like so: $ git diff diff --git a/test/hotspot/jtreg/StressWrapper_suspendthrd003.java b/test/hotspot/jtreg/StressWrapper_suspendthrd003.java index 1c6262ed876..f3cbd3b53cd 100644 --- a/test/hotspot/jtreg/StressWrapper_suspendthrd003.java +++ b/test/hotspot/jtreg/StressWrapper_suspendthrd003.java @@ -5,6 +5,7 @@ * @library /vmTestbase * /test/lib * @run main/othervm/native/timeout=6393 + * -XX:+HeapDumpOnOutOfMemoryError * -Xmx128m * -agentlib:suspendthrd003=-waittime=5 * nsk.jvmti.SuspendThread.suspendthrd003 6093 and was able to reproduce this failure on my MBP13: $ do_java_test -c slowdebug hotspot/jtreg/StressWrapper_suspendthrd003.java INFO: GNUMAKE=make INFO: GNUMAKE version is: GNU Make 3.81 INFO: JTREG options: INFO: JOBS=4 INFO: TEST_MODE=agentvm INFO: EXTRA_PROBLEM_LISTS=ProblemList-extra.txt INFO: VM_OPTIONS= INFO: test_val=hotspot/jtreg/StressWrapper_suspendthrd003.java Test Config: macosx-x86_64-normal-server-slowdebug INFO: TIMEOUT_FACTOR=12 Done testing Test Run macosx-x86_64-normal-server-slowdebug time: 33.96 minutes. TEST TOTAL PASS FAIL ERROR jtreg:open/test/hotspot/jtreg/StressWrapper_suspendthrd003.java >> 1 0 1 0 << 1 failure(s) found in log=do_java_test.macosx-x86_64-normal-server-slowdebug.log TEST: StressWrapper_suspendthrd003.java LOG: build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr Saving build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003.jtr as /work/shared/bug_hunt/8282314_for_jdk19.git/test_failures.2022-02-28-125232/StressWrapper_suspendthrd003.jtr.slowdebug Total test time: 33.96 minutes. and it looks like I have a heap dump: $ ls -l build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_StressWrapper_suspendthrd003_java/StressWrapper_suspendthrd003 total 172240 -rw-r--r-- 1 dcubed green 2354917 Feb 28 13:26 environment.html -rw-r--r-- 1 dcubed green 195 Feb 28 13:26 environment.log -rw------- 1 dcubed green 69334569 Feb 28 13:25 java_pid42669.hprof -rw-r--r--@ 1 dcubed green 94518 Feb 28 13:26 screen1.png That's a pretty big heapdump file: 69334569 => 66.12MB
28-02-2022

$ unzip -l jdk-19+11_linux.8282314.zip Archive: jdk-19+11_linux.8282314.zip Length Date Time Name --------- ---------- ----- ---- 19092 2022-02-25 11:13 jdk-19+11_1/failures.linux-x86_64/StressWrapper_suspendthrd003.jtr.slowdebug 19094 2022-02-27 09:29 jdk-19+11_3/failures.linux-x86_64/StressWrapper_suspendthrd003.jtr.slowdebug --------- ------- 38186 2 files
27-02-2022

Since StressWrapper_StopAtExit always has a higher RSS than StressWrapper_suspendthrd003, I'm thinking that StopAtExit is leaking native memory and suspendthrd003 is leaking Java heap memory, but that's just a guess at this stage.
23-02-2022

$ sed -n -e ':top' -e '/^StressWrapper/{' -e '/ slowdebug/{' -e 'p' -e ':mid' -e 'n' -e '/^max_value=/{' -e 'p' -e 'n' -e 'b top' -e '}' -e 'b mid' -e '}' -e '}' do_all_check_stress_results.jdk-19+10_1.log StressWrapper_find006 slowdebug: max_value=197972 StressWrapper_GetLockOwnerName slowdebug: max_value=203628 StressWrapper_InterruptAtExit slowdebug: max_value=188576 StressWrapper_intrpthrd003 slowdebug: max_value=203224 StressWrapper_IsInterruptedAtExit slowdebug: max_value=195904 StressWrapper_issuspended002 slowdebug: max_value=202992 StressWrapper_objmonusage006 slowdebug: max_value=198128 StressWrapper_popframe011 slowdebug: max_value=196992 StressWrapper_SetNameAtExit slowdebug: max_value=234668 StressWrapper_SetPriorityAtExit slowdebug: max_value=194860 StressWrapper_StopAtExit slowdebug: max_value=13939132 StressWrapper_SuspendAtExit slowdebug: max_value=196756 StressWrapper_suspendthrd003 slowdebug: max_value=1019140 With 'slowdebug' bits, StressWrapper_StopAtExit hits the highest with 13939132 and StressWrapper_suspendthrd003 has the second highest with 1019140.
23-02-2022

$ sed -n -e ':top' -e '/^StressWrapper/{' -e '/ fastdebug/{' -e 'p' -e ':mid' -e 'n' -e '/^max_value=/{' -e 'p' -e 'n' -e 'b top' -e '}' -e 'b mid' -e '}' -e '}' do_all_check_stress_results.jdk-19+10_1.log StressWrapper_find006 fastdebug: max_value=195808 StressWrapper_GetLockOwnerName fastdebug: max_value=203376 StressWrapper_InterruptAtExit fastdebug: max_value=189408 StressWrapper_intrpthrd003 fastdebug: max_value=213756 StressWrapper_IsInterruptedAtExit fastdebug: max_value=188464 StressWrapper_issuspended002 fastdebug: max_value=197780 StressWrapper_objmonusage006 fastdebug: max_value=198452 StressWrapper_popframe011 fastdebug: max_value=194080 StressWrapper_SetNameAtExit fastdebug: max_value=232340 StressWrapper_SetPriorityAtExit fastdebug: max_value=194724 StressWrapper_StopAtExit fastdebug: max_value=11527072 StressWrapper_SuspendAtExit fastdebug: max_value=193624 StressWrapper_suspendthrd003 fastdebug: max_value=431132 With 'fastdebug' bits, StressWrapper_StopAtExit hits the highest with 11527072 and StressWrapper_suspendthrd003 has the second highest with 431132.
23-02-2022

Just looking at the max RSS value for each of the wrappers is interesting: $ sed -n -e ':top' -e '/^StressWrapper/{' -e '/ release/{' -e 'p' -e ':mid' -e 'n' -e '/^max_value=/{' -e 'p' -e 'n' -e 'b top' -e '}' -e 'b mid' -e '}' -e '}' do_all_check_stress_results.jdk-19+10_1.log StressWrapper_find006 release: max_value=58088 StressWrapper_GetLockOwnerName release: max_value=123596 StressWrapper_InterruptAtExit release: max_value=55556 StressWrapper_intrpthrd003 release: max_value=138244 StressWrapper_IsInterruptedAtExit release: max_value=53364 StressWrapper_issuspended002 release: max_value=56576 StressWrapper_objmonusage006 release: max_value=52204 StressWrapper_popframe011 release: max_value=124988 StressWrapper_SetNameAtExit release: max_value=131408 StressWrapper_SetPriorityAtExit release: max_value=59832 StressWrapper_StopAtExit release: max_value=13102752 StressWrapper_SuspendAtExit release: max_value=48396 StressWrapper_suspendthrd003 release: max_value=260024 With 'release' bits, StressWrapper_StopAtExit hits the highest with 13102752 and StressWrapper_suspendthrd003 has the second highest with 260024.
23-02-2022