JDK-8226416 : MonitorUsedDeflationThreshold can cause repeated async deflation requests
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2019-06-19
  • Updated: 2021-01-21
  • Resolved: 2021-01-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17
17 b06Fixed
Related Reports
Blocks :  
Relates :  
Relates :  
Relates :  
Description
Vitaly Davidovich posted about this issue here:

https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-June/034706.html

with the subject line of "Increased safepoint cleanup tasks in Java 11 (vs 8)"

The '-XX:MonitorUsedDeflationThreshold=NN' experimental
option was introduced via JDK-8181859 in JDK10.

This option can inadvertently cause repeated "cleanup" safepoint
operations when ObjectMonitor usage is very low.

I have created a test program to demo this issue and I'm
also including logs from my JDK13 bits to illustrate the issue.
The algorithm of the test is simple:

  - recursively enter N monitors where each is inflated via a wait(1) call
  - once the N+1 call is made, delay for 10 seconds

With 'monitorinflation' logging enabled, check for the cleanup
safepoint logging messages.

Here are example runs from my macOSX MBP-13:

$ java -Xlog:monitorinflation=info MonitorUsedDeflationThresholdTest 20 \
  > MonitorUsedDeflationThresholdTest,info,count=20.log 2>&1

The above cmd will create 20 inflated monitors in the test program.
Using "monitorinflation=info" logging, here are the messages
generated at a safepoint:

$ grep 'gMonitorPopulation=' MonitorUsedDeflationThresholdTest,info,count=20.log
[0.110s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=96
[0.118s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=96
[0.205s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
[1.210s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
[10.215s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=64
[10.236s][info][monitorinflation] gMonitorPopulation=127 equals chkMonitorPopulation=127

A few things to notice:

- The gMonitorFreeCount value goes from 96 -> 32
- This is the log mesg for when the test has almost gotten to the
   end of the recursive enters:
     [0.205s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
- about a second later is another safepoint:
     [1.210s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
- and about 10 seconds after that is the last safepoint before
  we start shutting down the VM:
     [10.215s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=64


Here's another example run:

$ java -Xlog:monitorinflation=info MonitorUsedDeflationThresholdTest 33 \
  > MonitorUsedDeflationThresholdTest,info,count=33.log 2>&1

The above cmd will create 33 inflated monitors in the test program.
Using "monitorinflation=info" logging, here are the messages
generated at a safepoint:

$ grep 'gMonitorPopulation=' MonitorUsedDeflationThresholdTest,info,count=33.log
[0.111s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=96
[0.120s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=96
[0.207s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
[1.212s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[2.217s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[3.218s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[4.219s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[5.224s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[6.229s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[7.230s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[8.236s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[9.237s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
[10.237s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=64
[10.258s][info][monitorinflation] gMonitorPopulation=127 equals chkMonitorPopulation=127

A few things to notice:

- The gMonitorFreeCount value goes from 96 -> 32
- This is the log mesg for when the test has almost gotten to the
   end of the recursive enters:
     [0.207s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=32
- about a second later is another safepoint:
    [1.212s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
- at this point gMonitorFreeCount has reached 0 because all of
  the free global monitors have been allocated to per-thread
  lists.
- there will be a "cleanup" safepoint every second after this
  point where nothing gets freed and put back on the global
  free list:
    [9.237s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=0
- At the 10 second point, we've finished the recursive part of
  the test and unlocked all the inflated monitors:
     [10.237s][info][monitorinflation] gMonitorPopulation=127, gOmInUseCount=0, gMonitorFreeCount=64
   since the gMonitorFreeCount == 64, that means that the main
   thread (that ran the test) has exited and released all of its
   monitors to the free list.
Comments
Changeset: be57cf14 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2021-01-14 14:23:27 +0000 URL: https://git.openjdk.java.net/jdk/commit/be57cf14
14-01-2021

Here's a snippet of monitorinflation logging to demonstrate: [1.122s][info][monitorinflation ] begin deflating: in_use_list stats: ceiling=12, count=23, max=23 [1.123s][info][monitorinflation ] before handshaking: unlinked_count=1, in_use_list stats: ceiling=12, count=22, max=23 [1.123s][info][monitorinflation ] after handshaking: in_use_list stats: ceiling=12, count=22, max=23 [1.123s][info][monitorinflation ] deflated 1 monitors in 0.0000090 secs [1.123s][info][monitorinflation ] end deflating: in_use_list stats: ceiling=12, count=22, max=23 [2.123s][info][monitorinflation ] begin deflating: in_use_list stats: ceiling=12, count=22, max=23 [2.123s][info][monitorinflation ] end deflating: in_use_list stats: ceiling=12, count=22, max=23 [3.123s][info][monitorinflation ] begin deflating: in_use_list stats: ceiling=12, count=22, max=23 [3.123s][info][monitorinflation ] end deflating: in_use_list stats: ceiling=12, count=22, max=23 [4.123s][info][monitorinflation ] begin deflating: in_use_list stats: ceiling=12, count=22, max=23 [4.123s][info][monitorinflation ] end deflating: in_use_list stats: ceiling=12, count=22, max=23 [5.126s][info][monitorinflation ] Too many deflations without progress; bumping in_use_list_ceiling from 12 to 26 Done delaying for 10 secs. The first deflation at 1.122s deflates one monitor. The second, third and fourth deflation attempts at 2.123s, 3.123s and 4.123s all fail to deflation any monitors. The fifth deflation attempt at 5.126s detects that we've made too many deflation attempts and adjusts the in_use_list_ceiling from 12 to 26 and that stops further deflation attempts.
07-01-2021

Adding support for a diagnostic option called NoAsyncDeflationProgressMax with a default value of 3. If we have three async monitor deflation cycles in a row with zero monitors deflated, then we adjust the in_use_list_ceiling up. diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index ab324da78dd..d30fd882c86 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -749,6 +749,13 @@ const intx ObjectAlignmentInBytes = 8; "or AsyncDeflationInterval.") \ range(0, 100) \ \ + /* notice: the max range value here is max_jint, not max_intx */ \ + /* because of overflow issue */ \ + product(intx, NoAsyncDeflationProgressMax, 3, DIAGNOSTIC, \ + "Max number of no progress async deflation attempts to tolerate " \ + "before adjusting the in_use_list_ceiling up (0 is off).") \ + range(0, max_jint) \ And we change the logic in monitors_used_above_threshold() to detect the no progress condition and adjust the in_use_list_ceiling up.
07-01-2021

Nothing needs to be done with this RFE for JDK16. I need to poke around with my test program a bit, but I think the work in JDK-8253064 will have changed the behavior of this issue quite a bit.
17-11-2020

With the work done in the following bug: JDK-8253064 monitor list simplifications and getting rid of TSM A cleanup safepoint no longer causes async monitor deflation. Here's the bullet from the PR: - the safepoint cleanup phase no longer requests async monitor deflation; there is no longer a safepoint cleanup task for monitor deflation, but there is still an auditing/logging hook for debugging purposes. This issue of being able to cause repeated async deflation requests without making any progress still exists, but it no longer causes repeated cleanup safepoints.
12-11-2020

ILW = HLL = P4
27-06-2019

Here's another 20 inflation run with 'monitorinflation=debug' logging: $ java -Xlog:monitorinflation=debug MonitorUsedDeflationThresholdTest 20 > MonitorUsedDeflationThresholdTest,debug,count=20.log 2>&1 Because the 'debug' level output is formatted differently than the 'info' level output we have to grep for a different pattern: $ grep 'gMonitorFreeCount=' MonitorUsedDeflationThresholdTest,debug,count=20.log [0.112s][debug][monitorinflation] gMonitorFreeCount=96 equals chkMonitorFreeCount=96 [0.121s][debug][monitorinflation] gMonitorFreeCount=96 equals chkMonitorFreeCount=96 [0.206s][debug][monitorinflation] gMonitorFreeCount=32 equals chkMonitorFreeCount=32 [1.212s][debug][monitorinflation] gMonitorFreeCount=32 equals chkMonitorFreeCount=32 [10.214s][debug][monitorinflation] gMonitorFreeCount=64 equals chkMonitorFreeCount=64 [10.235s][debug][monitorinflation] gMonitorFreeCount=64 equals chkMonitorFreeCount=64 The debug level logs show the same pattern of one safepoint at 1.212s and the next safepoint at 10.214s. Here's another 33 inflation run with 'monitorinflation=debug' logging: $ java -Xlog:monitorinflation=debug MonitorUsedDeflationThresholdTest 33 > MonitorUsedDeflationThresholdTest,debug,count=33.log 2>&1 Because the 'debug' level output is formatted differently than the 'info' level output we have to grep for a different pattern: $ grep 'gMonitorFreeCount=' MonitorUsedDeflationThresholdTest,debug,count=33.log [0.112s][debug][monitorinflation] gMonitorFreeCount=65 equals chkMonitorFreeCount=65 [0.120s][debug][monitorinflation] gMonitorFreeCount=65 equals chkMonitorFreeCount=65 [0.207s][debug][monitorinflation] gMonitorFreeCount=1 equals chkMonitorFreeCount=1 [1.213s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [2.219s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [3.225s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [4.230s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [5.233s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [6.240s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [7.244s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [8.249s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [9.251s][debug][monitorinflation] gMonitorFreeCount=0 equals chkMonitorFreeCount=0 [10.239s][debug][monitorinflation] gMonitorFreeCount=33 equals chkMonitorFreeCount=33 [10.260s][debug][monitorinflation] gMonitorFreeCount=33 equals chkMonitorFreeCount=33 The debug level logs show the same pattern of one safepoint every second starting at 1.213s and ending at 10.239s.
19-06-2019