JDK-8305994 : Guarantee eventual async monitor deflation
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17,20,21
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-04-14
  • Updated: 2024-04-10
  • Resolved: 2023-04-24
  • JDK 17: 17.0.10-oracle (Fixed)
  • JDK 20: 20.0.2 (Fixed)
  • JDK 21: 21 b20 (Fixed)
Description
One of our systems reported a steady increase in memory usage after migrating from JDK 11 to JDK 17. NMT logs clearly show a growing "Object Monitors" section: the monitor population is nearly 11M, taking several GBs of RSS.

Async monitor deflation is supposed to deal with this by triggering the cleanup when `MonitorUsedDeflationThreshold` (`MUDT`) is reached. But the apparent problem with that heuristic is that `MUDT` is a percentage of a "ceiling", which is derived roughly as `max(#threads*AvgMonitorsPerThreadEstimate, #max_monitors_ever_observed)`, plus additive adjustments when deflation does not make progress. For systems that run thousands of threads, the ceiling can be very high.

Also, AFAIU, the ceiling can get arbitrarily high if there was a historical spike in the number of monitors, or if some past async deflations made no progress. The ceiling seems to never go down! (Which is a good thing in buildings, but not in this heuristic code.) So even if we set `MUDT` to the lowest value, 1, the ceiling might eventually get so large that the heuristic would never fire.

Back-of-the-envelope calculation: even without involving the historical ceiling adjustments, the static calculation for a system with 13K threads (a real-life number) and the default `AMPTE` = 1024 yields a ceiling of about 12M. This means the default `MUDT` = 90 would not trigger cleanup until we have at least 11M monitors, which at ~200 bytes per monitor translates to >2 GB of native memory.
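For illustration, here is a minimal sketch of that arithmetic. This is my own simplified model of the static ceiling, ignoring `#max_monitors_ever_observed` and the no-progress adjustments; it is not the actual HotSpot code:

```
public class CeilingEstimate {
  public static void main(String... args) {
    long threads = 13_000; // real-life thread count from the report
    long ampte   = 1_024;  // default AvgMonitorsPerThreadEstimate
    long mudt    = 90;     // default MonitorUsedDeflationThreshold, in percent

    long ceiling = threads * ampte;       // simplified static ceiling
    long trigger = ceiling * mudt / 100;  // in-use monitors needed to fire

    System.out.printf("ceiling: %,d monitors%n", ceiling);
    System.out.printf("deflation fires at: %,d in-use monitors%n", trigger);
    System.out.printf("native memory at that point: ~%,d MB (at ~200 bytes/monitor)%n",
        trigger * 200 / 1_000_000);
  }
}
```

This prints a ceiling of ~13.3M monitors and a trigger point of ~12M in-use monitors, in the same ballpark as the rounded numbers above.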

This started to be a problem in JDK 17, because the work done in JDK 15..16 (JDK-8153224, JDK-8246476) gradually removed the path that did monitor deflation on safepoint cleanups. So JDK 11 applications got their monitors cleaned up by the eventual safepoints from e.g. GC, *and* by the special cleanup safepoints triggered by the monitor-used threshold, checked every `GuaranteedSafepointInterval`. But in JDK 17, deflation is only triggered by the monitor-used threshold checked every `AsyncDeflationInterval`, and that threshold might not be reached for quite some time. The worst case would be threads spiking to use all these monitors, then never using them again and never using new ones, so the threshold is never reached and the monitors stay inflated forever.

I have a minimal example showing this behavior:

```
import java.util.concurrent.CountDownLatch;

public class Monitors {
  static final int THREAD_COUNT  = Integer.getInteger("threads", 2000);
  static final int MONITOR_COUNT = Integer.getInteger("monitorsPerThread", 800);

  static final CountDownLatch STARTED = new CountDownLatch(THREAD_COUNT);
  static final CountDownLatch LOCKED  = new CountDownLatch(THREAD_COUNT);
  static final CountDownLatch HANG    = new CountDownLatch(1);

  public static void main(String... args) throws Exception {
    System.out.println("Initializing");

    for (int c = 0; c < THREAD_COUNT; c++) {
      Thread t = new Thread(() -> {
        try {
          // Wait for all threads to arrive, so the monitor usage spikes at once.
          STARTED.countDown();
          STARTED.await();
        } catch (InterruptedException e) {}

        for (int l = 0; l < MONITOR_COUNT; l++) {
          try {
            // wait() forces inflation of the object's monitor.
            Object o = new Object();
            synchronized (o) {
              o.wait(1);
            }
          } catch (InterruptedException e) {}
        }

        try {
          // Park forever: the monitors are never used again, so the "used"
          // threshold is never reached, and nothing triggers deflation.
          LOCKED.countDown();
          HANG.await();
        } catch (InterruptedException e) {}
      });
      t.start();
    }

    STARTED.await();
    System.out.println("Started");
    LOCKED.await();
    System.out.println("Locked");
    System.in.read();
    HANG.countDown();
  }
}
```

Run with:

```
$ java -XX:NativeMemoryTracking=summary -Xss256k Monitors.java
Initializing
Started
Locked

<in another terminal>
$ ps x -o pid,rss,command | grep java
67999 704656 .../java -XX:NativeMemoryTracking=summary -Xss256k Monitors.java

$ jcmd 67999 VM.native_memory
...
-           Object Monitors (reserved=325001KB, committed=325001KB)
                            (malloc=325001KB #1600007) 
```

So, out of 704M of RSS, 325M is taken by inflated object monitors, and there are 1.6M of them (2000 threads, 800 monitors each). Note that 325001 KB over 1.6M monitors works out to right around the ~200 bytes per monitor estimated above.

I see these ways out of this:

0. Ask users who know about this problem to drop their `MUDT` to a much lower value, so that deflation is triggered more often. This mitigates the issue, but it does not change the default behavior, which means other users are still exposed to the problem.

1. Drop `MUDT` to a much lower default value, so that cleanups are more frequent. I think this is safe to do from the latency perspective, because deflation would still be performed asynchronously. The problem of the arbitrarily high ceiling is still present. This might also cause throughput regressions, since the deflater thread would be more active under normal conditions.

2. Drop `AMPTE` to a much lower default value, so that the monitor population ceiling is not that large. This has the same implications as lowering `MUDT`.

3. Amend the VM to request async deflation from a safepoint, for example by calling a light-weight version of `ObjectSynchronizer::request_deflate_idle_monitors` from the safepoint cleanup path. This would be similar to the JDK 11 behavior -- piggybacking cleanup requests on safepoint invocations, but with the benefit of being completely asynchronous.

4. Introduce an additional `GuaranteedAsyncDeflationInterval`, which would normally be larger than `AsyncDeflationInterval`, but which would trigger deflation even when the threshold is not reached. Some large default value, like 60s, should serve long-running systems well without incurring significant work on the deflater thread.

I like (4) quite a bit better, because it acts as a safety rail should the normal heuristics fail.
The threshold heuristics fixes can then proceed at a leisurely pace, all the while being covered by this safety net.
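
For illustration, if option (4) were implemented as proposed, the reproducer above could be kept in check with something like the following run (the flag name follows the proposal; the 60000 ms value is hypothetical, matching the suggested 60s default):

```
$ java -XX:NativeMemoryTracking=summary -Xss256k \
       -XX:GuaranteedAsyncDeflationInterval=60000 Monitors.java
```

With a guaranteed interval in place, the deflater thread would run at least once a minute and deflate the idle monitors, even though `MUDT` is never reached.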
Comments
Fix Request (20u, 17u) This fix provides a safety net in case the heuristics fail to deflate the monitors. The failure shows up as a slow memory leak, with the "Object Monitors" NMT section growing. The patch applies cleanly to 20u, and applies with a minor addition to 17u (see related PRs). Testing, including the new regression test, passes. JDK-8305994, JDK-8306774, JDK-8306825 all go in together to fix related issues at once.
02-05-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1316 Date: 2023-05-02 09:47:38 +0000
02-05-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk20u/pull/61 Date: 2023-04-24 18:01:40 +0000
24-04-2023

New test depends on JDK-8291418 logging messages.
24-04-2023

Changeset: 6b81342c Author: Aleksey Shipilev <shade@openjdk.org> Date: 2023-04-24 17:02:59 +0000 URL: https://git.openjdk.org/jdk/commit/6b81342c2215041dbb7e9020a67cdc56976c97b1
24-04-2023

Yes, thanks for that. I am just scoping out this RFE. Since the current heuristics have an issue that went unnoticed through testing and reviews, and it affects real at-scale deployments today, maybe we should think laterally here. First provide the safety net (this RFE) to guarantee deflation in the cases where the smart heuristics fail, then try to make the heuristics even smarter (future RFEs), under the protection of that safety net.
18-04-2023

[~shade] I'm just adding historical context here.
18-04-2023

The issue that we are looking at has to do with the `MUDT*AMPTE*#threads` threshold not being reached until very late for heavily-threaded workloads. We do not enter the NoAsyncDeflationProgressMax handling block either in the production use case or in the model example in this issue, judging by `-Xlog:monitorinflation` logs. So while I think the threshold heuristics need some further thinking, that would be a future, more complicated exercise than this one. What this RFE does is provide a safety net in case the normal threshold-based heuristics fail in this or other ways.
14-04-2023

The following issue was used to tweak the way that MonitorUsedDeflationThreshold (MUDT) gets used in JDK 17: JDK-8226416 "MonitorUsedDeflationThreshold can cause repeated async deflation requests". It's helpful to understand where the NoAsyncDeflationProgressMax and the _no_progress_cnt stuff comes from and why...
14-04-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/13474 Date: 2023-04-14 10:53:31 +0000
14-04-2023