JDK-8244278 : Excessive code cache flushes and sweeps
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10,11,12,13,14,15
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-05-02
  • Updated: 2020-10-20
  • Resolved: 2020-06-03
Fix Versions:
  • JDK 11: 11.0.10-oracle (Fixed)
  • JDK 15: 15 b26 (Fixed)
Description
During the migration from JDK 8 to JDK 11, we noticed increased deoptimizations and recompilations caused by more frequent code cache flushing. As a result, applications experienced latency and CPU usage regressions.

The aggressive flushing behavior was likely introduced by JDK-8046809. The new option -XX:StartAggressiveSweepingAt is not fully respected, because NMethodSweeper::notify() should use MAX2(start_threshold, 1.1) instead of MIN2.
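
For reference, below is a simplified sketch of the affected check, paraphrased from HotSpot's sweeper code (not the verbatim source). CodeCache::reverse_free_ratio() returns max_capacity / free_space, so larger values mean a fuller code cache:

// Paraphrased sketch of NMethodSweeper::notify(); not the verbatim source.
void NMethodSweeper::notify(int code_blob_type) {
  // With the default StartAggressiveSweepingAt=10: 100/10 = 10.0.
  double start_threshold = 100.0 / (double)StartAggressiveSweepingAt;
  // Bug: MIN2(10.0, 1.1) == 1.1, so the sweeper is notified once the code
  // cache is only ~9% full, regardless of StartAggressiveSweepingAt.
  // Fix: MAX2(10.0, 1.1) == 10.0, i.e. notify only once the cache is ~90%
  // full, as the option is documented to behave.
  double aggressive_sweep_threshold = MAX2(start_threshold, 1.1);  // was MIN2
  if (CodeCache::reverse_free_ratio(code_blob_type) >= aggressive_sweep_threshold) {
    CodeCache_lock->notify();  // wake the sweeper thread
  }
}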

This behavior can be reproduced with DaCapo's tradesoap benchmark, which uses roughly 20MB of code cache:
$ java -Xms2g -Xmx2g -XX:ReservedCodeCacheSize=40m -XX:StartAggressiveSweepingAt=<N> -Xlog:codecache+sweep*=debug -XX:+UnlockDiagnosticVMOptions -XX:+PrintMethodFlushingStatistics -jar dacapo-9.12-MR1-bach.jar tradesoap -n 5 -t 16

The log shows that the code cache is flushed when usage is far below (100 - StartAggressiveSweepingAt)% of capacity, which is 90% by default.
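
To make the numbers concrete, here is a small standalone C++ demo (an illustration, not HotSpot code) that computes the usage level at which sweeping starts under the buggy MIN2 threshold versus the fixed MAX2 threshold, assuming the sweeper is notified when reverse_free_ratio >= threshold as sketched above, which corresponds to usage >= 100 * (1 - 1/threshold) percent:

#include <algorithm>
#include <cstdio>

// Standalone illustration, not HotSpot code: compute the code cache usage
// at which the sweeper is notified, for several StartAggressiveSweepingAt
// settings, before and after the MIN2 -> MAX2 fix.
int main() {
  const int values[] = {2, 10, 50};  // StartAggressiveSweepingAt settings
  for (int start_at : values) {
    double start_threshold = 100.0 / start_at;
    double buggy = std::min(start_threshold, 1.1);  // MIN2, before the fix
    double fixed = std::max(start_threshold, 1.1);  // MAX2, after the fix
    std::printf("StartAggressiveSweepingAt=%-2d: sweeping starts at %4.1f%% "
                "usage (buggy) vs %4.1f%% usage (fixed)\n",
                start_at, 100.0 * (1.0 - 1.0 / buggy),
                100.0 * (1.0 - 1.0 / fixed));
  }
  return 0;
}

With the buggy MIN2, all three settings start sweeping at ~9% usage; with the fix, sweeping starts at 98%, 90%, and 50% usage respectively, which matches the tables below (e.g., with the fix, StartAggressiveSweepingAt=50 triggers far more flushes than the default).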

Counting lines containing "CodeCache flushing" gives the number of completed flushes (NMethodSweeper::sweep_code_cache()).
"Total number of full sweeps" at the end of the output is the number of attempted sweeps (NMethodSweeper::possibly_sweep()), i.e., the number of times the sweeper thread was woken up.

Without the fix:
StartAggressiveSweepingAt | completed_flushes | attempted_sweeps
2                         | 418               | 12768
10 (default)              | 405               | 12907
50                        | 419               | 11933

With the fix:
StartAggressiveSweepingAt | completed_flushes | attempted_sweeps
2                         | 3                 | 3
10 (default)              | 6                 | 6
50                        | 315               | 315

Comments
Fix request (11u) -- will label after testing is completed. I would like to downport this for parity with 11.0.10-oracle. I had to do a really trivial resolve of this one-line change: http://cr.openjdk.java.net/~goetz/wr20/8244278-code_cache_flushes-jdk11/ Nils Eliasson contributed JDK-8254185 to 11u: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-October/003937.html Now this is safe to push.
16-10-2020

Thanks for the hint! I'll postpone my work on this to see whether JDK-8244660 is pushed to 11.0.10-oracle.
22-09-2020

Thanks! Note that this change depends on JDK-8244660, which might cause other conflicts in 11u. I saw JDK-8247775 is marked as "Won't Fix", so I assume it didn't make it into 11.0.10-oracle? I'm asking on JDK-8247775.
21-09-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/5cf00ea9e479 User: neliasso Date: 2020-06-03 13:34:15 +0000
03-06-2020

We have tested this change by itself in our internal JDK 11 on real production workloads. Six out of seven important servers show significant CPU usage reduction, typically in the range of 5%-10%; two of them show up to 15%-20% reduction. All servers are running with -XX:-TieredCompilation. There is a noticeable increase in code cache usage, but it is only tens of MBs and is less of a concern for us compared to the reduction in CPU usage. As @neliasso suggested, the best path forward is to push this change together with JDK-8244660, so that it does not introduce a regression in code cache usage while probably still retaining the improvement in CPU and throughput.
22-05-2020

Thanks, we will see if we can add it into our testing with workarounds like you suggest.
14-05-2020

Hi Eric, tradesoap in DaCapo 9.12 is also failing for us, at a rate of about 10%-25%. We just eat the exceptions and exclude the failing runs from the report. We run the benchmarks for 34 trials, and usually we have 20+ usable trials. One trick is that even if a warmup iteration fails, the final iteration can still pass, and its result is still usable. We use a custom DaCapo callback class that saves the result if the "valid" parameter of complete() [1] is true. A long-term plan is to switch to the latest DaCapo, which replaces Geronimo with WildFly for tradesoap/tradebeans and hopefully fixes the problem. We haven't tried it yet. [1] https://github.com/dacapobench/dacapobench/blob/3867b2677b8bc847d13c6b3676465793d518b107/benchmarks/harness/src/MyCallback.java#L38
14-05-2020

Hi [~manc] -- we have not been running DaCapo's tradesoap benchmark regularly because it throws exceptions in maybe 1 in 10 runs and not all the same exception either, so the validation fails. Is there a trick to it? Do you just eat the exceptions and run it anyway? I have tried 11,12,13,14, and some 15s and they all got some exceptions but not in every run. TIA for any tips.
13-05-2020

I have attached two sets of benchmarking results based on JDK 11, each with a normalized and a non-normalized version. DaCapo's tradesoap shows 1.7% improvement in throughput and 5.3% reduction in CPU usage, while code cache usage increased from 12.7MB to 26.2MB. In our experience, tradesoap is the most important benchmark in DaCapo because it behaves most similarly to our large production applications. Blaze's workload1 and workload2 show ~3% and ~7% improvement in throughput, respectively. Code cache usage increased by 30% to 70%.

Notes on benchmarking methodology and environment:
- Runs are on Google-internal JDK 11 with and without the fix, using the G1 collector.
- DaCapo runs use "-XX:-TieredCompilation -XX:ReservedCodeCacheSize=40m -Xms4g -Xmx4g", in order to eliminate impact from GC and put some pressure on the code cache.
- Metrics for DaCapo only measure the state after the code is fully warmed up, i.e., excluding warmup iterations.
- Blaze is Google's build system (the internal version of Bazel). The benchmarks run Blaze with different input workloads.
- Blaze runs with -XX:+TieredCompilation.
- Blaze workload1 and workload3 have 10GB Xmx; workload2 has 18GB Xmx. We have tuned G1 for throughput for Blaze.
- Blaze runs have 480MB ReservedCodeCacheSize, determined by ergonomics.
- Metrics for Blaze benchmarks measure the entire run, including time to warm up the code.
04-05-2020

ILW = Performance regression due to excessive deoptimizations of compiled code, with small code cache, no workaround except disabling code cache flushing = MMH = P3
04-05-2020