JDK-8244278 : Excessive code cache flushes and sweeps
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10,11,12,13,14,15
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-05-02
  • Updated: 2020-10-20
  • Resolved: 2020-06-03
Fix Versions:
  • JDK 11: 11.0.10-oracle (Fixed)
  • JDK 15: 15 b26 (Fixed)
Description
During the migration from JDK 8 to JDK 11, we noticed increased deoptimizations and recompilations caused by more frequent code cache flushing. As a result, applications experienced latency and CPU usage regressions.

The aggressive flushing behavior was likely introduced by JDK-8046809. The new option -XX:StartAggressiveSweepingAt is not fully respected, because NMethodSweeper::notify() should use MAX2(start_threshold, 1.1) instead of MIN2.
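
For reference, below is a simplified sketch of the affected check, paraphrased from HotSpot's sweeper code (not the verbatim source). CodeCache::reverse_free_ratio() returns max_capacity / free_space, so larger values mean a fuller code cache:

// Paraphrased sketch of NMethodSweeper::notify(); not the verbatim source.
void NMethodSweeper::notify(int code_blob_type) {
  // With the default StartAggressiveSweepingAt=10: 100/10 = 10.0.
  double start_threshold = 100.0 / (double)StartAggressiveSweepingAt;
  // Bug: MIN2(10.0, 1.1) == 1.1, so the sweeper is notified once the code
  // cache is only ~9% full, regardless of StartAggressiveSweepingAt.
  // Fix: MAX2(10.0, 1.1) == 10.0, i.e. notify only once the cache is ~90%
  // full, as the option is documented to behave.
  double aggressive_sweep_threshold = MAX2(start_threshold, 1.1);  // was MIN2
  if (CodeCache::reverse_free_ratio(code_blob_type) >= aggressive_sweep_threshold) {
    CodeCache_lock->notify();  // wake the sweeper thread
  }
}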

This behavior can be reproduced with DaCapo's tradesoap benchmark, which uses roughly 20MB of code cache:
$ java -Xms2g -Xmx2g -XX:ReservedCodeCacheSize=40m -XX:StartAggressiveSweepingAt=<N> -Xlog:codecache+sweep*=debug -XX:+UnlockDiagnosticVMOptions -XX:+PrintMethodFlushingStatistics -jar dacapo-9.12-MR1-bach.jar tradesoap -n 5 -t 16

The log shows that the code cache is flushed when usage is far below (100 - StartAggressiveSweepingAt)% of capacity, which is 90% by default.
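
To make the numbers concrete, here is a small standalone C++ demo (an illustration, not HotSpot code) that computes the usage level at which sweeping starts under the buggy MIN2 threshold versus the fixed MAX2 threshold, assuming the sweeper is notified when reverse_free_ratio >= threshold as sketched above, which corresponds to usage >= 100 * (1 - 1/threshold) percent:

#include <algorithm>
#include <cstdio>

// Standalone illustration, not HotSpot code: compute the code cache usage
// at which the sweeper is notified, for several StartAggressiveSweepingAt
// settings, before and after the MIN2 -> MAX2 fix.
int main() {
  const int values[] = {2, 10, 50};  // StartAggressiveSweepingAt settings
  for (int start_at : values) {
    double start_threshold = 100.0 / start_at;
    double buggy = std::min(start_threshold, 1.1);  // MIN2, before the fix
    double fixed = std::max(start_threshold, 1.1);  // MAX2, after the fix
    std::printf("StartAggressiveSweepingAt=%-2d: sweeping starts at %4.1f%% "
                "usage (buggy) vs %4.1f%% usage (fixed)\n",
                start_at, 100.0 * (1.0 - 1.0 / buggy),
                100.0 * (1.0 - 1.0 / fixed));
  }
  return 0;
}

With the buggy MIN2, all three settings start sweeping at ~9% usage; with the fix, sweeping starts at 98%, 90%, and 50% usage respectively, which matches the tables below (e.g., with the fix, StartAggressiveSweepingAt=50 triggers far more flushes than the default).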

Counting lines containing "CodeCache flushing" gives the number of completed flushes (NMethodSweeper::sweep_code_cache()).
"Total number of full sweeps" at the end of the output is the number of attempted sweeps (NMethodSweeper::possibly_sweep()), i.e., the number of times the sweeper thread was woken up.

Without the fix:
StartAggressiveSweepingAt | completed_flushes | attempted_sweeps
2                         | 418               | 12768
10 (default)              | 405               | 12907
50                        | 419               | 11933

With the fix:
StartAggressiveSweepingAt | completed_flushes | attempted_sweeps
2                         | 3                 | 3
10 (default)              | 6                 | 6
50                        | 315               | 315

Comments
Fix request (11u) -- will label after testing is completed. I would like to downport this for parity with 11.0.10-oracle. I had to do a really trivial resolve of this one-line change: http://cr.openjdk.java.net/~goetz/wr20/8244278-code_cache_flushes-jdk11/ Nils Eliasson contributed JDK-8254185 to 11u: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-October/003937.html Now this is safe to push.
16-10-2020

Thanks for the hint! I'll postpone my work on this to see whether JDK-8244660 is pushed to 11.0.10-oracle.
22-09-2020

Thanks! Note that this change depends on JDK-8244660, which might cause other conflicts in 11u. I saw JDK-8247775 is marked as "Won't Fix", so I assume it didn't make it into 11.0.10-oracle? I'm asking on JDK-8247775.
21-09-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/5cf00ea9e479 User: neliasso Date: 2020-06-03 13:34:15 +0000
03-06-2020

We have tested this change by itself in our internal JDK 11 on real production workloads. Six out of seven important servers show significant CPU usage reduction, typically in the range of 5%-10%; two of them show up to 15%-20% reduction. All servers are running with -XX:-TieredCompilation. There is a noticeable increase in code cache usage, but it is only tens of MBs and is less of a concern for us compared to the reduction in CPU usage. As @neliasso suggested, the best path forward is to push this change together with JDK-8244660, so that it does not introduce a regression in code cache usage while probably still retaining the improvement in CPU and throughput.
22-05-2020

Thanks, we will see if we can add it into our testing with workarounds like you suggest.
14-05-2020

Hi Eric, tradesoap in DaCapo 9.12 is also failing for us, at a rate of about 10%-25%. We just eat the exceptions and exclude the failing runs from the report. We run the benchmarks for 34 trials, and usually we have 20+ usable trials. One trick is that even if a warmup iteration fails, the final iteration can still pass, and its result is still usable. We use a custom DaCapo callback class that saves the result if the "valid" parameter of complete() [1] is true. A long-term plan is to switch to the latest DaCapo, which replaces Geronimo with WildFly for tradesoap/tradebeans and hopefully fixes the problem. We haven't tried it yet. [1] https://github.com/dacapobench/dacapobench/blob/3867b2677b8bc847d13c6b3676465793d518b107/benchmarks/harness/src/MyCallback.java#L38
14-05-2020

Hi [~manc] -- we have not been running DaCapo's tradesoap benchmark regularly because it throws exceptions in maybe 1 in 10 runs and not all the same exception either, so the validation fails. Is there a trick to it? Do you just eat the exceptions and run it anyway? I have tried 11,12,13,14, and some 15s and they all got some exceptions but not in every run. TIA for any tips.
13-05-2020

I have attached two sets of benchmarking results based on JDK 11, each with a normalized and a non-normalized version. DaCapo's tradesoap shows 1.7% improvement in throughput and 5.3% reduction in CPU usage, while code cache usage increased from 12.7MB to 26.2MB. In our experience, tradesoap is the most important benchmark in DaCapo because it behaves most similarly to our large production applications. Blaze's workload1 and workload2 show ~3% and ~7% improvement in throughput, respectively. Code cache usage increased by 30% to 70%.

Notes on benchmarking methodology and environment:
- Runs are on Google-internal JDK 11 with and without the fix, using the G1 collector.
- DaCapo runs use "-XX:-TieredCompilation -XX:ReservedCodeCacheSize=40m -Xms4g -Xmx4g", in order to eliminate impact from GC and put some pressure on the code cache.
- Metrics for DaCapo only measure the state after the code is fully warmed up, i.e., excluding warmup iterations.
- Blaze is Google's build system (the internal version of Bazel). The benchmarks run Blaze with different input workloads.
- Blaze runs with -XX:+TieredCompilation.
- Blaze workload1 and workload3 have 10GB Xmx; workload2 has 18GB Xmx. We have tuned G1 for throughput for Blaze.
- Blaze runs have 480MB ReservedCodeCacheSize, determined by ergonomics.
- Metrics for Blaze benchmarks measure the entire run, including time to warm up the code.
04-05-2020

ILW = Performance regression due to excessive deoptimizations of compiled code, with small code cache, no workaround except disabling code cache flushing = MMH = P3
04-05-2020