JDK-8238247 : CTW runner should sweep nmethods more aggressively
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,14,15
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-01-30
  • Updated: 2024-02-12
  • Resolved: 2020-02-10
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13 JDK 14 JDK 15
11.0.7Fixed 13.0.4Fixed 14.0.2Fixed 15 b10Fixed
Related Reports
Relates :  
Relates :  
Description
This is very visible when compiling the JAR with many files. For example, compiling this one:
$ wget https://repo1.maven.org/maven2/org/apache/servicemix/bundles/org.apache.servicemix.bundles.quickfix/1.6.3_1/org.apache.servicemix.bundles.quickfix-1.6.3_1.jar
$ ./ctw.sh org.apache.servicemix.bundles.quickfix-1.6.3_1.jar

It contains about 0.5M methods, which ends up doing 2M compilations on all tiers, and CTW run on it takes multiple days -- I basically terminated the run after 40 hours, with about half done.

Profiling with this patch: http://cr.openjdk.java.net/~shade/8238247/8238247-tracing.patch -- shows that compilation and deoptimization time grows nearly-linearly with the number of nmethods in the system, and CTW stabilizes at about 110K nmethods present. This is, alas, too much for the stable point when aggressive compilation and deoptimization is needed.

Making the sweeper more aggressive with e.g. -XX:NmethodSweepActivity=2000 keeps nmethod population very low. The CTW now finishes in about 2 hours, which is at least 20x improvement. See the example chart from my experiments below. Default run was terminated early, because it was evident it goes nowhere. 
 https://cr.openjdk.java.net/~shade/8238247/plot.png

So, my suggestion would be to put -XX:NmethodSweepActivity in here:

diff -r c3c9f45a18cf test/hotspot/jtreg/testlibrary/ctw/Makefile
--- a/test/hotspot/jtreg/testlibrary/ctw/Makefile       Thu Jan 30 13:31:24 2020 +0100
+++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile       Thu Jan 30 20:41:36 2020 +0100
@@ -67,11 +67,11 @@
 
 $(DST_DIR):
        @mkdir -p $@
 
 $(DST_DIR)/ctw.sh: $(DST_DIR)
-       echo '$${JAVA_HOME}/bin/java $${JAVA_OPTIONS} $(EXPORTS) -XX:-UseCounterDecay -Xbatch "-XX:CompileCommand=exclude,java/lang/invoke/MethodHandle.*" -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -Xbootclasspath/a:wb.jar -jar ctw.jar $$@' > $@
+       echo '$${JAVA_HOME}/bin/java $${JAVA_OPTIONS} $(EXPORTS) -XX:-UseCounterDecay -Xbatch -XX:NmethodSweepActivity=2000 "-XX:CompileCommand=exclude,java/lang/invoke/MethodHandle.*" -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -Xbootclasspath/a:wb.jar -jar ctw.jar $$@' > $@
        chmod a+x $@
 
 $(DST_DIR)/ctw.jar: filelist $(DST_DIR)/wb.jar
        @mkdir -p $(OUTPUT_DIR)
        $(JAVAC) $(EXPORTS) -sourcepath $(SRC_DIR) -d $(OUTPUT_DIR) -cp $(DST_DIR)/wb.jar @filelist

Comments
Fix request (13u): The original change applies cleanly, passes tier1,tier2,tier3, and CWT tests.
05-06-2020

Fix Request (14u) Same reason, applicability, testing as for 11u.
06-03-2020

Fix Request (11u) This substantially improves the CTW tests performance. Patch applies cleanly to 11u, passes full CTW tests with Linux x86_64 fastdebug, improves application/ctw/modules about 2.4x times, from 22min -> 9min.
12-02-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/787d9d725483 User: shade Date: 2020-02-10 05:26:20 +0000
10-02-2020

Actually, we might want to take the non-tiered path as well for the same reason: https://cr.openjdk.java.net/~shade/8238247/webrev.01/ This thing improves the CTW against the class library: $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=applications/ctw/modules/ Before (confirmed by repeating 3 times): real 16m23.877s user 142m56.362s sys 2m19.289s After (confirmed by repeating 3 times): real 9m48.134s user 94m0.456s sys 2m0.267s It keeps the time with TEST_VM_OPTS="-XX:-TieredCompilation" roughly the same: Before: real 12m47.253s user 39m33.871s sys 0m33.381s After: real 12m56.067s user 41m2.641s sys 0m36.282s
06-02-2020

Let me take that to RFR: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036896.html
06-02-2020

> how about the following patch (w/ DEOPTIMIZE_ALL_CLASSES_RATE and DeoptimizeAllClassesRate being renamed to _METHOD_RATE and MethodRate)? That should be good in itself, but I am not sure it is as straightforward as the 8238247-deopt-last-tier.patch. Also, in this form we might miss some deopt-alls, because we might jump over exact (methodId % Utils.DEOPTIMIZE_ALL_CLASSES_RATE == 0) with methodCount > 1 ;) Fork it as separate enhancement? Meanwhile, I managed to go through the entire bulk of JAR files I have with 8238247-deopt-last-tier.patch, and it improves the CTW times dramatically, at least 2x. Before: (terminated early for the lack of resources) real 2061m43,506s user 19274m27,986s sys 474m31,910s After: real 920m4,456s user 10225m57,087s sys 436m22,297s
31-01-2020

how about the following patch (w/ DEOPTIMIZE_ALL_CLASSES_RATE and DeoptimizeAllClassesRate being renamed to _METHOD_RATE and MethodRate)? diff -r 5619c09e3b62 test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java --- a/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java Thu Jan 30 08:55:20 2020 -0800 +++ b/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java Thu Jan 30 14:36:45 2020 -0800 @@ -82,10 +82,10 @@ ++methodCount; executor.execute(new CompileMethodCommand(id, e)); } - METHOD_COUNT.addAndGet(methodCount); + long methodId = METHOD_COUNT.addAndGet(methodCount); if (Utils.DEOPTIMIZE_ALL_CLASSES_RATE > 0 - && (id % Utils.DEOPTIMIZE_ALL_CLASSES_RATE == 0)) { + && (methodId % Utils.DEOPTIMIZE_ALL_CLASSES_RATE == 0)) { WHITE_BOX.deoptimizeAll(); } }
30-01-2020

> What if you run the test with "-XX:+OptoNoExecute -XX:-InstallMethods"? I don't think WB likes it too much. It starts to fail with "failed to blocking compile at level $X" for most (all?) methods. I also wonder if code cache installation also runs additional code we want to test, e.g. relocations?
30-01-2020

So, for the JAR in question: baseline: did not finish after 40+ hours, terminated baseline + -XX:NSA=2000: Done (9484 classes, 546348 methods, 5.684.928 ms) -- 1h35m baseline + 8238247-deopt-last-tier.patch: Done (9484 classes, 546348 methods, 5.442.512 ms) -- 1h31m Oh my, that looks good. I shall be running patched CTW overnight. Updated the plot as well: https://cr.openjdk.java.net/~shade/8238247/plot.png
30-01-2020

W.r.t. CTW, does it make much sense to install nmethods into code cache at all? What if you run the test with "-XX:+OptoNoExecute -XX:-InstallMethods"?
30-01-2020

> I believe at some point we called WB:deoptimizeMethod right after compilation AFAICS, current Compiler.CompileMethodCommand.run() does that when tiered compilation is enabled: it does deopt before each tier, after the previous tier compilation. Oh wait. But it leaves the compiled version from the last tier. This patch does look like improving the nmethod population without any other changes: http://cr.openjdk.java.net/~shade/8238247/8238247-deopt-last-tier.patch
30-01-2020

Before going this nmethod sweeper route, I did DeoptimizeAllClassesRate experiments too, it does not help as reliably. The problem with that option is choice of the setting: measured in classes. So, for the adversarial JARs like in the example above, we have 9K classes and 500K methods. It calls for DeoptimizeAllClassesRate of about 100 (i.e. 100 times over 2 "good" hours, ~5.5K methods between deopt-all-s?) ? At the same time, the vast majority of JARs have much shallower classes, where deoptimizing everything every 100 classes prolongs the CTW run. At this point, I believe CTW should just give up the nmethods for sweeping right away, and nmethods sweeper needs to step in faster.
30-01-2020

I believe at some point we called WB:deoptimizeMethod right after compilation, but that had a negative performance impact on existing tests (java.base ? ). I'll try to dig up if it was really a case (or it's a false memory) and if it was, then when and why we changed that.
30-01-2020

could you please run a couple of expereiments w/ different value of DeoptimizeAllClassesRate to see if it helps?
30-01-2020

[~shade], there is -DDeoptimizeAllClassesRate option which controls how often CTW deoptimizes all nmethods (by WhiteBox::deoptmizeAll), by default it's -1, which means CTW does not do that at all.
30-01-2020

Igor, please take a look. Maybe there is another way to achieve the same that I am missing.
30-01-2020