JDK-8360557 : CTW: Expand inlining scope to reach more code
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-06-25
  • Updated: 2025-06-30
Other: tbd (Unresolved)
Description
We use CTW testing to make sure the compilers behave well. But CTW compiles code that is never executed, and since our inlining heuristics often look at profiles, we end up not inlining much at all! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially in loop optimizations.

See for example:

$ test/hotspot/jtreg/testlibrary/ctw/dist $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:+PrintInlining -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+CITime" ./ctw.sh modules:jdk.compiler

65439 12668    b        com.sun.tools.javac.tree.TreeMaker::Literal (362 bytes)
                            @ 14   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 25   com.sun.tools.javac.code.Type::constType (8 bytes)   failed to inline: virtual call
                            @ 28   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 47   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 58   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 61   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 80   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 91   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 94   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 113   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 124   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 127   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 149   java.lang.Character::toString (8 bytes)   failed to inline: never executed
                            @ 153   java.lang.String::charAt (25 bytes)   failed to inline: already compiled into a medium method
                            @ 164   java.lang.Integer::valueOf (32 bytes)   failed to inline: call site not reached
                            @ 167   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 179   java.lang.Integer::valueOf (32 bytes)   failed to inline: call site not reached
                            @ 182   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 185   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 204   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 215   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 218   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 237   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 248   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 251   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 270   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 281   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 284   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency
                            @ 304   java.lang.Boolean::booleanValue (5 bytes)   accessor
                            @ 323   java.lang.Integer::valueOf (32 bytes)   failed to inline: call site not reached
                            @ 326   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
                            @ 338   java.lang.Integer::valueOf (32 bytes)   failed to inline: call site not reached
                            @ 341   com.sun.tools.javac.code.Type$JCPrimitiveType::constType (13 bytes)   failed to inline: low call site frequency
                            @ 344   com.sun.tools.javac.tree.JCTree$JCLiteral::setType (8 bytes)   failed to inline: low call site frequency


AFAICS, we only inline trivial methods, accessors and some code that happens to be hot since JDK itself touches it. We should configure CTW to inline quite a bit more.
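
One way to check that impression on a full log is to tally the inlining messages. A minimal Python sketch, assuming the -XX:+PrintInlining line format shown above; the helper name `tally_inlining` and the regular expression are illustrative and not part of CTW:

```python
import re
from collections import Counter

# Matches PrintInlining lines of the shape shown in this report, e.g.
#   @ 14   pkg.Class::method (20 bytes)   failed to inline: low call site frequency
INLINE_LINE = re.compile(r"@\s+\d+\s+(\S+)\s+\((\d+) bytes\)\s+(.*\S)")

def tally_inlining(lines):
    """Count inlining outcomes by their message text."""
    counts = Counter()
    for line in lines:
        m = INLINE_LINE.search(line)
        if m:
            counts[m.group(3)] += 1
    return counts

# A few lines copied from the output above.
sample = """\
 @ 14   com.sun.tools.javac.tree.TreeMaker::Literal (20 bytes)   failed to inline: low call site frequency
 @ 25   com.sun.tools.javac.code.Type::constType (8 bytes)   failed to inline: virtual call
 @ 149   java.lang.Character::toString (8 bytes)   failed to inline: never executed
 @ 164   java.lang.Integer::valueOf (32 bytes)   failed to inline: call site not reached
 @ 304   java.lang.Boolean::booleanValue (5 bytes)   accessor
""".splitlines()

for reason, n in tally_inlining(sample).most_common():
    print(f"{n:4d}  {reason}")
```

Feeding the whole CTW log through the same function would show how the "low call site frequency", "never executed" and "call site not reached" messages compare with the successful inlines.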

There is an intrinsic tradeoff here. If we improve inlining, we will likely reach more interesting cases, but compilation will also take more time. It remains to be seen where the sweet spot is.
Comments
> I'll go and see how much hassle it leads to for `make test TEST=applications/ctw/modules`.

Yeah, it's bad:

# Default
real 5m1.616s
user 79m41.398s
sys 14m39.607s

# Inline cold
real 5m52.914s
user 101m36.090s
sys 14m35.762s

# Inline cold + bump inlining (+50)
real 7m42.231s
user 128m45.707s
sys 14m6.747s

# Inline cold + bump inlining (+70)
real 23m36.571s
user 369m41.136s
sys 20m34.323s

Looking if we can improve CTW a bit (JDK-8360867, JDK-8360783, etc), and/or cut down on the inlining bumps while still capturing interesting cases.
27-06-2025
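
To put the wall-clock numbers from the comment above in perspective, the slowdown relative to the default run can be computed directly from the `real` times. A minimal Python sketch; the helper name `to_seconds` is only for illustration:

```python
import re

def to_seconds(t):
    """Convert a `time`-style duration such as '5m1.616s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))

# Wall-clock ("real") times copied from the comment above.
real = {
    "Default": "5m1.616s",
    "Inline cold": "5m52.914s",
    "Inline cold + bump inlining (+50)": "7m42.231s",
    "Inline cold + bump inlining (+70)": "23m36.571s",
}

base = to_seconds(real["Default"])
for name, t in real.items():
    print(f"{name:36s} {to_seconds(t) / base:.2f}x")
```

By this measure the first two variants cost roughly 1.2x and 1.5x the default wall-clock time, while the last one is close to 4.7x.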

Good pointers, thanks. For the same `modules:jdk.compiler` test, this is what I see:

```
# (a) Default
Tier1 {speed: 28742.073 bytes/s; standard: 45.657 s, 1312275 bytes, 13591 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 28037736 bytes; nmethods_code_size: 15006864 bytes}
Tier2 {speed: 31976.408 bytes/s; standard: 41.036 s, 1312199 bytes, 13457 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 29479608 bytes; nmethods_code_size: 15992448 bytes}
Tier3 {speed: 26549.479 bytes/s; standard: 51.108 s, 1356880 bytes, 14284 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 49795472 bytes; nmethods_code_size: 33058592 bytes}
Tier4 {speed: 9555.018 bytes/s; standard: 95.701 s, 914238 bytes, 13615 methods; osr: 0.031 s, 488 bytes, 3 methods; nmethods_size: 16377728 bytes; nmethods_code_size: 9072432 bytes}

# (b) -Xcomp
Tier1 {speed: 26882.625 bytes/s; standard: 48.790 s, 1311597 bytes, 13507 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 28013992 bytes; nmethods_code_size: 14995440 bytes}
Tier2 {speed: 31243.056 bytes/s; standard: 42.003 s, 1312291 bytes, 13458 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 29486760 bytes; nmethods_code_size: 15997336 bytes}
Tier3 {speed: 34234.203 bytes/s; standard: 58.406 s, 1999487 bytes, 19344 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 71372568 bytes; nmethods_code_size: 47439376 bytes}
Tier4 {speed: 14626.877 bytes/s; standard: 145.556 s, 2129034 bytes, 19012 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 30821944 bytes; nmethods_code_size: 17286080 bytes}

# (c) -XX:-UseInterpreter
Tier1 {speed: 21787.450 bytes/s; standard: 60.210 s, 1311813 bytes, 13466 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 28009592 bytes; nmethods_code_size: 14992280 bytes}
Tier2 {speed: 23756.621 bytes/s; standard: 55.294 s, 1313606 bytes, 13492 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 29517920 bytes; nmethods_code_size: 16013136 bytes}
Tier3 {speed: 26403.299 bytes/s; standard: 71.377 s, 1884597 bytes, 18211 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 67875296 bytes; nmethods_code_size: 45271224 bytes}
Tier4 {speed: 9382.242 bytes/s; standard: 156.945 s, 1472542 bytes, 13602 methods; osr: 0.044 s, 360 bytes, 4 methods; nmethods_size: 22375944 bytes; nmethods_code_size: 13154872 bytes}

# (d) Don't skip cold methods for inlining
Tier1 {speed: 32861.676 bytes/s; standard: 39.936 s, 1312372 bytes, 13588 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 28042888 bytes; nmethods_code_size: 15010480 bytes}
Tier2 {speed: 34954.645 bytes/s; standard: 37.544 s, 1312325 bytes, 13457 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 29482056 bytes; nmethods_code_size: 15993968 bytes}
Tier3 {speed: 29564.277 bytes/s; standard: 45.966 s, 1358965 bytes, 14279 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 49865384 bytes; nmethods_code_size: 33104504 bytes}
Tier4 {speed: 13459.034 bytes/s; standard: 122.097 s, 1643370 bytes, 13616 methods; osr: 0.049 s, 601 bytes, 3 methods; nmethods_size: 21984680 bytes; nmethods_code_size: 12701040 bytes}

# (e) Don't skip cold methods for inlining + bump non-profiled inline limits (35 -> 50)
Tier1 {speed: 38396.287 bytes/s; standard: 40.114 s, 1540234 bytes, 13592 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 31207352 bytes; nmethods_code_size: 16764240 bytes}
Tier2 {speed: 44371.624 bytes/s; standard: 34.690 s, 1539257 bytes, 13458 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 32692296 bytes; nmethods_code_size: 17779952 bytes}
Tier3 {speed: 29658.624 bytes/s; standard: 54.963 s, 1630122 bytes, 14274 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 57421424 bytes; nmethods_code_size: 38247944 bytes}
Tier4 {speed: 14295.108 bytes/s; standard: 131.448 s, 1879466 bytes, 13605 methods; osr: 0.079 s, 735 bytes, 9 methods; nmethods_size: 23974016 bytes; nmethods_code_size: 13718000 bytes}

# (f) Don't skip cold methods for inlining + bump non-profiled inline limits (35 -> 70)
Tier1 {speed: 41409.144 bytes/s; standard: 42.602 s, 1764106 bytes, 13590 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 34422120 bytes; nmethods_code_size: 18412512 bytes}
Tier2 {speed: 42261.458 bytes/s; standard: 41.724 s, 1763334 bytes, 13458 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 35994760 bytes; nmethods_code_size: 19487408 bytes}
Tier3 {speed: 30461.129 bytes/s; standard: 61.376 s, 1869591 bytes, 14278 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 64448952 bytes; nmethods_code_size: 42968960 bytes}
Tier4 {speed: 14590.537 bytes/s; standard: 172.759 s, 2521357 bytes, 13602 methods; osr: 0.110 s, 903 bytes, 11 methods; nmethods_size: 30766360 bytes; nmethods_code_size: 16915904 bytes}

# (g) Don't skip cold methods for inlining + bump non-profiled inline limits (35 -> 140)
Tier1 {speed: 39335.400 bytes/s; standard: 54.193 s, 2131696 bytes, 13590 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 40012136 bytes; nmethods_code_size: 20997592 bytes}
Tier2 {speed: 45879.790 bytes/s; standard: 46.505 s, 2133645 bytes, 13458 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 41672944 bytes; nmethods_code_size: 22124760 bytes}
Tier3 {speed: 31808.565 bytes/s; standard: 71.354 s, 2269661 bytes, 14275 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 76235552 bytes; nmethods_code_size: 50630968 bytes}
Tier4 {speed: 14095.659 bytes/s; standard: 374.018 s, 5272648 bytes, 13591 methods; osr: 0.108 s, 903 bytes, 11 methods; nmethods_size: 62251544 bytes; nmethods_code_size: 33958600 bytes}

# (h) Don't skip cold methods for inlining + bump non-profiled inline limits (35 -> 280)
Tier1 {speed: 46344.424 bytes/s; standard: 52.740 s, 2444198 bytes, 13600 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 44642312 bytes; nmethods_code_size: 23213880 bytes}
Tier2 {speed: 51696.057 bytes/s; standard: 47.240 s, 2442125 bytes, 13458 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 46267760 bytes; nmethods_code_size: 24334480 bytes}
Tier3 {speed: 31810.925 bytes/s; standard: 82.367 s, 2620163 bytes, 14358 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 86197280 bytes; nmethods_code_size: 57138872 bytes}
Tier4 {speed: 14307.788 bytes/s; standard: 453.966 s, 6495614 bytes, 13598 methods; osr: 0.115 s, 1275 bytes, 12 methods; nmethods_size: 71696704 bytes; nmethods_code_size: 38715528 bytes}

# (i) Don't skip cold methods for inlining + bump non-profiled inline limits (35 -> 350) -- a bit above FreqInlineSize
Tier1 {speed: 49077.177 bytes/s; standard: 50.359 s, 2471456 bytes, 13606 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 45119016 bytes; nmethods_code_size: 23454264 bytes}
Tier2 {speed: 47521.927 bytes/s; standard: 52.019 s, 2472048 bytes, 13460 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 46782952 bytes; nmethods_code_size: 24592368 bytes}
Tier3 {speed: 31203.185 bytes/s; standard: 85.018 s, 2652823 bytes, 14379 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 87239832 bytes; nmethods_code_size: 57832336 bytes}
Tier4 {speed: 14368.248 bytes/s; standard: 500.018 s, 7184717 bytes, 13611 methods; osr: 0.103 s, 1154 bytes, 11 methods; nmethods_size: 78681920 bytes; nmethods_code_size: 42377456 bytes}
```

I interpret these results as follows:

1. There are about 13K methods to compile in jdk.compiler. This number is more or less consistent across all tiers, which is what I expect from CTW.
2. Both -Xcomp (b) and -XX:-UseInterpreter (c) compile +6K methods on tiers 3 and 4, I assume those are JDK/infra code.
3. Not skipping cold methods (d) (UseNewCode{2} in my experimental patch above) gives us +40% more nmethod code. It is the proxy for how far compilers go, I think.
4. Not skipping cold methods *and* bumping the non-profiled inline limits a bit (f) gives us +90% more code. This number is on par with -Xcomp, but without compiling extra methods, so we spend this time/space on the code we actually want to test.
5. As expected, the compilation times and code reach grow steadily as we bump the inlining limits (e...i). As expected, this is especially egregious in tier4.

I think making CTW 5x slower is prohibitive for testing times, even if you have a lot of compute. The large CTW corpus already takes about a day on a large machine.

So I suggest we do something like (e) as the basic case for CTW. *Then* we implement, separately, a randomized stress inlining that would inline things randomly above that baseline. In the end, this would give us a good basic (and thus very repeatable!) coverage, a good randomized coverage, without costing us way too much compute.

I know from overnight tests that (f) already shows lots of interesting failures.

I'll go and see how much hassle it leads to for `make test TEST=applications/ctw/modules`.
26-06-2025
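
The percentages in the interpretation above can be rechecked from the Tier4 figures quoted in that comment. A minimal Python sketch; the short labels and the helper name `relative` are my own, and only a subset of the configurations is included:

```python
# Tier4 figures copied from the CITime output above:
# (standard compile time in seconds, nmethods_code_size in bytes).
tier4 = {
    "(a) default":     (95.701,   9072432),
    "(b) -Xcomp":      (145.556, 17286080),
    "(d) inline cold": (122.097, 12701040),
    "(e) cold + 50":   (131.448, 13718000),
    "(f) cold + 70":   (172.759, 16915904),
    "(i) cold + 350":  (500.018, 42377456),
}

def relative(config, baseline="(a) default"):
    """Return (time ratio, code-size growth in percent) against the baseline."""
    t, size = tier4[config]
    t0, size0 = tier4[baseline]
    return t / t0, (size / size0 - 1) * 100

for name in tier4:
    ratio, growth = relative(name)
    print(f"{name:18s} time x{ratio:.2f}   code {growth:+.0f}%")
```

On these numbers (d) comes out at about +40% code, (f) at about +86% against about +91% for -Xcomp, and (i) at a bit over 5x the default Tier4 compile time, which lines up with the rough "+40%", "+90%, on par with -Xcomp" and "5x slower" readings.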

That's nice, thanks for experimenting with this, Aleksey! My long-term plan of randomizing profile information (at least to some extent) would be the next step here and it would also randomize the inlining decisions, see JDK-8355466.
26-06-2025

For now, maybe randomizing the inlining instead of forcing would give us even more coverage.
26-06-2025
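
For illustration only, the idea of randomizing inlining above a fixed baseline could look roughly like the Python sketch below. This is not HotSpot code: every name and default here (`make_inline_decider`, `baseline_limit`, `stress_limit`, `probability`) is hypothetical, with the 50 and 350 values borrowed from configurations (e) and (i) in this thread.

```python
import random

def make_inline_decider(seed, baseline_limit=50, stress_limit=350, probability=0.25):
    """Return a function deciding whether to inline a callee of `size` bytecodes.

    All parameters are hypothetical illustrations, not HotSpot flags.
    """
    rng = random.Random(seed)  # fixed seed gives repeatable decisions

    def should_inline(size):
        if size <= baseline_limit:
            return True   # deterministic baseline coverage
        if size > stress_limit:
            return False  # hard cap keeps compile time bounded
        return rng.random() < probability  # random exploration above the baseline

    return should_inline

decide = make_inline_decider(seed=42)
for size in (20, 60, 120, 400):
    print(size, decide(size))
```

Seeding the generator keeps a failing run reproducible, while the deterministic part below the baseline limit stays identical from run to run.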

Datapoint, the very first experiment:

diff --git a/src/hotspot/share/opto/bytecodeInfo.cpp b/src/hotspot/share/opto/bytecodeInfo.cpp
index 547cf2f6a38..ed5f7899873 100644
--- a/src/hotspot/share/opto/bytecodeInfo.cpp
+++ b/src/hotspot/share/opto/bytecodeInfo.cpp
@@ -263,6 +263,11 @@ bool InlineTree::should_not_inline(ciMethod* callee_method, ciMethod* caller_met
     return false;
   }

+  if (UseNewCode) {
+    set_msg("force inline by CTW");
+    return false;
+  }
+
   // Now perform checks which are heuristic

   if (is_unboxing_method(callee_method, C)) {
@@ -408,6 +413,8 @@ bool InlineTree::try_to_inline(ciMethod* callee_method, ciMethod* caller_method,
     // inline constructors even if they are not reached.
   } else if (forced_inline()) {
     // Inlining was forced by CompilerOracle, ciReplay or annotation
+  } else if (UseNewCode2) {
+    // CTW testing has no reachable code.
   } else if (is_not_reached(callee_method, caller_method, caller_bci, profile)) {
     // don't inline unreached call sites
     set_msg("call site not reached");
diff --git a/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java b/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java
index 573b70faabe..2d606ccdc97 100644
--- a/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java
+++ b/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java
@@ -305,6 +305,11 @@ private String[] cmd(long classStart, long classStop) {
                 // Expand the optimization scope by disallowing most traps.
                 "-XX:PerMethodTrapLimit=0",
                 "-XX:PerMethodSpecTrapLimit=0",
+                // Expand the scope of inlining
+                "-XX:MaxInlineSize=70",
+                "-XX:C1MaxInlineSize=70",
+                "-XX:+UseNewCode",
+                "-XX:+UseNewCode2",
                 // Do not pay extra stack trace generation cost for normally thrown exceptions
                 "-XX:-StackTraceInThrowable",
                 "-XX:+IgnoreUnrecognizedVMOptions",

...already found 3 different crashes while completing only ~5% of the CTW corpus I have here. So this approach looks very viable. I'll submit the bugs for those crashes separately.

# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/memnode.cpp:4235), pid=1690554, tid=1696677
# assert(Opcode() == Op_Initialize) failed: Only seen when there are no use of init memory
V [libjvm.so+0x163f807] MemBarNode::remove(PhaseIterGVN*)+0x237 (memnode.cpp:4235)
V [libjvm.so+0x163f9c3] MemBarNode::Ideal(PhaseGVN*, bool)+0x1a3 (memnode.cpp:4310)
V [libjvm.so+0x1822a51] PhaseIterGVN::transform_old(Node*)+0xc1 (phaseX.cpp:668)
V [libjvm.so+0x1826bd6] PhaseIterGVN::optimize()+0xb6 (phaseX.cpp:1054)
...

# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/node.cpp:1005), pid=1844212, tid=1853094
# assert(depth_count++ < K) failed: infinite loop in Node::uncast_helper
V [libjvm.so+0x171e62c] Node::uncast_helper(Node const*, bool)+0x23c (node.cpp:1005)
V [libjvm.so+0x1c2f186] SubNode::Value_common(PhaseValues*) const+0x76 (node.hpp:507)
V [libjvm.so+0x1c2f346] SubNode::Value(PhaseGVN*) const+0x16 (subnode.cpp:105)
V [libjvm.so+0x1822667] PhaseGVN::transform(Node*)+0x1d7 (phaseX.cpp:703)

# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=1799456, tid=1799480
# assert(no_dead_loop) failed: dead loop detected
V [libjvm.so+0x18142e5] PhaseGVN::dead_loop_check(Node*) [clone .part.0]+0x1a5 (phaseX.cpp:784)
V [libjvm.so+0x1822ead] PhaseIterGVN::transform_old(Node*)+0x51d (phaseX.cpp:767)
V [libjvm.so+0x1826bd6] PhaseIterGVN::optimize()+0xb6 (phaseX.cpp:1054)
V [libjvm.so+0xb66c9e] Compile::inline_incrementally_cleanup(PhaseIterGVN&)+0x2be (compile.cpp:2161)
25-06-2025