JDK-6316156 : C2 method-size tuning parameters need update


Certain tunable parameters are sensitive to the size of compiled methods.
They need revisiting, since machines are larger than when the parameters
were last tuned (Tiger or before).

In particular, certain newer optimizations (such as bimorphic inlining) create larger
methods, which in turn fall foul of the restrictively tuned parameters.

Parameters which may need inflation include:

Vladimir reports that changes optimizing JVM98 run into InlineSmallCode limits below 2200.


Recent discussions about hand-splitting of varargs APIs (via switch) are complicated by this problem.

Ref:  http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-November/036382.html

 static <E> List<E> varargs_switch10(E... ea) {
   switch (ea.length) {
     case 0: return explicit();
     case 1: return explicit(ea[0]);
     case 2: return explicit(ea[0], ea[1]);
     case 3: return explicit(ea[0], ea[1], ea[2]);
     case ...
     default: return varargs(ea);
   }
 }

Suggestion #1:  Support hand-split switches by more accurate measurement of the inline weight of switch-heavy code.  Most switch cases will be unreached, and should not contribute to the inline cost estimate.

Suggestion #2:  If a switch is made on a parameter or a value easily derived from a parameter (ea.length above), and if the value is a compile-time constant, use an inlining cost estimate that takes the constant value into account.  In this case, all but one of the switch-cases would disappear from the inlining cost.
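As a hedged illustration of Suggestion #2 (the class and method names below are hypothetical, not from the report): when a call site passes a fixed number of arguments, `ea.length` is a known constant once the helper is inlined, so only one switch arm remains reachable and, under the suggested heuristic, only that arm would be charged to the inlining cost:

```java
import java.util.Arrays;
import java.util.List;

public class VarargsSwitchDemo {
    // Hypothetical hand-split varargs helper: dispatches on ea.length.
    @SafeVarargs
    static <E> List<E> of(E... ea) {
        switch (ea.length) {
            case 0:  return List.of();
            case 1:  return List.of(ea[0]);
            case 2:  return List.of(ea[0], ea[1]);
            default: return Arrays.asList(ea);   // general varargs path
        }
    }

    public static void main(String[] args) {
        // At this call site the varargs array has constant length 2, so after
        // inlining only the "case 2" branch is reachable; Suggestion #2 would
        // exclude the other cases from the cost estimate.
        System.out.println(of("a", "b"));        // prints [a, b]
    }
}
```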

Suggestion #3:  Provide users with more appropriate notations than hand-expanded switches, when hand-splitting is desired, and optimize the simpler notation more simply.

Vitaly Davidovich notes an issue with dense switches that get compiled to multi-way jumps.

A switch should not be measured by the raw size of the instruction.
If several adjacent keys branch to the same successor, surely that should count as a single test and branch.
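A small hypothetical example (not from the report) of adjacent keys sharing one successor; by the argument above, each group below should count as a single test and branch in the inline cost, not as one unit per key:

```java
public class DenseSwitchDemo {
    // Classifies a character. Each run of adjacent keys falls through to a
    // single arm: ten digit keys share one successor, four operator keys
    // share another.
    static int kind(char c) {
        switch (c) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return 1;                 // ten keys, one successor
            case '+': case '-': case '*': case '/':
                return 2;                 // four keys, one successor
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(kind('7') + " " + kind('*') + " " + kind('x'));
        // prints 1 2 0
    }
}
```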


This problem torments well-intentioned Core Libs programmers who want to sprinkle "asserts" in their short methods.

Although an assert adds exactly zero overhead to native-compiled code, it can kick the bytecode instruction size of a method over the inlining cliff (at 35 bytes, whether executed or not).  This makes asserts appear to have a performance cost, even though they should not.  If we fix this bug in a way where non-executed code is not charged to an inlining heuristic budget, we will stop penalizing programmers for adding asserts.
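A minimal sketch of the effect (method names are hypothetical): javac compiles each assert into a conditional test of the synthetic $assertionsDisabled field plus a throw path, so even a never-executed assert adds bytecode that counts against the 35-byte MaxInlineSize budget:

```java
public class AssertWeightDemo {
    // Without the assert, this accessor is well under the 35-byte
    // MaxInlineSize default. The assert's compiled form (a check of the
    // synthetic $assertionsDisabled field plus the message-building and
    // throw path) can push the method's bytecode size over the limit,
    // even though none of it runs when assertions are disabled.
    static int half(int x) {
        assert (x & 1) == 0 : "expected an even value: " + x;
        return x >> 1;
    }

    public static void main(String[] args) {
        System.out.println(half(42));   // prints 21
    }
}
```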

Untaken paths should not contribute to inline-limiting metrics.
Also: assertions count as code weight, which is very bad.

So, instead of using bytecode size at all, can't HotSpot generate the inline candidate and throw it away if it's too large? Ideally it would do something like this:

do {
   generate IR for the hot child
   if (child IR too big)        // triviality check
       discard and bail out
   splice child IR into the parent graph
   apply shrinking transforms
} while (parent IR not too big && enough time left for optimisation)

I've spent significant time tuning Nashorn hot methods to be as small as possible so that they will be considered for inlining. Sometimes, by splitting a method into two, with half the logic in each, I reach my goal. We should think about how to abstract this away from the Java programmer.


List of rt.jar methods that are above the bytecode inlining limit: https://groups.google.com/forum/#!topic/jitwatch/KJKEgVLTGg8

ILW = {Impact: Med, Likelihood: Med, Workaround: Med}
Impact: Users report sudden unacceptable performance loss.  Difficult to analyze.
Likelihood: Recurring reports.  Probably also underreported.
Workaround: When recognized, split methods into smaller methods.  Time-consuming and error-prone.
Severity = 3
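The workaround named above can be sketched as the familiar fast-path/slow-path split (a hypothetical example, not code from the JDK): keep the hot method tiny and move the cold exception construction into a separate method that the JIT need not inline:

```java
import java.util.NoSuchElementException;

public class SplitDemo {
    // Unsplit version: the exception-formatting slow path counts against
    // the 35-byte inlining budget even though it never runs when indexing
    // succeeds.
    static int checkedMonolithic(int[] a, int i) {
        if (i < 0 || i >= a.length)
            throw new NoSuchElementException("index " + i + " out of " + a.length);
        return a[i];
    }

    // Split version: the hot method stays small; the cold throw path lives
    // in a separate, out-of-line method.
    static int checked(int[] a, int i) {
        if (i < 0 || i >= a.length) throw outOfRange(a, i);
        return a[i];
    }

    private static NoSuchElementException outOfRange(int[] a, int i) {
        return new NoSuchElementException("index " + i + " out of " + a.length);
    }

    public static void main(String[] args) {
        System.out.println(checked(new int[]{10, 20, 30}, 1));   // prints 20
    }
}
```

As the bug notes, this splitting is time-consuming and error-prone when done by hand, which is exactly why better heuristics are wanted.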
For better visibility into generated code decisions, see https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly and search the web for PrintAssembly.  See also LogCompilation and "jitwatch".
Surprising inline failures are a common problem.

This is a deepening issue for dynamic languages like Nashorn, since they consist of small bits of simple "plumbing" joined together.  If 5% of the joints in the plumbing fail to inline, they can dominate performance.

Such surprises may also crop up in new releases when Java programmers adopt new language features.  For example, this happened when the "assert" keyword was introduced.  Seemingly simple assertions in hot code can disturb inlining and performance, even when assertions are not enabled.  A fix for this, and for other fast-path/slow-path idioms, should discount unreached code in both inline heuristics and parsing (IR generation, JDK-8030976).
MaxInlineSize (default = 35 bytecode bytes) is used to detect bytecoded methods which are small enough to inline almost always.  It should be compared against a better metric than Method::code_size, which is the textual size of the bytecodes.

A better metric would be a weighted instruction count, with low or zero weight given to data movement instructions and heavier weight given to invocations and control transfers.  Unreached instructions (including those never reached so far according to the profile) should not contribute to the weight.  This would allow methods to contain unused slow paths (e.g., for exception throws) that do not interfere with the inlining of fast paths.  A similar metric would be a weighted IR node count (immediately after parsing), but that would be more difficult to derive.
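A toy sketch of such a weighted metric (purely illustrative; the instruction classification and weight values are invented here, not HotSpot's):

```java
public class InlineWeightSketch {
    enum Kind { DATA_MOVE, INVOKE, BRANCH, OTHER }

    // Hypothetical weight table in the spirit of the proposal: data movement
    // is free, invocations and control transfers are heavy, and instructions
    // the profile has never reached contribute nothing at all.
    static int weight(Kind k, boolean reached) {
        if (!reached) return 0;          // unreached slow paths cost nothing
        switch (k) {
            case DATA_MOVE: return 0;    // loads/stores/moves
            case INVOKE:    return 10;   // calls dominate the estimate
            case BRANCH:    return 5;    // control transfers
            default:        return 1;
        }
    }

    public static void main(String[] args) {
        // A toy "method": two loads, one call, and an unreached throw path.
        int cost = weight(Kind.DATA_MOVE, true)
                 + weight(Kind.DATA_MOVE, true)
                 + weight(Kind.INVOKE, true)
                 + weight(Kind.BRANCH, false);   // cold path: not charged
        System.out.println(cost);                // prints 10
    }
}
```

Under this kind of metric, a method full of asserts and exception throws would cost no more than its fast path alone.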

InlineSmallCode (default = a few thousand native instruction bytes) is similarly used to detect native-compiled nmethods which are too large to inline.  The problem is that this metric is compared against native code bytes, and many of those bytes are "cold" slow paths which are never executed (and thus never hit the instruction cache).  Slow paths (to uncommon traps) tend to be numerous and burdened with trivial data-motion code.  Also, this metric is deeply machine-dependent, and so needs complete re-tuning for each CPU architecture and (even worse) for each change in the JIT back end or middle end.

A better metric would be something related to a machine-independent view of the hot path, such as the weighted bytecode instruction count (see above) or IR size.

Background:  Given a strongly interconnected control flow between inline-able methods A->B->C->...->A, the InlineSmallCode limit tends to reduce the number of native copies of A, B, etc.  The pathology it interrupts occurs when an application makes hot entries into the graph at multiple points A, B, C, ..., which (except for InlineSmallCode) would tend to create inlined nmethods containing A->B->C->...->A, B->C->...->A->B, C->...->A->B->C, etc., triggering compilation work quadratic in the size of the graph cycle, and potentially quadratic instruction-cache traffic.


See description.
