JDK-8076988 : reevaluate trivial method policy
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,9,10,11
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-04-05
  • Updated: 2019-10-04
  • Resolved: 2019-01-08
Fix Versions:
  • JDK 11: 11.0.5 (Fixed)
  • JDK 13: 13 b03 (Fixed)
Description
This issue affects performance research. This is a minimized version of what happens in JMH-driven code. See the test code here:
  http://cr.openjdk.java.net/~shade/8076988/TestCall.java

Both m() and m1() are very hot, and we want to make sure both are compiled with C2. At the same time, we want to keep their generated code separate and forbid inlining m1() into m(). This can be done via CompileCommand, via the @DontInline annotation, or in some other way that tells CompilerOracle.
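The actual test is at the URL above. For readers without access to it, here is a hypothetical stand-in with the shape the logs suggest: an empty m1() (1 byte of bytecode, a single return) called in a loop from m(), which main() in turn calls in a loop. The class name TestCallSketch, the loop counts, and the int return value of m() are assumptions, and the bytecode sizes of m() and main() will not match the 18 bytes shown in the logs.

```java
// Hypothetical stand-in for TestCall.java; NOT the original test file.
class TestCallSketch {
    // Empty callee: its bytecode is a single 'return' (1 byte), like m1 in the logs.
    static void m1() {}

    // Hot caller that invokes m1() in a loop; returns the call count so the
    // work is observable.
    static int m() {
        int calls = 0;
        for (int i = 0; i < 100; i++) {
            m1();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += m();
        }
        System.out.println(total);
    }
}
```

It can be run with the same flags as in the report, for example:
$ java -XX:+PrintCompilation -XX:+TieredCompilation -XX:CompileCommand=dontinline,TestCallSketch::m1 TestCallSketch
Which tiers appear depends on the JDK version, since the fix landed in 11.0.5 and 13.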

Without tiered, m1() gets compiled just fine:

$ java -XX:+PrintCompilation -XX:-TieredCompilation -XX:CompileCommand=dontinline,TestCall::m1 TestCall
     34    1             TestCall::m1 (1 bytes)      <---- compiled with C2
     34    2 %           TestCall::m @ 2 (18 bytes)
     35    2 %           TestCall::m @ -2 (18 bytes)   made not entrant
     35    3             TestCall::m (18 bytes)
     35    4 %           TestCall::m @ 2 (18 bytes)
   2229    5 %           TestCall::main @ 2 (18 bytes)
  15088    5 %           TestCall::main @ -2 (18 bytes)   made not entrant

With tiered compilation, however, m1() appears to be stuck at level 1, i.e. C1 without profiling:

$ java -XX:+PrintCompilation -XX:+TieredCompilation -XX:CompileCommand=dontinline,TestCall::m1 TestCall
...
     34   13       3       TestCall::m1 (1 bytes)   <--- C1 with full profiling
     34   14       1       TestCall::m1 (1 bytes)   <--- oops, back to C1
     34   13       3       TestCall::m1 (1 bytes)   made not entrant
...

This was additionally verified by inspecting the generated code for these methods: it is C1-quality code where C2 code is expected. Note that the performance issue manifests only when inlining of m1() is forbidden. Otherwise, the hot path would go through the version of m() that inlined m1(), and the issue would be buried.
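Reading tier levels off PrintCompilation output by eye is error-prone. Below is a minimal, hypothetical helper (not a JDK API) that extracts the tier from a log line, assuming the whitespace-separated layout seen in the outputs above: timestamp, compile id, optional flags such as %, an optional tier column that is only printed with tiered compilation, then the method. This layout is inferred from these logs rather than from a format specification. The tier descriptions follow the usual tiered-compilation levels: 0 interpreter, 1 C1 without profiling, 2 C1 with limited profiling, 3 C1 with full profiling, 4 C2.

```java
import java.util.Optional;

// Hypothetical helper, not a JDK API. Assumes the layout seen in this report:
//   <timestamp> <compile id> [flags, e.g. %] [tier] <Class::method> ... [made not entrant]
final class CompilationLine {
    final long timestampMs;
    final int compileId;
    final boolean osr;            // '%' flag: on-stack replacement compilation
    final int tier;               // -1 if no tier column (e.g. -XX:-TieredCompilation)
    final String method;
    final boolean madeNotEntrant;

    private CompilationLine(long timestampMs, int compileId, boolean osr,
                            int tier, String method, boolean madeNotEntrant) {
        this.timestampMs = timestampMs;
        this.compileId = compileId;
        this.osr = osr;
        this.tier = tier;
        this.method = method;
        this.madeNotEntrant = madeNotEntrant;
    }

    static Optional<CompilationLine> parse(String line) {
        String[] tok = line.trim().split("\\s+");
        if (tok.length < 3) {
            return Optional.empty();
        }
        try {
            long ts = Long.parseLong(tok[0]);
            int id = Integer.parseInt(tok[1]);
            boolean osr = false;
            int tier = -1;
            int i = 2;
            // Tokens between the compile id and the method name are flags or the tier.
            while (i < tok.length && !tok[i].contains("::")) {
                if (tok[i].matches("[0-4]")) {
                    tier = Integer.parseInt(tok[i]);
                } else if (tok[i].contains("%")) {
                    osr = true;
                }
                i++;
            }
            if (i == tok.length) {
                return Optional.empty(); // no Class::method token found
            }
            return Optional.of(new CompilationLine(ts, id, osr, tier, tok[i],
                    line.contains("made not entrant")));
        } catch (NumberFormatException e) {
            return Optional.empty(); // not a compilation line (e.g. a PrintInlining line)
        }
    }

    static String describeTier(int tier) {
        switch (tier) {
            case 0: return "interpreter";
            case 1: return "C1, no profiling";
            case 2: return "C1, limited profiling";
            case 3: return "C1, full profiling";
            case 4: return "C2";
            default: return "no tier column";
        }
    }
}
```

For example, parsing the line "34   14       1       TestCall::m1 (1 bytes)" yields tier 1, which describeTier reports as "C1, no profiling".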
Comments
Fix Request (11u): This patch causes more methods to be compiled with C2 rather than C1, which improves performance in corner cases. The patch applies cleanly to 11u after the JDK-8209186 backport and passes tier1 tests.
26-06-2019

[~dlong], sounds good to me. Re-opening.
04-12-2017

[~thartmann] I think we should do a quick fix for 10 with JDK-8145579 and reopen this issue for 11 to address the riskier changes to is_trivial.
04-12-2017

I'm closing this as a duplicate of JDK-8145579 because the problem boils down to the is_trivial heuristic, which should be revisited.
30-01-2017

Triage: Changing to RFE - This is worth evaluating.
13-04-2015

The original issue was about UseCondCardMark, and m1() had only a single store and no calls. I think the heuristic is wrong because it relies on an unstable invariant, to quote the source code: "// Simple methods are as good being compiled with C1 as C2." UseCondCardMark is a counterexample, since C1 does not honor it (granted, C1 should just implement it). Instead of special-casing the heuristic to work around UseCondCardMark-like issues, I think we need to reconsider the very existence of a heuristic that introduces hard-to-diagnose performance anomalies. It was pure luck that we had time to follow up and could reproduce it reliably enough to point at the offending code. Other users will just throw their hands in the air and disable tiered compilation. So the question is: do we *actually* know this heuristic helps at all? Do we know exactly why it exists? Is it a palliative for some shortcoming in tiered compilation (e.g. overloaded queues)? It feels like a workaround for not having prioritized queues in tiered: submit to the C1 queue, which is arguably less occupied, on the premise that it generates the same code. In other words, it relies on a fragile premise. Queueing theory says that if you want to minimize the average time spent in the queue, you should schedule the smallest jobs first.
09-04-2015
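The queueing-theory point is the classical shortest-job-first result for a single server when all jobs are already waiting. A small worked example with hypothetical job durations (one long compilation queued ahead of three short ones) makes it concrete; the durations and class name are illustrative, and nothing here models HotSpot's actual compile queues.

```java
import java.util.Arrays;

// Average waiting time for jobs served one at a time, assuming all jobs are
// already queued at time 0 and no new jobs arrive.
final class QueueOrder {
    static double averageWait(int[] durations) {
        long clock = 0;
        long totalWait = 0;
        for (int d : durations) {
            totalWait += clock; // this job waits until all earlier jobs finish
            clock += d;
        }
        return durations.length == 0 ? 0.0 : (double) totalWait / durations.length;
    }

    // Same jobs, served shortest first.
    static double averageWaitShortestFirst(int[] durations) {
        int[] sorted = durations.clone();
        Arrays.sort(sorted);
        return averageWait(sorted);
    }

    public static void main(String[] args) {
        int[] jobs = {8, 1, 1, 1}; // hypothetical: one long compile, then three short ones
        System.out.println(averageWait(jobs));              // 6.75
        System.out.println(averageWaitShortestFirst(jobs)); // 1.5
    }
}
```

With durations {8, 1, 1, 1}, arrival order gives waits of 0, 8, 9 and 10 (average 6.75), while shortest-first gives 0, 1, 2 and 3 (average 1.5).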

Also, the is_trivial check tests mdo->would_profile(), which C1 should set while profiling a method that invokes another method: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/7ac058b59e10/src/share/vm/c1/c1_GraphBuilder.cpp#l2034 So I think the heuristic does check whether there are any calls inside.
09-04-2015
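The two conditions discussed in this thread, mdo->would_profile() and the code_size < 5 check cited in the 06-04-2015 comment, can be combined into a toy model that shows why the empty m1() drops to level 1 while an m1() that calls m2() reaches level 4. This is a deliberately simplified, hypothetical model: the real is_trivial in simpleThresholdPolicy.inline.hpp checks more than this, and all names below are illustrative.

```java
// Hypothetical, simplified model of the tiered "trivial method" decision as
// described in this thread; the real HotSpot check has more conditions.
final class TrivialPolicyModel {
    // "code_size < 5" threshold quoted in the 06-04-2015 comment.
    static final int TRIVIAL_CODE_SIZE = 5;

    // wouldProfile models mdo->would_profile(): per the thread, C1 sets it
    // for a method that invokes another method.
    static boolean isTrivial(int codeSizeBytes, boolean wouldProfile) {
        return !wouldProfile && codeSizeBytes < TRIVIAL_CODE_SIZE;
    }

    // Level chosen for the next compilation in this model:
    // 1 (C1, no profiling) if trivial, otherwise 4 (C2).
    static int nextLevel(int codeSizeBytes, boolean wouldProfile) {
        return isTrivial(codeSizeBytes, wouldProfile) ? 1 : 4;
    }

    public static void main(String[] args) {
        // Empty m1(): 1 byte, no calls -> trivial -> level 1
        System.out.println(nextLevel(1, false)); // 1
        // m1() calling m2(): 4 bytes, the call sets would_profile -> level 4
        System.out.println(nextLevel(4, true));  // 4
    }
}
```

The first case matches the stuck-at-level-1 logs in the description; the second matches the level 4 compilation of the 4-byte m1 in the 09-04-2015 comment from Pavel.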

Aleksey, why do you think the heuristic is wrong? You are isolating an empty method, not a method with a single call. If I add a call to an empty method m2 inside m1 in your example, then m1, which itself is not inlined, is compiled at level 4 (C2):
$ java -XX:+TieredCompilation -XX:CompileCommand=dontinline,*.m1 -XX:+PrintCompilation -XX:+PrintInlining TestCall | grep TestCall
    235   47       2       TestCall::m1 (4 bytes)
                              @ 0   TestCall::m2 (1 bytes)
    235   48       2       TestCall::m2 (1 bytes)
    235   49       4       TestCall::m1 (4 bytes)   // compiled with C2 when it has a single method call
    236   47       2       TestCall::m1 (4 bytes)   made not entrant
                              @ 0   TestCall::m2 (1 bytes)   inline (hot)   // here is the method that was inlined into m1
    236   50 %     4       TestCall::m @ 2 (18 bytes)
                              @ 8   TestCall::m1 (4 bytes)   disallowed by CompilerOracle
    237   51       4       TestCall::m (18 bytes)
                              @ 8   TestCall::m1 (4 bytes)   disallowed by CompilerOracle
    238   51       4       TestCall::m (18 bytes)   made not entrant
    239   52       3       TestCall::m (18 bytes)
                              @ 8   TestCall::m1 (4 bytes)   disallowed by CompilerOracle
    241   53       4       TestCall::m (18 bytes)
    242   52       3       TestCall::m (18 bytes)   made not entrant
                              @ 8   TestCall::m1 (4 bytes)   disallowed by CompilerOracle
  11773   54 %     3       TestCall::main @ 2 (18 bytes)
                              @ 8   TestCall::m (18 bytes)   inlining prohibited by policy
  11968   55       3       TestCall::main (18 bytes)
                              @ 8   TestCall::m (18 bytes)   inlining prohibited by policy
09-04-2015

Indeed, the heuristic should probably check for calls.
08-04-2015

If inlining is allowed, such methods will probably be inlined and optimized. Otherwise, we compile them the normal way, as standalone methods. This heuristic should reduce the load on the C2 compiler queue. Without it, all methods would be put into the C2 queue after profiling, while the C1 queue would stay empty. Assuming C1 produces almost the same code, only faster, this decreases the average time each method spends waiting in a queue for compilation.
08-04-2015

Ah, that probably explains it, thanks, Pavel. Then I submit that the heuristic is wrong: when you isolate a "trivial" method whose single method call is then expanded by a C2 intrinsic, or which relies on some other C2-specific optimization like UseCondCardMark, this heuristic gets in the way. It is not clear to me why the heuristic exists in the first place. Do we think a C2 compile of a trivial method would take significantly more time?
08-04-2015

There are different workarounds:
 1. -XX:-TieredCompilation, obviously.
 2. Use WhiteBox::makeMethodNotCompilable(Executable method, int compLevel) with compLevel=1.
06-04-2015

Executed TestCall to find out what's happening:
$ java -XX:+TieredCompilation -XX:CompileCommand=dontinline,*.m1 -XX:+PrintCompilation -XX:+PrintInlining TestCall | grep TestCall
    235   49 %     4       TestCall::m @ 2 (18 bytes)
                              @ 8   TestCall::m1 (1 bytes)   disallowed by CompilerOracle   // OSR in the for loop, but m1 was disallowed from being inlined
    239   50       4       TestCall::m (18 bytes)
                              @ 8   TestCall::m1 (1 bytes)   disallowed by CompilerOracle   // a normal compilation was submitted too, again without inlining
    243   48       2       TestCall::m1 (1 bytes)   // m1 was submitted for profiling
    243   50       4       TestCall::m (18 bytes)   made not entrant
    243   51       1       TestCall::m1 (1 bytes)   // compiled with C1 because it is a trivial method
    243   48       2       TestCall::m1 (1 bytes)   made not entrant
    246   52       3       TestCall::m (18 bytes)
As soon as m1 was disallowed from being inlined, it was submitted for a normal compilation. But it is a trivial method (code_size < 5, a magic constant), see http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0a369507f96d/src/share/vm/runtime/simpleThresholdPolicy.inline.hpp#l66
06-04-2015