JDK-8130398 : Coalesce prolog and epilog for empty methods
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2015-07-03
  • Updated: 2021-02-02
  • Resolved: 2021-02-02
Related Reports
Relates :  
Relates :  
Relates :  
Description
In low-level benchmarking, we sometimes resort to non-inlineable "sink" methods to escape dead-code elimination, like this:

    @Benchmark
    public void test() {
        doNothing(obj);
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void doNothing(Object obj) {
        // deliberately do nothing
    }

The performance of this method is very important, since we usually deal with nanosecond-scale benchmarks. Ideally, the generated code should contain a "ret" right away.
However, the generated code for doNothing contains prolog followed immediately with epilog:

                  [Verified Entry Point]
 10.93%    6.25%    0x00007f39f415fd80: mov    %eax,-0x14000(%rsp)
  3.76%    3.03%    0x00007f39f415fd87: push   %rbp
  1.92%    1.97%    0x00007f39f415fd88: sub    $0x30,%rsp
 10.42%   10.64%    0x00007f39f415fd8c: add    $0x30,%rsp
  2.88%    3.03%    0x00007f39f415fd90: pop    %rbp
 25.45%   31.68%    0x00007f39f415fd91: test   %eax,0x15df8369(%rip)        # 0x00007f3a09f58100
                                                                  ;   {poll_return}
  0.57%    0.47%    0x00007f39f415fd97: retq   

It seems that at least RSP operations are redundant, as well as saving/restoring RBP.
It would be interesting to see if we can remove these redundant ops, e.g.:

 *) Peephole MachPrologNode -> MachEpilogNode out completely;
 *) Macro-expand MachProlog/EpilogNode into the individual ops, and then peephole (sub $const, %reg) -> (add $const, %reg) and (push %reg) -> (pop %reg);
 *) Massage frame_size_in_bytes() so that it is zero for empty method;

Benchmark:
 http://cr.openjdk.java.net/~shade/8130398/EmptyMethod.java

Runnable JAR:
 http://cr.openjdk.java.net/~shade/8130398/benchmarks.jar

Output and disasssembly:
 http://cr.openjdk.java.net/~shade/8130398/perfasm.out
Comments
Thank you, Aleksey.
02-02-2021

Looking at history and links, there are three cases I think we wanted to handle: - Reference.reachabilityFence -- now moot because JDK-8199462 ForceInline-s it - Benchmarking sinks -- going to be moot after we get Blackholes in JDK-8259316 - VH.{get|set}Opaque -- already intrinsified in initial integration in JDK-8148146 So, doing anything non-trivial is not going to have good cost/benefit balance. I am okay with WNF-ing it.
02-02-2021

[~shade] Is this issue still important? I looked and we can only do that for empty methods. For accessors it will not work because we can deoptimize on holder's NULL check. The solution is not to save RBP (saving makes it SOE). After removing RBP, saving stack also is not updated - it was updated due to additional RBP slot and stack alignment to 16 bytes. Changes touch several non-trivial places and I am not sure if I should do it.
02-02-2021

Thanks, Aleksey!
30-01-2017

Indeed, the assembly above is created by C1. Re-checked with 9b152, and updated JMH that prints out the compiler level, and there is a similar issue with C2: http://cr.openjdk.java.net/~shade/8130398/c1.perfasm http://cr.openjdk.java.net/~shade/8130398/c2.perfasm
30-01-2017

Could this be because the empty method is considered as trivial and only compiled by C1 (see JDK-8145579)?
30-01-2017

This issue become more important as the go-to technique for implementing Fences.reachabilityFence and/or VarHandle.(get|set)Opaque in OpenJDK.
24-08-2015