Bug ID: JDK-8178287 AsyncGetCallTrace fails to traverse valid Java stacks

Type: Bug
Component: hotspot
Sub-Component: svc
Affected Version: 8u121,9,10
Priority: P4
Status: Open
Resolution: Unresolved
CPU: x86_64
Submitted: 2017-04-07
Updated: 2021-06-29
Other
tbdUnresolved
There is a number of cases (observed in real applications) when profiling with AsyncGetCallTrace is useless due to HotSpot inability to walk through Java stack. Here is the analysis of such cases.

#1. An application performs many System.arraycopy() and spends a lot of time inside generated arraycopy stubs.

public class Prof1 {

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            sb.append("ab");
            sb.delete(0, 1);
        }
        System.out.println(sb.length());
    }
}

--- Collected profile ---
Total samples:         500
ticks_unknown_Java:    499 (99,80%)

Samples: 1 (0,20%)
  [ 0] java.lang.AbstractStringBuilder.ensureCapacityInternal
  [ 1] java.lang.AbstractStringBuilder.append
  [ 2] java.lang.StringBuilder.append
  [ 3] Prof1.main

Here AsyncGetCallTrace fails with error = -5 (ticks_unknown_Java) almost in 100% cases. The reason is that frame::safe_for_sender() does not check is_stub_frame() and therefore returns negative result after "if (_cb->frame_size() <= 0)" check. Such stacks could be perfectly traversed if stub frames were treated specially.

--- Expected profile ---
Total samples:         499

Samples: 499 (100,00%)
  [ 0] jshort_disjoint_arraycopy
  [ 1] java.lang.AbstractStringBuilder.delete
  [ 2] java.lang.StringBuilder.delete
  [ 3] Prof1.main


#2. Another case is a method with a megamorphic interface call.

public class Prof2 {

    public static void main(String[] args) {
        Supplier[] suppliers = {
                () -> 0,
                () -> 1.0,
                () -> "abc",
                () -> true
        };

        for (int i = 0; i >= 0; i++) {
            suppliers[i % suppliers.length].get();
        }
    }
}

--- Collected profile ---
Total samples:         494
ticks_unknown_Java:    331 (67,00%)

Samples: 128 (25,91%)
  [ 0] Prof2.main

Samples: 26 (5,26%)
  [ 0] java.lang.Double.valueOf
  [ 1] Prof2.lambda$main$1
  [ 2] Prof2$$Lambda$2.531885035.get
  [ 3] Prof2.main

Samples: 4 (0,81%)
  [ 0] Prof2$$Lambda$4.303563356.get
  [ 1] Prof2.main

Samples: 3 (0,61%)
  [ 0] Prof2$$Lambda$3.1418481495.get
  [ 1] Prof2.main

In this example AsyncGetCallTrace fails to recognize 2/3 samples. Many of these samples occur during invocation of Supplier.get(). Since the call site is megamorphic, the method is invoked via itable stub. Just like in the first case, frame::safe_for_sender() fails here on "if (_cb->frame_size() <= 0)" check. VtableStub creates a BufferBlob with zero frame_size, so it is not walkable by AsyncGetCallTrace. However, if we explicitly set the correct frame size of VtableStub, it will become walkable. For instance, on x64 frame_size == 1, since only the return address is saved on the stack.


#3. Even after VtableStub fix, the profilers based on AsyncGetCallTrace will still fail on the above code in 30% cases.

Note that Suppliers are very simple, their compiled method bodies are short. However, each compiled method has the prologue and the epilogue like this:

    mov    %eax,-0x14000(%rsp)
    push   %rbp
    sub    $0x30,%rsp
    
    [method body]
    
    add    $0x30,%rsp
    pop    %rbp
    test   %eax,0x1234567(%rip)
    retq

For the short methods sampling profilers often hit inside the prologue or the epilogue, while the stack frame is not fully constructed. Such inconsistent frames can be divided into two types:
 1) They have only a return address on the stack; that is, sender_pc = *(sp)
 2) They have a return address and fp on the stack; that is, sender_pc = *(sp+1); sender_fp = *(sp)
Stack-walking code could recognize these two patterns and easily find the previous frame.

--- Expected profile ---
Total:                 494

Samples: 227 (45,95%)
  [ 0] itable stub
  [ 1] Prof2.main

Samples: 130 (26,32%)
  [ 0] Prof2.main

Samples: 32 (6,48%)
  [ 0] Prof2$$Lambda$1.791452441.get
  [ 1] Prof2.main

Samples: 31 (6,28%)
  [ 0] Prof2$$Lambda$3.1418481495.get
  [ 1] Prof2.main

Samples: 26 (5,26%)
  [ 0] java.lang.Double.valueOf
  [ 1] Prof2.lambda$main$1
  [ 2] Prof2$$Lambda$2.531885035.get
  [ 3] Prof2.main

Samples: 25 (5,06%)
  [ 0] Prof2$$Lambda$4.303563356.get
  [ 1] Prof2.main

Samples: 13 (2,63%)
  [ 0] Prof2$$Lambda$2.531885035.get
  [ 1] Prof2.main

Samples: 6 (1,21%)
  [ 0] java.lang.Integer.valueOf
  [ 1] Prof2.lambda$main$0
  [ 2] Prof2$$Lambda$1.791452441.get
  [ 3] Prof2.main


As a proof-of-concept I've implemented a workaround for the above cases in 'async-profiler' project, and the percentage of ticks_unknown_Java has become negligible. However, this is not a problem of the particular profiler, but rather a general bug in AsyncGetCallTrace mechanism. For example, a well-known 'honest-profiler' project also suffers from the same bug. So, the right way to solve the problem would be to fix AsyncGetCallTrace.
Relates :	JDK-8252417 - JFR Execution Sample event fails to traverse valid Java stacks
Relates :	JDK-8153167 - Allow stack walking pass through method handle intrinsic frames