JDK-8068945 : Use RBP register as proper frame pointer in JIT compiled code on x86
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: x86_64
  • Submitted: 2015-01-14
  • Updated: 2017-03-20
  • Resolved: 2015-04-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8 JDK 9
8u60Fixed 9 b64Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8169583 :  
Description
We used rbp register for local values in jit compiled code on x86 because 32-bit mode have only few registers.
We do the same in 64-bit VM but it may help less with performance since it has more registers.
Using RBP only as frame pointer will help debugging and profiling tools.
Comments
Request for post-FC 8u60 backport for 8068945, 8075798 and 8080281 presented and approved at 5/27 RT mtg.
27-05-2015

Hi Ilya, yes, Robert Strout tested performance, please see below: "Sorry for the delay on getting the performance results... We've had most of the results for awhile, but one of the SPARC machines has been down which held up some of the results. Machine is still down, and not sure when it's coming back online. Those impacted results show in the results table as "Failed"... http://aurora.se.oracle.com/performance/reporting/report/robert.strout.JDK-8068945.Compare But we can ignore those since we don't need the SPARC results for this anyway. I can assess the performance results as follows... Overall, most of the benchmarks in Aurora.se shown no definitive change. There's strong evidence that this benchmark regresses on all x64 platforms: SPECjvm2008-Compress* -2% to -3.5% There's evidence that this benchmark regresses on all x64 platforms: SPECjvm2008-Derby* -2% to -5% Normally, I would follow up with a run of derby with more data points to get a stronger measurement for it. And also do the same for a few weak results just to make sure something is not hiding there. But given this change is under a flag, I don't think we need to take this further. If you were going to consider making this the default, then I'd want to run all the SPECjvm2008 benchmarks with more iterations to better understand any impact it might have on the default. --Resii" PreserveFramePointer is off by default, so this change has no impact on VM performance in the default configuration. Best regards, Zoltan
21-05-2015

Was performance testing done for these changes?
21-05-2015

Thanks, David, for pointing that out!
30-04-2015

Fix version was not set when the commit job succeeded. As a result, this issue was not closed automatically. So I close this issue manually with the resolution "Fixed".
27-04-2015

Hi Robert, thank you for offering to do the comparison. I'll provide you the builds once the change has been pushed. Best wishes, Zolt��n
23-04-2015

The Perf Team can run a promotion-level comparison prior to integration. Just provide us with a before and after JPRT build. Let either BrianD and/or I know when it's available. Thx...
22-04-2015

Great, thanks. Do you know why we have an 8u60 backport for this?
19-03-2015

Hi [~twisti], yes, I have a working solution. I plan to send out a preliminary RFR next week (I still have to clean up the code and hopefully have looked into how jstack is affected by then).
19-03-2015

[~zmajo], are you working on this?
19-03-2015

On 14/01/2015 20:12, John Rose wrote: > On Jan 14, 2015, at 6:42 AM, Bertrand Delsart wrote: >> >> I would not prevent the JITs from using RBP as long as the changeset >> is not sufficient to guarantee the profiling will work... and IMHO >> solving the JSR292 issue will be much more intrusive (impacting >> HotSpot stack walking code). > > Here are some thoughts on that. > > SPARC uses L7 (L7_mh_SP_save) for the same purpose of method handle > support as x86 uses RBP (rbp_mh_SP_save). So there's not a hard > requirement for x86 to take over RBP. > > (Deep background: This purpose, in method handle support, is to allow > an adapter to make changes to the caller's SP. The adapter is the > initial callee from the caller, but may change argument shape, and > tail-calls the ultimate callee. Because it is a tail-call, the original > caller must have a spot where his original SP can be preserved. The > preservation works because the original caller knows he is calling a > MH.invoke method, which requires the extra argument preservation. The > repertoire of argument shape changes is quite small, actually; it is not > a very general mechanism since the LF machinery was put in. Perhaps the > whole thing could be removed somehow, by finding alternative techniques > for the few remaining changes. OTOH, this SP-restoring mechanism may be > helpful in doing more a general tail-call mechanism, and perhaps in > managing int/comp mode changes more cleanly, so I'd like us to keep it. > And document it better.) > > Any register or stack slot will do for this purpose, as long as (i) its > value can be recovered after the MH.invoke call returns to the caller, > and (ii) its value can be dug up somehow during stack walking. There > are only a couple of places where stack walking code needs to sample the > value, so they should be adjustable. > > Both x86 and SPARC use registers which are callee-save (or "non-volatile > across calls") which satisfy properties (i) and (ii). A standard stack > slot (addressed based on caller's RBP) would probably also satisfy those > properties. > > A variably-positioned stack slot would also work, which would require > registering the position in each CodeBlob. That's unpleasant extra > detail, but it would align somewhat with the current logic which allows > each CodeBlob (nmethod, actually) to advertise which call sites need the > special processing (see the function is_method_handle_return(caller_pc)). > > I recommend reserving a dead word of space in every stack frame that > makes MH.invoke calls, at a fixed position relative to that frame's RBP. > > ��� John I perfectly agree that it is doable (and with your proposed approach). I just wanted to be sure people were aware that the RFE is more complex than what the current changeset may suggest. We are not just taking about reviewing and integrating a complete changeset contributed by the community. There is more work needed, either by the community or by Oracle. This will require changes at least in C1 and C2 call sequences, in the stack walking, in the creation and sizing of compiled frames... Regards, Bertrand.
15-01-2015

On Thu, Dec 4, 2014 at 11:55 PM, Brendan Gregg wrote: G'Day, I've hacked hotspot to return the frame pointer, in part to see what this involves, and also to have a working prototype for analysis. Along with an agent to resolve symbols, this has allowed full stack profiling using Linux perf_events. I'd love to be able to enable frame pointers in Oracle JDK, eg, with an -XX:+NoOmitFramePointer option. It could be put under -XX:+UnlockDiagnosticVMOptions or XX:+UnlockExperimentalVMOptions. So long as we had some way to turn it on. If someone wants to include (improve, rewrite) my patch, please do. See full discussion on mailing list: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016477.html
14-01-2015

Recently we received request with proposed patch from Brendan Gregg: In case there's problems with the patch URL, the patch is: --- openjdk8clean/hotspot/src/cpu/x86/vm/x86_64.ad 2014-03-04 02:52:11.000000000 +0000 +++ openjdk8/hotspot/src/cpu/x86/vm/x86_64.ad 2014-11-08 01:10:49.686044933 +0000 @@ -166,10 +166,9 @@ // 3) reg_class stack_slots( /* one chunk of stack-based "registers" */ ) // -// Class for all pointer registers (including RSP) +// Class for all pointer registers (including RSP, excluding RBP) reg_class any_reg(RAX, RAX_H, RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -184,10 +183,9 @@ R14, R14_H, R15, R15_H); -// Class for all pointer registers except RSP +// Class for all pointer registers except RSP and RBP reg_class ptr_reg(RAX, RAX_H, RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -199,9 +197,8 @@ R13, R13_H, R14, R14_H); -// Class for all pointer registers except RAX and RSP +// Class for all pointer registers except RAX, RSP and RBP reg_class ptr_no_rax_reg(RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -226,9 +223,8 @@ R13, R13_H, R14, R14_H); -// Class for all pointer registers except RAX, RBX and RSP +// Class for all pointer registers except RAX, RBX, RSP and RBP reg_class ptr_no_rax_rbx_reg(RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -260,10 +256,9 @@ // Singleton class for TLS pointer reg_class ptr_r15_reg(R15, R15_H); -// Class for all long registers (except RSP) +// Class for all long registers (except RSP and RBP) reg_class long_reg(RAX, RAX_H, RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -275,9 +270,8 @@ R13, R13_H, R14, R14_H); -// Class for all long registers except RAX, RDX (and RSP) -reg_class long_no_rax_rdx_reg(RBP, RBP_H, - RDI, RDI_H, +// Class for all long registers except RAX, RDX (and RSP, RBP) +reg_class long_no_rax_rdx_reg(RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, RBX, RBX_H, @@ -288,9 +282,8 @@ R13, R13_H, R14, R14_H); -// Class for all long registers except RCX (and RSP) -reg_class long_no_rcx_reg(RBP, RBP_H, - RDI, RDI_H, +// Class for all long registers except RCX (and RSP, RBP) +reg_class long_no_rcx_reg(RDI, RDI_H, RSI, RSI_H, RAX, RAX_H, RDX, RDX_H, @@ -302,9 +295,8 @@ R13, R13_H, R14, R14_H); -// Class for all long registers except RAX (and RSP) -reg_class long_no_rax_reg(RBP, RBP_H, - RDX, RDX_H, +// Class for all long registers except RAX (and RSP, RBP) +reg_class long_no_rax_reg(RDX, RDX_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, @@ -325,10 +317,9 @@ // Singleton class for RDX long register reg_class long_rdx_reg(RDX, RDX_H); -// Class for all int registers (except RSP) +// Class for all int registers (except RSP and RBP) reg_class int_reg(RAX, RDX, - RBP, RDI, RSI, RCX, @@ -340,10 +331,9 @@ R13, R14); -// Class for all int registers except RCX (and RSP) +// Class for all int registers except RCX (and RSP, RBP) reg_class int_no_rcx_reg(RAX, RDX, - RBP, RDI, RSI, RBX, @@ -355,8 +345,7 @@ R14); // Class for all int registers except RAX, RDX (and RSP) -reg_class int_no_rax_rdx_reg(RBP, - RDI, +reg_class int_no_rax_rdx_reg(RDI, RSI, RCX, RBX, @@ -718,6 +707,7 @@ st->print("# stack bang"); st->print("\n\t"); st->print("pushq rbp\t# Save rbp"); + // BDG consider: st->print("movq rbp, rsp\t# "); if (framesize) { st->print("\n\t"); st->print("subq rsp, #%d\t# Create frame",framesize); --- openjdk8clean/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp 2014-03-04 02:52:11.000000000 +0000 +++ openjdk8/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp 2014-11-07 23:57:11.589593723 +0000 @@ -5236,6 +5236,7 @@ // We always push rbp, so that on return to interpreter rbp, will be // restored correctly and we can correct the stack. push(rbp); + mov(rbp, rsp); // Remove word for ebp framesize -= wordSize; --- openjdk8clean/hotspot/src/cpu/x86/vm/c1_MacroAssembler_x86.cpp 2014-03-04 02:52:10.000000000 +0000 +++ openjdk8/hotspot/src/cpu/x86/vm/c1_MacroAssembler_x86.cpp 2014-11-07 23:57:21.933257882 +0000 @@ -358,6 +358,7 @@ generate_stack_overflow_check(frame_size_in_bytes); push(rbp); + mov(rbp, rsp); #ifdef TIERED // c2 leaves fpu stack dirty. Clean it on entry if (UseSSE < 2 ) {
14-01-2015