United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
JDK-8068945 : Use RBP register as proper frame pointer in JIT compiled code on x86

Details
Type:
Enhancement
Submit Date:
2015-01-14
Status:
Resolved
Updated Date:
2017-03-20
Project Name:
JDK
Resolved Date:
2015-04-27
Component:
hotspot
OS:
generic
Sub-Component:
compiler
CPU:
x86_64
Priority:
P4
Resolution:
Fixed
Affected Versions:
9
Fixed Versions:

Related Reports
Backport:
Backport:
Backport:
Relates:
Relates:
Relates:
Relates:
Relates:

Sub Tasks
JDK-8169583:

Description
We used rbp register for local values in jit compiled code on x86 because 32-bit mode have only few registers.
We do the same in 64-bit VM but it may help less with performance since it has more registers.
Using RBP only as frame pointer will help debugging and profiling tools.
                                    

Comments
Request for post-FC 8u60 backport for 8068945, 8075798 and 8080281 presented and approved at 5/27 RT mtg.
                                     
2015-05-27
Was performance testing done for these changes?
                                     
2015-05-21
Hi Ilya,


yes, Robert Strout tested performance, please see below:

"Sorry for the delay on getting the performance results... We've had most of
the results for awhile, but one of the SPARC machines has been down which
held up some of the results.  Machine is still down, and not sure when it's
coming back online.  Those impacted results show in the results table as "Failed"...

http://aurora.se.oracle.com/performance/reporting/report/robert.strout.JDK-8068945.Compare

But we can ignore those since we don't need the SPARC results for this anyway.
I can assess the performance results as follows...

    Overall, most of the benchmarks in Aurora.se shown no definitive change.

    There's strong evidence that this benchmark regresses on all x64 platforms:
            SPECjvm2008-Compress*       -2% to -3.5%
           
    There's evidence that this benchmark regresses on all x64 platforms:
            SPECjvm2008-Derby*          -2% to -5%

Normally, I would follow up with a run of derby with more data points to get
a stronger measurement for it. And also do the same for a few weak results just
to make sure something is not hiding there.  But given this change is under a
flag, I don't think we need to take this further.  If you were going to consider
making this the default, then I'd want to run all the SPECjvm2008 benchmarks with
more iterations to better understand any impact it might have on the default.

--Resii"


PreserveFramePointer is off by default, so this change has no impact on VM performance in the default configuration.

Best regards,


Zoltan

                                     
2015-05-21
URL:   http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/382e9e4b3b71
User:  lana
Date:  2015-05-13 21:19:48 +0000

                                     
2015-05-13
Thanks, David, for pointing that out!
                                     
2015-04-30
Fix version was not set when the commit job succeeded. As a result, this issue was not closed automatically. So I close this issue manually with the resolution "Fixed".
                                     
2015-04-27
URL:   http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/382e9e4b3b71
User:  zmajo
Date:  2015-04-27 10:34:10 +0000

                                     
2015-04-27
Hi Robert,


thank you for offering to do the comparison. I'll provide you the builds once the change has been pushed.

Best wishes,


Zolt??n

                                     
2015-04-23
The Perf Team can run a promotion-level comparison prior to integration.  Just provide us with a before and after JPRT build. Let either BrianD and/or I know when it's available.  Thx...
                                     
2015-04-22
[~zmajo], are you working on this?
                                     
2015-03-19
Great, thanks.  Do you know why we have an 8u60 backport for this?
                                     
2015-03-19
Hi [~twisti], yes, I have a working solution. I plan to send out a preliminary RFR next week (I still have to clean up the code and hopefully have looked into how jstack is affected by then).
                                     
2015-03-19
On 14/01/2015 20:12, John Rose wrote:
> On Jan 14, 2015, at 6:42 AM, Bertrand Delsart wrote:
>>
>> I would not prevent the JITs from using RBP as long as the changeset
>> is not sufficient to guarantee the profiling will work... and IMHO
>> solving the JSR292 issue will be much more intrusive (impacting
>> HotSpot stack walking code).
>
> Here are some thoughts on that.
>
> SPARC uses L7 (L7_mh_SP_save) for the same purpose of method handle
> support as x86 uses RBP (rbp_mh_SP_save).  So there's not a hard
> requirement for x86 to take over RBP.
>
> (Deep background:  This purpose, in method handle support, is to allow
> an adapter to make changes to the caller's SP.  The adapter is the
> initial callee from the caller, but may change argument shape, and
> tail-calls the ultimate callee.  Because it is a tail-call, the original
> caller must have a spot where his original SP can be preserved.  The
> preservation works because the original caller knows he is calling a
> MH.invoke method, which requires the extra argument preservation.  The
> repertoire of argument shape changes is quite small, actually; it is not
> a very general mechanism since the LF machinery was put in.  Perhaps the
> whole thing could be removed somehow, by finding alternative techniques
> for the few remaining changes.  OTOH, this SP-restoring mechanism may be
> helpful in doing more a general tail-call mechanism, and perhaps in
> managing int/comp mode changes more cleanly, so I'd like us to keep it.
>   And document it better.)
>
> Any register or stack slot will do for this purpose, as long as (i) its
> value can be recovered after the MH.invoke call returns to the caller,
> and (ii) its value can be dug up somehow during stack walking.  There
> are only a couple of places where stack walking code needs to sample the
> value, so they should be adjustable.
>
> Both x86 and SPARC use registers which are callee-save (or "non-volatile
> across calls") which satisfy properties (i) and (ii).  A standard stack
> slot (addressed based on caller's RBP) would probably also satisfy those
> properties.
>
> A variably-positioned stack slot would also work, which would require
> registering the position in each CodeBlob.  That's unpleasant extra
> detail, but it would align somewhat with the current logic which allows
> each CodeBlob (nmethod, actually) to advertise which call sites need the
> special processing (see the function is_method_handle_return(caller_pc)).
>
> I recommend reserving a dead word of space in every stack frame that
> makes MH.invoke calls, at a fixed position relative to that frame's RBP.
>
> ??? John

I perfectly agree that it is doable (and with your proposed approach).

I just wanted to be sure people were aware that the RFE is more complex than what the current changeset may suggest. We are not just taking about reviewing and integrating a complete changeset contributed by the community. There is more work needed, either by the community or by Oracle. This will require changes at least in C1 and C2 call sequences, in the stack walking, in the creation and sizing of compiled frames...

Regards,

Bertrand. 
                                     
2015-01-15
On Thu, Dec 4, 2014 at 11:55 PM, Brendan Gregg wrote:
G'Day,

I've hacked hotspot to return the frame pointer, in part to see what this
involves, and also to have a working prototype for analysis. Along with an
agent to resolve symbols, this has allowed full stack profiling using Linux
perf_events.

I'd love to be able to enable frame pointers in Oracle JDK, eg, with an
-XX:+NoOmitFramePointer option. It could be put under
-XX:+UnlockDiagnosticVMOptions or XX:+UnlockExperimentalVMOptions. So long
as we had some way to turn it on. If someone wants to include (improve,
rewrite) my patch, please do.

See full discussion on mailing list:

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016477.html

                                     
2015-01-14
Recently we received request with proposed patch from Brendan Gregg:

In case there's problems with the patch URL, the patch is:

--- openjdk8clean/hotspot/src/cpu/x86/vm/x86_64.ad    2014-03-04 02:52:11.000000000 +0000
+++ openjdk8/hotspot/src/cpu/x86/vm/x86_64.ad    2014-11-08 01:10:49.686044933 +0000
@@ -166,10 +166,9 @@
 // 3) reg_class stack_slots( /* one chunk of stack-based "registers" */ )
 //
 
-// Class for all pointer registers (including RSP)
+// Class for all pointer registers (including RSP, excluding RBP)
 reg_class any_reg(RAX, RAX_H,
                   RDX, RDX_H,
-                  RBP, RBP_H,
                   RDI, RDI_H,
                   RSI, RSI_H,
                   RCX, RCX_H,
@@ -184,10 +183,9 @@
                   R14, R14_H,
                   R15, R15_H);
 
-// Class for all pointer registers except RSP
+// Class for all pointer registers except RSP and RBP
 reg_class ptr_reg(RAX, RAX_H,
                   RDX, RDX_H,
-                  RBP, RBP_H,
                   RDI, RDI_H,
                   RSI, RSI_H,
                   RCX, RCX_H,
@@ -199,9 +197,8 @@
                   R13, R13_H,
                   R14, R14_H);
 
-// Class for all pointer registers except RAX and RSP
+// Class for all pointer registers except RAX, RSP and RBP
 reg_class ptr_no_rax_reg(RDX, RDX_H,
-                         RBP, RBP_H,
                          RDI, RDI_H,
                          RSI, RSI_H,
                          RCX, RCX_H,
@@ -226,9 +223,8 @@
                          R13, R13_H,
                          R14, R14_H);
 
-// Class for all pointer registers except RAX, RBX and RSP
+// Class for all pointer registers except RAX, RBX, RSP and RBP
 reg_class ptr_no_rax_rbx_reg(RDX, RDX_H,
-                             RBP, RBP_H,
                              RDI, RDI_H,
                              RSI, RSI_H,
                              RCX, RCX_H,
@@ -260,10 +256,9 @@
 // Singleton class for TLS pointer
 reg_class ptr_r15_reg(R15, R15_H);
 
-// Class for all long registers (except RSP)
+// Class for all long registers (except RSP and RBP)
 reg_class long_reg(RAX, RAX_H,
                    RDX, RDX_H,
-                   RBP, RBP_H,
                    RDI, RDI_H,
                    RSI, RSI_H,
                    RCX, RCX_H,
@@ -275,9 +270,8 @@
                    R13, R13_H,
                    R14, R14_H);
 
-// Class for all long registers except RAX, RDX (and RSP)
-reg_class long_no_rax_rdx_reg(RBP, RBP_H,
-                              RDI, RDI_H,
+// Class for all long registers except RAX, RDX (and RSP, RBP)
+reg_class long_no_rax_rdx_reg(RDI, RDI_H,
                               RSI, RSI_H,
                               RCX, RCX_H,
                               RBX, RBX_H,
@@ -288,9 +282,8 @@
                               R13, R13_H,
                               R14, R14_H);
 
-// Class for all long registers except RCX (and RSP)
-reg_class long_no_rcx_reg(RBP, RBP_H,
-                          RDI, RDI_H,
+// Class for all long registers except RCX (and RSP, RBP)
+reg_class long_no_rcx_reg(RDI, RDI_H,
                           RSI, RSI_H,
                           RAX, RAX_H,
                           RDX, RDX_H,
@@ -302,9 +295,8 @@
                           R13, R13_H,
                           R14, R14_H);
 
-// Class for all long registers except RAX (and RSP)
-reg_class long_no_rax_reg(RBP, RBP_H,
-                          RDX, RDX_H,
+// Class for all long registers except RAX (and RSP, RBP)
+reg_class long_no_rax_reg(RDX, RDX_H,
                           RDI, RDI_H,
                           RSI, RSI_H,
                           RCX, RCX_H,
@@ -325,10 +317,9 @@
 // Singleton class for RDX long register
 reg_class long_rdx_reg(RDX, RDX_H);
 
-// Class for all int registers (except RSP)
+// Class for all int registers (except RSP and RBP)
 reg_class int_reg(RAX,
                   RDX,
-                  RBP,
                   RDI,
                   RSI,
                   RCX,
@@ -340,10 +331,9 @@
                   R13,
                   R14);
 
-// Class for all int registers except RCX (and RSP)
+// Class for all int registers except RCX (and RSP, RBP)
 reg_class int_no_rcx_reg(RAX,
                          RDX,
-                         RBP,
                          RDI,
                          RSI,
                          RBX,
@@ -355,8 +345,7 @@
                          R14);
 
 // Class for all int registers except RAX, RDX (and RSP)
-reg_class int_no_rax_rdx_reg(RBP,
-                             RDI,
+reg_class int_no_rax_rdx_reg(RDI,
                              RSI,
                              RCX,
                              RBX,
@@ -718,6 +707,7 @@
     st->print("# stack bang");
     st->print("\n\t");
     st->print("pushq   rbp\t# Save rbp");
+    // BDG consider: st->print("movq    rbp, rsp\t# ");
     if (framesize) {
       st->print("\n\t");
       st->print("subq    rsp, #%d\t# Create frame",framesize);
--- openjdk8clean/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp    2014-03-04 02:52:11.000000000 +0000
+++ openjdk8/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp    2014-11-07 23:57:11.589593723 +0000
@@ -5236,6 +5236,7 @@
     // We always push rbp, so that on return to interpreter rbp, will be
     // restored correctly and we can correct the stack.
     push(rbp);
+    mov(rbp, rsp);
     // Remove word for ebp
     framesize -= wordSize;
 
--- openjdk8clean/hotspot/src/cpu/x86/vm/c1_MacroAssembler_x86.cpp    2014-03-04 02:52:10.000000000 +0000
+++ openjdk8/hotspot/src/cpu/x86/vm/c1_MacroAssembler_x86.cpp    2014-11-07 23:57:21.933257882 +0000
@@ -358,6 +358,7 @@
   generate_stack_overflow_check(frame_size_in_bytes);
 
   push(rbp);
+  mov(rbp, rsp);
 #ifdef TIERED
   // c2 leaves fpu stack dirty. Clean it on entry
   if (UseSSE < 2 ) {

                                     
2015-01-14



Hardware and Software, Engineered to Work Together