JDK-8023597 : Optimize G1 barriers code for unsafe load_store
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8-pool
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2013-08-22
  • Updated: 2013-10-18
  • Resolved: 2013-08-29
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8 Other
8Fixed hs25Fixed
Description
On Aug 22, 2013, at 6:34 AM, "Doerr, Martin" <martin.doerr@sap.com> wrote:

> Hi,
>
> we have an addon to your G1 pre-barrier change which reduces
> its cost.
> It is possible to get rid of the do_load in inline_unsafe_load_store
> because we always know the value which gets overwritten.
>
> In case of cmpxchg, the only value which might get overwritten
> is known in advance.
> In case of xchg, the old value is returned by the load_store.
> It is allowed to do the barrier after the xchg because there's
> no safepoint between them.
>
> Please review
> http://cr.openjdk.java.net/~goetz/webrevs/opto_g1_barr/
>
> Best regards,
> Martin and Goetz

@@ -2753,14 +2753,19 @@
     // Execute transformation here to avoid barrier generation in such case.
     if (_gvn.type(newval) == TypePtr::NULL_PTR)
       newval = _gvn.makecon(TypePtr::NULL_PTR);
 
     // Reference stores need a store barrier.
-    pre_barrier(true /* do_load*/,
-                control(), base, adr, alias_idx, newval, value_type->make_oopptr(),
-                NULL /* pre_val*/,
+    if (kind == LS_cmpxchg) {
+      // The only known value which might get overwritten is oldval.
+      pre_barrier(false /* do_load */,
+                  control(), NULL, NULL, max_juint, NULL, NULL,
+                  oldval /* pre_val */,
                 T_OBJECT);
+    }
+    // LS_xchg: see below.
+
 #ifdef _LP64
     if (adr->bottom_type()->is_ptr_to_narrowoop()) {
       Node *newval_enc = _gvn.transform(new (C) EncodePNode(newval, newval->bottom_type()->make_narrowoop()));
       if (kind == LS_xchg) {
         load_store = _gvn.transform(new (C) GetAndSetNNode(control(), mem, adr,

@@ -2792,20 +2797,30 @@
   // main role is to prevent LoadStore nodes from being optimized away
   // when their results aren't used.
   Node* proj = _gvn.transform(new (C) SCMemProjNode(load_store));
   set_memory(proj, alias_idx);
 
-  // Add the trailing membar surrounding the access
-  insert_mem_bar(Op_MemBarCPUOrder);
-  insert_mem_bar(Op_MemBarAcquire);
-
 #ifdef _LP64
   if (type == T_OBJECT && adr->bottom_type()->is_ptr_to_narrowoop() && kind == LS_xchg) {
     load_store = _gvn.transform(new (C) DecodeNNode(load_store, load_store->get_ptr_type()));
   }
 #endif
 
+  // G1: Don't need to load pre_val. The old value is returned by load_store.
+  // The pre_barrier can execute after the xchg as long as no safepoint
+  // gets inserted between them.
+  if (type == T_OBJECT && kind == LS_xchg) {
+    pre_barrier(false /* do_load */,
+                control(), NULL, NULL, max_juint, NULL, NULL,
+                load_store /* pre_val */,
+                T_OBJECT);
+  }
+
+  // Add the trailing membar surrounding the access
+  insert_mem_bar(Op_MemBarCPUOrder);
+  insert_mem_bar(Op_MemBarAcquire);
+
   assert(type2size[load_store->bottom_type()->basic_type()] == type2size[rtype], "result type should match");
   set_result(load_store);
   return true;
 }
 

Comments
noreg-sqe: we have tons of tests for G1 already. And G1 performance is tracked by performance team.
18-10-2013

On 8/22/13 10:59 AM, Tony Printezis wrote: > Vladimir, > > As long as the pre-value is eventually stored on the SATB buffer when > the thread wakes up, moving it after the store is actually safe. Marking > will be finalized at a safepoint (remark) and in order for that thread > to reach the safepoint it will have to wake up and continue, which means > that it will enqueue the pre-value in the process. (The main reason the > pre-barrier is before the store is because it has to read the pre-value > before the actual store for the obvious reasons.) > > As you eluded to, this is different to the store / post-barrier > interaction which has specific ordering requirements (store first, > post-barrier afterwards). > > Regards, > > Tony > > On 8/22/13 10:33 AM, Vladimir Kozlov wrote: >> Martin, >> >> Thank you for patch. I think change for cmpxchg is fine but I am >> concern about swapping pre- and post- barriers for xchg. >> I filed rfe for that: >> >> 8023597: Optimize G1 barriers code for unsafe load_store >> >> Mikael, >> >> What about concurrent marking (not that familiar with GC :) )? >> The java thread which does this store could be suspended by OS before >> it executes pre-barrier. GC threads could see new store and dirty_card >> but there will be no prev_value available. >> >> Thanks, >> Vladimir >> >> On 8/22/13 7:31 AM, Mikael Gerdin wrote: >>> >>> >>> I agree that it should be safe to move the barrier as long as there is >>> no safepoint but I am not familiar enough with C2 to able to verify that >>> fact. >>> >>> But I'm not sure if the possible performance gains are worth the code >>> complexity and the confusion of having the "pre-barrier" after the >>> "post-barrier" :) >>> >>> /Mikael >>>
22-08-2013