As stated in JDK-8130918, G1's write post-barrier is laid out in full in the critical path. We profiled the branch frequencies of a custom post-barrier on the DaCapo benchmarks; the results are in the attached spreadsheet. There are two configurations:
"stress" - a small heap size, to stress the GC;
"8GHeap" - an 8G heap, so little GC happens.
Key takeaways:
1. Most writes cross region boundaries;
2. Most writes store non-null values (not shown in the profile);
3. Most writes are to young objects.
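To make these takeaways concrete, here is a minimal Java model of the filter sequence that G1's post-barrier runs in the JDK 11 era: a same-region check, a null check, a young-card check, then (after a StoreLoad fence in the real barrier) a dirty-card check before the card is dirtied and enqueued. The class and method names, the 1 MB region and 512-byte card sizes, and the card values are assumptions chosen for illustration, not HotSpot code. Under the profile above, most writes pass the first two filters and exit at the young-card check.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical model of the G1 post-write barrier filters. Addresses are plain
 * longs and the card table is a byte array; nothing here is HotSpot's API.
 */
public class PostBarrierModel {
    // Assumed sizes for this sketch: 1 MB regions, 512-byte cards (log2 values).
    static final int LOG_REGION_BYTES = 20;
    static final int LOG_CARD_BYTES = 9;

    // Assumed card values for this sketch.
    static final byte CLEAN_CARD = -1;
    static final byte DIRTY_CARD = 0;
    static final byte YOUNG_CARD = 2;

    /** Which filter a write exits on. */
    enum Exit { SAME_REGION, NULL_VALUE, YOUNG_CARD, ALREADY_DIRTY, ENQUEUED }

    final byte[] cards;
    final Map<Exit, Long> counts = new EnumMap<>(Exit.class);

    PostBarrierModel(int numCards) {
        cards = new byte[numCards];
        Arrays.fill(cards, CLEAN_CARD);
        for (Exit e : Exit.values()) counts.put(e, 0L);
    }

    /** Marks every card of the region starting at regionStart as young. */
    void markYoungRegion(long regionStart) {
        int first = (int) (regionStart >>> LOG_CARD_BYTES);
        int cardsPerRegion = 1 << (LOG_REGION_BYTES - LOG_CARD_BYTES);
        Arrays.fill(cards, first, first + cardsPerRegion, YOUNG_CARD);
    }

    /** Runs the filters in barrier order; newVal == 0 stands for null. */
    Exit postBarrier(long fieldAddr, long newVal) {
        Exit exit;
        if (((fieldAddr ^ newVal) >>> LOG_REGION_BYTES) == 0) {
            exit = Exit.SAME_REGION;          // filter 1: store within one region
        } else if (newVal == 0) {
            exit = Exit.NULL_VALUE;           // filter 2: null store
        } else {
            int card = (int) (fieldAddr >>> LOG_CARD_BYTES);
            if (cards[card] == YOUNG_CARD) {
                exit = Exit.YOUNG_CARD;       // filter 3: field is in a young region
            } else if (cards[card] == DIRTY_CARD) {
                // The real barrier issues a StoreLoad fence and re-reads the card
                // here; this single-threaded model needs only one read.
                exit = Exit.ALREADY_DIRTY;
            } else {
                cards[card] = DIRTY_CARD;     // slow path: dirty the card and enqueue it
                exit = Exit.ENQUEUED;
            }
        }
        counts.merge(exit, 1L, Long::sum);
        return exit;
    }

    public static void main(String[] args) {
        PostBarrierModel m = new PostBarrierModel(1 << 16);
        m.markYoungRegion(2L << LOG_REGION_BYTES);
        long youngField = (2L << LOG_REGION_BYTES) + 16;
        long oldField = (4L << LOG_REGION_BYTES) + 16;
        long value = (3L << LOG_REGION_BYTES) + 8;
        // Mostly cross-region, non-null writes into a young object, as in the profile.
        for (int i = 0; i < 1000; i++) m.postBarrier(youngField, value);
        m.postBarrier(oldField, value);   // clean card: enqueued
        m.postBarrier(oldField, value);   // card already dirty
        System.out.println(m.counts);
        // prints {SAME_REGION=0, NULL_VALUE=0, YOUNG_CARD=1000, ALREADY_DIRTY=1, ENQUEUED=1}
    }
}
```

In this model the common case takes three branches and never reaches the fence, which is why the branch layout of those three checks matters for the hot path.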
We adjusted the branch frequency hints to match this profile and checked that the generated assembly is laid out as expected, using the following program:
public class Demo {
  Demo o;
  void setDemo(Demo o) {
    this.o = o;
  }
  public static void main(String[] args) {
    Demo o1 = new Demo();
    Demo o2 = new Demo();
    for (int i = 0; i < 50000; i++) {
      o1.setDemo(o2);
    }   
  }
}
The generated assembly with and without the patch is attached.
We tested this patch on JDK 11 with one of our important production workloads, and it reduced CPU cost per query by 1%.