JDK-8225776 : Optimize branch frequency of G1's write post-barrier in C2
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9,10,11,12,13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-06-14
  • Updated: 2019-10-22
  • Resolved: 2019-08-05
JDK 14
14 b09 Fixed
Related Reports
Relates :  
Relates :  
Description
As stated in JDK-8130918, G1's write post-barrier is laid out in full in the critical path. We profiled the branching frequency using a custom post-barrier with the DaCapo benchmarks; the results are in the attached spreadsheet. There are two configs:
"stress" - setting small heap size to stress GC;
"8GHeap" - using 8G heap, so there is little GC happening.
Key takeaways:
1. Most writes cross a region boundary;
2. Most writes are of non-null values (not shown in the profile);
3. Most writes happen to young objects.
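
For readers unfamiliar with the barrier, the three takeaways correspond to the filter checks that the post-barrier runs before it does any real work. The following is a conceptual model only, not the HotSpot or C2 code: heap addresses are plain long values, the card table is a Java array, and the class name, the enum names and the 1 MB region size are assumptions made for illustration (G1 picks the region size at startup). The 512-byte card size matches G1's default.

```java
import java.util.Arrays;

/** Conceptual model of the filters in G1's write post-barrier. Not HotSpot source. */
final class PostBarrierModel {
    // Assumption for illustration: 1 MB regions. G1 chooses the real size at startup.
    static final int LOG_REGION_BYTES = 20;
    // Each card covers 512 bytes of heap.
    static final int CARD_SHIFT = 9;

    enum Card { CLEAN, YOUNG, DIRTY }

    enum Outcome { SAME_REGION, NULL_VALUE, YOUNG_CARD, ALREADY_DIRTY, ENQUEUED }

    /**
     * Models the barrier that runs after "field = newValue".
     * fieldAddr is the address of the updated field; newValAddr is the address
     * of the stored object, or 0 for null. Addresses are toy values.
     */
    static Outcome postBarrier(long fieldAddr, long newValAddr, Card[] cardTable) {
        // Filter 1: a store that stays inside one region needs no remembered-set entry.
        if (((fieldAddr ^ newValAddr) >>> LOG_REGION_BYTES) == 0) {
            return Outcome.SAME_REGION;
        }
        // Filter 2: storing null creates no reference to track.
        if (newValAddr == 0) {
            return Outcome.NULL_VALUE;
        }
        // Filter 3: cards covering young regions are never refined.
        int index = (int) (fieldAddr >>> CARD_SHIFT);
        if (cardTable[index] == Card.YOUNG) {
            return Outcome.YOUNG_CARD;
        }
        // The real barrier issues a StoreLoad fence here before re-reading the card.
        if (cardTable[index] == Card.DIRTY) {
            return Outcome.ALREADY_DIRTY;
        }
        // Slow path: dirty the card and hand it to the refinement queue.
        cardTable[index] = Card.DIRTY;
        return Outcome.ENQUEUED;
    }

    public static void main(String[] args) {
        // Toy 4 MB heap, all cards initially clean.
        Card[] cards = new Card[(4 << 20) >>> CARD_SHIFT];
        Arrays.fill(cards, Card.CLEAN);

        long field = (1L << 20) + 64;      // a field in region 1
        long sameRegion = (1L << 20) + 4096;
        long otherRegion = 3L << 20;

        System.out.println(postBarrier(field, sameRegion, cards));
        System.out.println(postBarrier(field, 0, cards));
        System.out.println(postBarrier(field, otherRegion, cards));
        System.out.println(postBarrier(field, otherRegion, cards));
    }
}
```

Each early return is one of the branches whose taken/not-taken frequency was profiled; how likely C2 considers each branch determines which side ends up on the fall-through path in the generated code.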

We improved the branching frequency and checked that the assembly code is laid out as expected. For this program:
public class Demo {
  Demo o;
  void setDemo(Demo o) {
    this.o = o;
  }
  public static void main(String[] args) {
    Demo o1 = new Demo();
    Demo o2 = new Demo();
    for (int i = 0; i < 50000; i++) {
      o1.setDemo(o2);
    }   
  }
}
The generated assembly with and without the patch is attached.

We have tested this patch in JDK11 with one of our important production workloads, and it gives a 1% reduction in CPU-cost-per-query.
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/b5ab9a71aa95 User: manc Date: 2019-08-05 20:20:46 +0000
05-08-2019

More details on the branch profiling result:

The branch profiling was done with this patch in JDK11 (I have rebased it to tip): http://cr.openjdk.java.net/~manc/8225776/branch_profiling/

It adds hsperfdata counters for the branching statistics. To view their values, pass "-XX:+UnlockDiagnosticVMOptions -XX:+G1WriteBarrierStats -XX:PerfDataSaveFile=result.hsperf", then run the following afterwards:
$ jstat -J-Djstat.showUnsupported=true -snap file:///$(pwd)/result.hsperf

The DaCapo benchmarks were run with -XX:-TieredCompilation, and the values are the average of 5 trials. Each trial has 9 warmup iterations followed by the actual experiment iteration. The reported values of the hsperfdata counters cover only the last experiment iteration: we took two snapshots of the hsperfdata counters, one before and one after that iteration, and reported the diff of the two snapshots.
11-07-2019
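
The snapshot arithmetic described in the comment above (diff the counters around the measured iteration, then average across trials) can be sketched as follows. This is a minimal illustration, not the script used for the attached spreadsheet: the class name and the counter names used with it are hypothetical, and each snapshot is assumed to have already been parsed into a map from counter name to value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the snapshot arithmetic; counter names passed in are up to the caller. */
final class CounterDiff {

    /** Returns after minus before for every counter present in the "after" snapshot. */
    static Map<String, Long> diff(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            // A counter missing from the first snapshot is treated as starting at 0.
            long start = before.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), entry.getValue() - start);
        }
        return result;
    }

    /**
     * Averages each counter over all trials. A counter absent from a trial
     * contributes 0 to the sum, because the divisor is always trials.size().
     */
    static Map<String, Double> average(List<Map<String, Long>> trials) {
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map<String, Long> trial : trials) {
            for (Map.Entry<String, Long> entry : trial.entrySet()) {
                means.merge(entry.getKey(), (double) entry.getValue(), Double::sum);
            }
        }
        means.replaceAll((name, sum) -> sum / trials.size());
        return means;
    }
}
```

With five trials, one would call diff once per trial on the two snapshots taken around the last iteration, collect the five resulting maps in a list, and pass that list to average.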