As stated in JDK-8130918, G1's write post-barrier is laid out in full in the critical path. We profiled the branching frequency using a custom post-barrier with the DaCapo benchmarks; the results are in the attached spreadsheet. There are two configs:
"stress" - a small heap size, to stress the GC;
"8GHeap" - an 8G heap, so little GC happens.
Key takeaways:
1. Most writes cross a region boundary;
2. Most writes are of non-null values (not shown in the profile);
3. Most writes happen to young objects.
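To make the three takeaways concrete, here is a minimal Java sketch that models the post-barrier's filter sequence (same-region check, null check, young-card check) and counts which filter ends the barrier for a given store. It is an illustrative model only, not HotSpot code: the class PostBarrierProfile, its Exit enum, the boolean fieldInYoung flag standing in for the card-table lookup, and the fixed 1 MB region size are all assumptions made for this example.

```java
// Illustrative model only; none of these names exist in HotSpot.
final class PostBarrierProfile {
    // Assumed region size of 1 MB (2^20 bytes); real G1 derives it from the heap size.
    static final int LOG_REGION_BYTES = 20;

    // The filter at which the modelled barrier stops for a given store.
    enum Exit { SAME_REGION, NULL_VALUE, YOUNG_CARD, SLOW_PATH }

    private final long[] counts = new long[Exit.values().length];

    // Classifies one reference store by the first filter that would end the barrier.
    // fieldInYoung stands in for "the card covering the field is a young card".
    Exit record(long fieldAddr, long newValAddr, boolean fieldInYoung) {
        Exit exit;
        if (((fieldAddr ^ newValAddr) >>> LOG_REGION_BYTES) == 0) {
            exit = Exit.SAME_REGION;   // field and new value share a region
        } else if (newValAddr == 0) {
            exit = Exit.NULL_VALUE;    // storing null needs no remembered-set work
        } else if (fieldInYoung) {
            exit = Exit.YOUNG_CARD;    // young regions are always collected
        } else {
            exit = Exit.SLOW_PATH;     // would dirty the card and enqueue it
        }
        counts[exit.ordinal()]++;
        return exit;
    }

    long count(Exit e) {
        return counts[e.ordinal()];
    }

    public static void main(String[] args) {
        PostBarrierProfile p = new PostBarrierProfile();
        long region = 1L << LOG_REGION_BYTES;
        p.record(5 * region + 16, 5 * region + 64, true);   // same region
        p.record(5 * region + 16, 0, true);                 // null store
        p.record(5 * region + 16, 9 * region + 64, true);   // cross-region, young field
        p.record(5 * region + 16, 9 * region + 64, false);  // cross-region, old field
        for (Exit e : Exit.values()) {
            System.out.println(e + " " + p.count(e));
        }
    }
}
```

Under the profile described above, a typical store would pass the first two filters and exit at the young-card check, so that is the path the branch layout should favor.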
We adjusted the branching frequencies in the post-barrier accordingly and checked that the assembly code is laid out as expected. For this program:
public class Demo {
    Demo o;

    void setDemo(Demo o) {
        this.o = o;
    }

    public static void main(String[] args) {
        Demo o1 = new Demo();
        Demo o2 = new Demo();
        for (int i = 0; i < 50000; i++) {
            o1.setDemo(o2);
        }
    }
}
The generated assembly with and without the patch is attached.
We have tested this patch in JDK 11 with one of our important production workloads, and it gives a 1% reduction in CPU-cost-per-query.