If you look closely at the disassembly of the G1 post-barrier in JDK-8130918, an interesting thing pops out on the barrier's hot path:
http://cr.openjdk.java.net/~shade/8130918/g1.perfasm
; decode the receiver's compressed oop (%r13) into %r10
lea (%r12,%r13,8),%r10
; decode the value's compressed oop (%rbx) into %r11
mov %rbx,%r11
shl $0x3,%r11
; xor them
xor %r10,%r11
; shift by log2(region size): 0x15 = 21, i.e. 2 MB regions
shr $0x15,%r11
; test for 0; if so, this is an intra-region store, bail
test %r11,%r11
je 0x00007f82d0af4460
But we could operate on the compressed oops directly, provided we know which compressed oops mode we are running in.
In this case, with zero-based compressed oops and a 3-bit shift, we could just emit:
; receiver compressed oop is already in %r13
; value compressed oop is already in %rbx
; copy the receiver to a scratch register, so neither input is clobbered
mov %r13,%r11
; xor in the value
xor %rbx,%r11
; shift by log2(region size) minus the 3-bit compressed oops shift: 21 - 3 = 18 = 0x12
shr $0x12,%r11
; test for 0; if so, this is an intra-region store, bail
test %r11,%r11
je 0x00007f82d0af4460
Since this code executes for each reference store, the performance improvement may be significant.