The NULL check and the region check in the post barrier can be combined into a single check using ANDN.
So, given a new value q stored to field p, instead of:
if (q != NULL) {
  if (((q ^ p) >> RegionLog) != 0) {
    ...
  }
}
the following is equivalent:
if (((q &~ p) >> RegionLog) != 0) {
  ...
}
Here the &~ can be implemented with a single ANDN instruction. On x86_64 with the BMI1 extension, which introduces a three-operand ANDN, the two checks become three instructions including the branch:
andn tmp, p, q    ; tmp = q & ~p (ANDN inverts its first source operand)
sar tmp, RegionLog
jne slower_path
(BMI1 instructions have been available since Intel Haswell and AMD Piledriver, i.e. roughly 2012-2013, so a fair number of processors should already support them.)
Otherwise something like
mov tmp, p
not tmp
and tmp, q
sar tmp, RegionLog
jne slower_path
(or whatever the compiler would generate) would do the trick.
This not only reduces code size, but also replaces two typically taken branches with a single branch that is taken even more often.
(Observation from [~eosterlund])
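
To make the two forms of the check above easier to compare, here is a minimal C++ sketch of both predicates. It is an illustration only, not the HotSpot code: the function names, the use of plain 64-bit integers for p and q, and the region size of 1 MiB (kRegionLog = 20) are all assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical region size for illustration: 2^20 bytes (1 MiB).
constexpr unsigned kRegionLog = 20;

// Two-check form: NULL check followed by the cross-region check.
// p is the address of the field, q is the new value stored into it.
constexpr bool needs_barrier_two_checks(std::uint64_t p, std::uint64_t q) {
    if (q != 0) {
        if (((q ^ p) >> kRegionLog) != 0) {
            return true;
        }
    }
    return false;
}

// Combined form: one AND-NOT, one shift, one comparison.
// A compiler targeting BMI1 may emit ANDN for (q & ~p).
constexpr bool needs_barrier_combined(std::uint64_t p, std::uint64_t q) {
    return ((q & ~p) >> kRegionLog) != 0;
}

// NULL store: neither form takes the slow path.
static_assert(!needs_barrier_two_checks(0x100040, 0), "NULL store");
static_assert(!needs_barrier_combined(0x100040, 0), "NULL store");

// Cross-region store (region 1 -> region 2): both forms take the slow path.
static_assert(needs_barrier_two_checks(0x100040, 0x200080), "cross-region");
static_assert(needs_barrier_combined(0x100040, 0x200080), "cross-region");
```

The sketch only exercises a NULL store and one cross-region store. The combined form fires only when q has a region bit set that p lacks, so with the region size assumed here a pair such as p = 0x300040 and q = 0x100080 passes the two-check form but not the combined one; how that case should be handled is not covered by the text above.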