oop_store has two overloads, differing in the volatility of *p. That overloading, and the use of casts to select which to invoke, is confusing. There really should be two functions, oop_store and release_oop_store (or some such construction).
It is also a mistake that the volatility of *p has any effect on the post-barrier. As things stand, we're inserting unnecessary memory barriers for CMS and different unnecessary barriers for other (non-G1) collectors.
With the present code, when using either SerialGC or ParallelGC, VM stores to a Java volatile have an unnecessary release memory barrier in the post-write barrier. (This is probably quite rare, so doesn't really matter for performance.)
With the present code, when using CMS, ordinary (not Java volatile) stores outside a pause have an unnecessary release memory barrier for the value store. Also, volatile stores in a pause have an unnecessary release barrier in the post-write barrier. (The latter is likely quite rare and unimportant. It might even be that this case never actually arises in VM code.)
These unnecessary memory barriers can easily go unnoticed, since release barriers don't require any additional instructions for TSO; there will only be barrier instructions for ARM/PPC/&etc, not for x86 or SPARC. The unnecessary release for the value store when using CMS is particularly problematic.
The only difference between [release_]oop_store should be the corresponding use of oopDesc::[release_]encode_store_heap_oop. The only remaining caller of the release version would berelease_obj_field_put. oop_store should *not* be conditionally calling the release version based on always_do_update_barrier.
The release parameter for update_barrier_set and BarrierSet::write_ref_field should be removed. Instead, CardTableModRefBS::inline_write_ref_field should conditionalize the release barrier for the card table update on always_do_update_barrier, or CMS should have its own barrier set whose write_ref_field does that, and CMS should not be using inline_write_ref_field.
CMS is presently the only collector that does anything interesting with always_do_update_barrier. G1 has commented out code to change it between true and false appropriately. I think that commented out code could be reinstated and the G1 write_ref_field's memory barrier could be similarly conditionalized (with greater benefit, since that one is a storeload). Alternatively, always_do_update_barrier could be made CMS-specific, depending on what happens with inline_write_ref_field and the release parameter for update_barrier_set is handled, as discussed above.
I don't know how much these unnecessary memory barriers matter, since this is all in VM code rather than compiled Java code. (Even on TSO machines they might inhibit some compiler optimizations.)
But besides any performance benefits to such a change, the existing code is just plain confusing IMO, because it is conflating the volatility of *p with the need for additional memory barriers in the post-write barrier.
klass_oop_store may have some similar issues. In particular, it has two overloads, based on the volatility of *p. However, there is only one caller, which casts that argument to volatile.