Profiles from refworkload/wls_webapp_atomic indicate that the checkcast_arraycopy stub is very slow with G1 compared to Parallel GC, which spends almost no time in it. In detail, from a VTune trace:
3.698 weblogic.servlet.internal.ServletRequestImpl.getParameter(java.lang.String)
->1.755 checkcast_arraycopy
->1.722 BarrierSet::static_write_ref_array_post(HeapWord*, unsigned long)
->0.652 BarrierSet::static_write_ref_array_pre(HeapWord*, unsigned long)
-> 0.202 G1SATBCardTableModRefBS::write_ref_array_pre(unsigned*, int, bool)
I.e., G1 calls out-of-line helper methods for the barriers, while in Parallel GC these are inlined.
One idea to mitigate this is to inline some checks into the checkcast_arraycopy stub so that the helper calls can be skipped in the common case, e.g. the marking-active check for the pre-barrier and a young gen check for the post-barrier.
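
A rough sketch of the idea in plain C++, not actual HotSpot stub code. Everything here is hypothetical and for illustration only: GcState, slow_pre_barrier, slow_post_barrier, in_young and copy_refs are made-up names, and the young gen test is modelled as a simple address range check. It assumes the post-barrier can be skipped entirely when the destination array lies in the young gen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the collector state the stub would consult.
struct GcState {
    bool marking_active = false;     // true only while concurrent marking runs
    std::uintptr_t young_start = 0;  // young gen modelled as [young_start, young_end)
    std::uintptr_t young_end = 0;
    int pre_calls = 0;               // number of out-of-line pre-barrier calls
    int post_calls = 0;              // number of out-of-line post-barrier calls
};

// Placeholders for the out-of-line helpers seen in the trace; here they
// only count how often they are reached.
void slow_pre_barrier(GcState& gc, void** /*dst*/, std::size_t /*count*/) {
    ++gc.pre_calls;
}

void slow_post_barrier(GcState& gc, void** /*dst*/, std::size_t /*count*/) {
    ++gc.post_calls;
}

// Inline young gen check (assumption: a range compare is sufficient).
inline bool in_young(const GcState& gc, const void* p) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= gc.young_start && a < gc.young_end;
}

// Reference array copy with the cheap checks done inline, so the helpers
// are only called when they have work to do.
void copy_refs(GcState& gc, void** src, void** dst, std::size_t count) {
    assert(count == 0 || (src != nullptr && dst != nullptr));

    // Pre-barrier: only needed while marking is active.
    if (gc.marking_active) {
        slow_pre_barrier(gc, dst, count);
    }

    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }

    // Post-barrier: skipped when the destination is in the young gen.
    if (!in_young(gc, dst)) {
        slow_post_barrier(gc, dst, count);
    }
}
```

With marking inactive and a young destination array, neither helper is reached, which is the case the inlined checks are meant to make cheap.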