Currently, when doing an oop arraycopy from C1- or C2-compiled code, we call into the runtime for the pre-barrier, then do the arraycopy as a fast-blt-loop, then call into the runtime again for the post-barrier.
In Shenandoah we would like to call into the runtime only once and do the arraycopy there in a single loop when GC is active, and do only the fast-blt-loop when GC is not active.
This requires some GC interface changes: one to allow skipping the fast-blt-loop when it is not needed, and one to pass the element type so that the arraycopy loop and its checkcasts can be done in the runtime.
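
To make the intended control flow concrete, here is a toy model in Python. It is only a sketch of the idea, not HotSpot code: `ToyGC`, `pre_barrier`, `post_barrier`, `copy_in_runtime`, and the two `arraycopy_*` functions are hypothetical names invented for illustration. "Runtime calls" are simply counted, `TypeError` stands in for a failed checkcast, and source and destination ranges are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class ToyGC:
    """Hypothetical stand-in for a GC barrier interface (not HotSpot's API)."""
    active: bool = False      # models "GC is active"
    runtime_calls: int = 0    # counts modeled calls into the runtime

    def pre_barrier(self, dst, dst_pos, length):
        # Modeled as one runtime call; the real barrier work is omitted.
        self.runtime_calls += 1

    def post_barrier(self, dst, dst_pos, length):
        # Modeled as one runtime call; the real barrier work is omitted.
        self.runtime_calls += 1

    def copy_in_runtime(self, src, src_pos, dst, dst_pos, length, elem_type):
        """One runtime call: barrier work, checkcast, and store in a single loop."""
        self.runtime_calls += 1
        for i in range(length):
            elem = src[src_pos + i]
            # Per-element barrier work would happen here (omitted in this model).
            if elem is not None and not isinstance(elem, elem_type):
                # Stand-in for a failed checkcast; earlier elements stay copied.
                raise TypeError(f"element {i} is not a {elem_type.__name__}")
            dst[dst_pos + i] = elem


def arraycopy_current(gc, src, src_pos, dst, dst_pos, length):
    """Current scheme: runtime call, fast bulk copy, runtime call."""
    gc.pre_barrier(dst, dst_pos, length)
    dst[dst_pos:dst_pos + length] = src[src_pos:src_pos + length]  # fast loop
    gc.post_barrier(dst, dst_pos, length)


def arraycopy_proposed(gc, src, src_pos, dst, dst_pos, length, elem_type):
    """Proposed scheme: a single runtime call while GC is active, else fast loop only."""
    if gc.active:
        # The fast loop is skipped entirely; the runtime does the copy itself.
        gc.copy_in_runtime(src, src_pos, dst, dst_pos, length, elem_type)
    else:
        # Outside of GC: no runtime call at all (type checks on this path
        # are outside the scope of this model).
        dst[dst_pos:dst_pos + length] = src[src_pos:src_pos + length]
```

In this model the current scheme always costs two runtime calls, while the proposed scheme costs one call when `active` is true and none otherwise. The `elem_type` parameter corresponds to the element type that the changed interface would need to pass down.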