Copy has hand rolled code for lots of platforms. This makes sense when atomicity is required on elements that are not bytes. What doesn't make a whole lot of sense today, maybe it did a decade ago, is the code that uses custom loops for bytes or non-atomic operations instead of calling memcpy, memset, or memmove. Today most compillers treat those as intrinsics, replacing them with optimized code for small sizes when known at compile time otherwise calling libc. Most libc also contain heavily optimized implementations which include specializations for different architecture extensions.
All implementations of Copy that operate on bytes or do not have atomic requirements should probably be updated to use memcpy, memset, or memmove as appropriate.