Optimise the array copy code for aarch64. This improves the performance of copying small arrays (0 to 80 bytes). The copy code for this case is inlined rather than calling out to copy_longs. The forward and backward copy cases are identical because the small copy code reads all data into registers before writing any. Thankfully aarch64 has plenty of registers. The limit of 80 was chosen because it guarantees that copy_longs is always called with at least 64 bytes, even after worst-case alignment fixup. This means the small-case code in copy_longs can be deleted (I have put an assert in copy_longs to check that it is never called with fewer than 64 bytes).
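
To illustrate why the forward and backward cases can share one code path, here is a minimal sketch in plain C, not the actual aarch64 stub. The name small_copy_words, the SMALL_COPY_MAX_WORDS constant and the local array standing in for registers are all hypothetical, and the sketch assumes the copy is done in 8-byte words (so 80 bytes is 10 words). Because every load completes before the first store, the result is correct even when source and destination overlap in either direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this sketch: 80 bytes == 10 eight-byte words. */
enum { SMALL_COPY_MAX_WORDS = 10 };

/*
 * Hypothetical helper, not the HotSpot code: copy `count` 64-bit words
 * (count <= 10). The local array `r` stands in for the general-purpose
 * registers the real stub would use.
 */
static void small_copy_words(uint64_t *dst, const uint64_t *src, size_t count)
{
    uint64_t r[SMALL_COPY_MAX_WORDS];
    size_t i;

    assert(count <= SMALL_COPY_MAX_WORDS);

    /* Phase 1: read everything before writing anything. */
    for (i = 0; i < count; i++) {
        r[i] = src[i];
    }

    /* Phase 2: write everything. Since no load follows a store,
     * overlapping ranges are handled the same way whether dst is
     * above or below src. */
    for (i = 0; i < count; i++) {
        dst[i] = r[i];
    }
}
```

Calling small_copy_words(a + 1, a, 10) or small_copy_words(a, a + 1, 10) on the same buffer behaves like memmove in both cases, which is the property that lets a single inlined sequence serve both copy directions.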