Consider this benchmark:
https://github.com/openjdk/jdk/compare/master...mcimadamore:jdk:xor_bench?expand=1
Here, we compare the performance of code that copies arrays into off-heap storage ahead of a native call. It turns out that doing the copy using Unsafe::arrayCopy is 15-20% slower than using JNI's GetByteArrayRegion function.
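For readers without the branch checked out, the Unsafe side of this comparison might look roughly like the sketch below. This is not the benchmark's code: the class name OffHeapCopySketch and its helper methods are hypothetical, and it assumes the copy goes through sun.misc.Unsafe.copyMemory obtained reflectively (the linked benchmark may well use the internal jdk.internal.misc.Unsafe instead). Recent JDKs may print a deprecation warning for these methods.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import sun.misc.Unsafe;

// Hypothetical illustration only; names are not taken from the benchmark.
final class OffHeapCopySketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // sun.misc.Unsafe has no public factory for application code, so the
    // singleton is read reflectively from its private static field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Allocates off-heap memory and copies the heap array into it.
    // The caller owns the returned address and must release it with free().
    public static long copyToOffHeap(byte[] src) {
        long addr = UNSAFE.allocateMemory(Math.max(1, src.length));
        UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, addr, src.length);
        return addr;
    }

    // Copies length bytes from off-heap memory back into a fresh heap array.
    public static byte[] readBack(long addr, int length) {
        byte[] out = new byte[length];
        UNSAFE.copyMemory(null, addr, out, Unsafe.ARRAY_BYTE_BASE_OFFSET, length);
        return out;
    }

    public static void free(long addr) {
        UNSAFE.freeMemory(addr);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        long addr = copyToOffHeap(data);
        try {
            // A native call would consume the memory at addr here.
            System.out.println(Arrays.equals(data, readBack(addr, data.length)));
        } finally {
            free(addr);
        }
    }
}
```

The JNI variant does the equivalent copy on the native side, by calling GetByteArrayRegion with a destination buffer, so no Java-level copy is needed before the call.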
Profiling the benchmark with perfasm reveals that Unsafe::arrayCopy boils down to:
StubRoutines::jlong_disjoint_arraycopy
Whereas for JNI we end up with this:
__memmove_avx_unaligned_erms
Judging from its name, the latter is likely AVX-optimized, which seems the most plausible explanation for the performance difference between the two code paths.