There is no hasNegatives intrinsic implemented for AArch64. It should be implemented.
Comments
Attached the latest (and fastest) version of the intrinsic (hasNegatives_merged_v1.1.diff), along with all results in a table (intrinsics-data.xls). The results for the attached version are in the column "* merged new".
27-07-2017
Another alternative implementation, which uses aligned access where possible, is attached. However, it is significantly more code, and it is beneficial only when hasNegatives is initially called with a non-zero offset (or with an offset that is not a multiple of 8).
20-07-2017
Attached intrinsic diff.
Implementation notes:
1) Check whether the low bits of the array length are set (0x1, 0x2, 0x4, 0x8) and, for each set bit, issue the corresponding load instruction (ldrb, ldrh, ldrw, ldr), reducing the remaining length accordingly. After this step the remaining length is 16*N. Proceed to 2).
2) While the remaining length is >= 64, load data in a loop with 4 ldp instructions (16 bytes each), issuing prfm (prefetch hint) once per iteration if SoftwarePrefetchHintDistance >= 0. This new flag (SoftwarePrefetchHintDistance) is introduced to provide configurable software prefetching in dynamically compiled code. It can either disable the software prefetch hint or set the prefetch distance. The default distance is 3 * dcache_line, which shows the best performance on the ARMv8 CPUs we have. The 64-byte loop runs until length < 64; then proceed to 3).
3) A simple 16-byte load loop runs until the remaining length is 0.
Note: the software prefetch hint was observed to improve performance not only on platforms that do not have hardware prefetching (ThunderX T88), but also on the platforms we have at hand that do have hardware prefetching (Cortex A53).
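
For illustration only, the three steps in the implementation notes can be sketched in plain Java. This is not the attached diff: the class name HasNegativesSketch, the use of ByteBuffer for wide loads, and the OR-accumulation of loaded words before testing the high bits are all assumptions made for this sketch. The prfm hint has no Java equivalent and is omitted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HasNegativesSketch {
    // A byte is negative exactly when its top bit (0x80) is set.
    private static final long HIGH_BITS_64 = 0x8080808080808080L;

    // Reference semantics: true if any byte in ba[off, off + len) is negative.
    public static boolean hasNegativesSimple(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) return true;
        }
        return false;
    }

    // Sketch of the three-step structure (hypothetical, not the real intrinsic).
    public static boolean hasNegatives(byte[] ba, int off, int len) {
        // Byte order does not affect the result; it is fixed only for determinism.
        ByteBuffer buf = ByteBuffer.wrap(ba).order(ByteOrder.LITTLE_ENDIAN);
        int pos = off;
        int remaining = len;

        // Step 1: peel 1, 2, 4 and 8 bytes according to the low bits of the
        // length (stand-ins for ldrb, ldrh, ldrw, ldr).
        if ((remaining & 0x1) != 0) {
            if ((buf.get(pos) & 0x80) != 0) return true;
            pos += 1; remaining -= 1;
        }
        if ((remaining & 0x2) != 0) {
            if ((buf.getShort(pos) & 0x8080) != 0) return true;
            pos += 2; remaining -= 2;
        }
        if ((remaining & 0x4) != 0) {
            if ((buf.getInt(pos) & 0x80808080) != 0) return true;
            pos += 4; remaining -= 4;
        }
        if ((remaining & 0x8) != 0) {
            if ((buf.getLong(pos) & HIGH_BITS_64) != 0) return true;
            pos += 8; remaining -= 8;
        }
        // The remaining length is now a multiple of 16.

        // Step 2: 64 bytes per iteration (stand-in for 4 ldp instructions).
        while (remaining >= 64) {
            long acc = 0;
            for (int i = 0; i < 64; i += 8) {
                acc |= buf.getLong(pos + i);
            }
            if ((acc & HIGH_BITS_64) != 0) return true;
            pos += 64; remaining -= 64;
        }

        // Step 3: 16 bytes per iteration until nothing is left.
        while (remaining > 0) {
            long acc = buf.getLong(pos) | buf.getLong(pos + 8);
            if ((acc & HIGH_BITS_64) != 0) return true;
            pos += 16; remaining -= 16;
        }
        return false;
    }
}
```

The sketch peels the odd-sized part first so that both loops only ever see whole 16-byte chunks, which is why neither loop needs a tail check.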