AVX-512 enabled intrinsics for inflate, compress and hasNegatives. Mask registers are used for the tail of each loop. Inflate and compress process 32 elements per iteration; hasNegatives processes 64 elements per iteration (regardless of the bitness of the architecture). Each intrinsic is a tight loop followed by a masked tail (and sometimes a partial candidates snippet).
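The loop shape described above (tight loop, then a masked tail) can be modelled in plain Java. This is only a scalar sketch of the control flow for hasNegatives, assuming a 64-byte block per iteration; the class name, the method signature and the lane-by-lane emulation of the mask register are illustrative assumptions, not the contributed assembly.

```java
// Hypothetical scalar model of the loop structure; not the actual intrinsic.
final class HasNegativesLoopModel {
    static final int BLOCK = 64; // bytes handled per iteration (one 512-bit vector)

    static boolean hasNegatives(byte[] a, int off, int len) {
        int i = 0;
        // Tight loop: whole 64-byte blocks, no masking needed.
        for (; i + BLOCK <= len; i += BLOCK) {
            int acc = 0;
            for (int lane = 0; lane < BLOCK; lane++) {
                acc |= a[off + i + lane]; // sign bit survives the OR
            }
            if (acc < 0) return true;
        }
        // Tail: 0..63 leftover bytes, selected by a bit mask that stands in
        // for the mask register (one bit per lane).
        int tail = len - i;
        if (tail > 0) {
            long mask = (1L << tail) - 1; // lowest 'tail' lanes enabled
            int acc = 0;
            for (int lane = 0; lane < BLOCK; lane++) {
                if (((mask >>> lane) & 1L) != 0) {
                    acc |= a[off + i + lane]; // disabled lanes are never read
                }
            }
            if (acc < 0) return true;
        }
        return false;
    }
}
```

The point of the mask is that the tail needs no scalar cleanup loop and never touches memory past the end of the array.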
Code Contributed by: Tomasz Wojtowicz (tomasz.wojtowicz@intel.com)
---------
Testing:
---------
Internally developed microbenchmarks for performance, plus similar functional tests that invoke these 3 intrinsics through the Reflection API on chunks of up to 4K of pseudo-randomly initialized data to verify correctness.
The hasNegatives tests had a negative sentinel as the last element of the array.
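A sentinel check of this kind could look roughly like the sketch below. The internal tests are not part of this change, so everything here is an assumption: the class and interface names are hypothetical, and a scalar reference method stands in for the reflective call to the intrinsic.

```java
import java.util.Random;

// Hypothetical harness; the real tests called the intrinsic via reflection.
final class SentinelCheck {
    interface NegativesCheck {
        boolean test(byte[] a, int off, int len);
    }

    // Scalar stand-in for the method under test.
    static boolean scalarHasNegatives(byte[] a, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (a[i] < 0) return true;
        }
        return false;
    }

    // Returns the number of failed checks over all lengths 1..maxLen.
    static int run(NegativesCheck impl, long seed, int maxLen) {
        Random rnd = new Random(seed); // fixed seed keeps runs reproducible
        int failures = 0;
        for (int len = 1; len <= maxLen; len++) {
            byte[] a = new byte[len];
            for (int i = 0; i < len; i++) {
                a[i] = (byte) rnd.nextInt(128); // 0..127, never negative
            }
            a[len - 1] = (byte) (-1 - rnd.nextInt(128)); // sentinel in -128..-1
            if (!impl.test(a, 0, len)) failures++;    // sentinel must be found
            if (impl.test(a, 0, len - 1)) failures++; // prefix must be clean
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + run(SentinelCheck::scalarHasNegatives, 42L, 4096));
    }
}
```

Placing the only negative value last forces every length to run through the whole loop and into the tail before it can return true.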
Also, tailored examples for compress that test all of the possible entry scenarios (tail present/not present, non-compressible element found in head OR tail) to make sure that all of the basic blocks inside the intrinsic are covered.
Everything was run at least 3 times with -XX:UseAVX=1, then 2 and 3, depending on the scenario, to establish a performance baseline AND/OR to verify correctness where the "older" (AVX<=2) code had been modified.
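The compress entry scenarios can be enumerated as in the following sketch. It assumes a 32-element block, so length 32 has no tail and length 40 has one, and it assumes compress returns the length on success and 0 when a char above 0xFF is found; the scalar compress and all names are hypothetical stand-ins for the code under test.

```java
// Hypothetical scenario table for compress; not the internal test itself.
final class CompressScenarios {
    // Scalar stand-in: copies chars to bytes, gives up on the first char > 0xFF.
    static int compress(char[] src, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[i];
            if (c > 0xFF) return 0; // non-compressible element
            dst[i] = (byte) c;
        }
        return len;
    }

    // Builds a Latin-1 input; badIndex >= 0 plants one non-compressible char.
    static char[] input(int len, int badIndex) {
        char[] s = new char[len];
        for (int i = 0; i < len; i++) {
            s[i] = (char) ('a' + i % 26);
        }
        if (badIndex >= 0) s[badIndex] = '\u0100';
        return s;
    }

    static int run(int len, int badIndex) {
        return compress(input(len, badIndex), new byte[len], len);
    }

    public static void main(String[] args) {
        System.out.println("no tail, clean:       " + run(32, -1));
        System.out.println("tail, clean:          " + run(40, -1));
        System.out.println("no tail, bad in head: " + run(32, 5));
        System.out.println("tail, bad in head:    " + run(40, 5));
        System.out.println("tail, bad in tail:    " + run(40, 35));
    }
}
```

Each row targets a different exit path: the clean cases fall through the loop and the tail, while the bad-element cases leave early from either the head or the tail.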