JDK-8189176 : AARCH64: Improve _updateBytesCRC32 intrinsic
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 10
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2017-10-11
  • Updated: 2018-05-25
  • Resolved: 2017-11-02
JDK 10
  • 10 b33: Fixed
Description
In the default case (-XX:+UseCRC32), the intrinsic for java.util.zip.CRC32.updateBytes uses the CRC32X instruction. Performance in this case can be improved with software pipelining: modulo scheduling can be applied to the main loop. Experiments show that the load-pair (LDP) instructions should be split into separate LDR instructions.

Target buffer lengths are N*64 in the [128;4096] interval.
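A change to the intrinsic can be sanity-checked by comparing java.util.zip.CRC32 against an independent bitwise CRC-32 over the target lengths. The following is a minimal sketch, not part of the patch or its tests: the class name Crc32LengthCheck, its method names, and the random seed are hypothetical. It assumes the standard reflected CRC-32 polynomial 0xEDB88320, which is the one java.util.zip.CRC32 implements.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical helper, not taken from the webrev.
final class Crc32LengthCheck {

    // Bitwise reference CRC-32 (reflected polynomial 0xEDB88320).
    // Deliberately simple so that it does not go through the intrinsic.
    static long referenceCrc32(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= (data[i] & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                // -(crc & 1) is all ones when the low bit is set, zero otherwise.
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // CRC-32 via the library class, which may be intrinsified by the JIT.
    static long libraryCrc32(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Compares both implementations for every length N*64 in [128;4096].
    // Returns the number of lengths checked and throws on the first mismatch.
    static int checkTargetLengths() {
        byte[] data = new byte[4096];
        new Random(42).nextBytes(data); // arbitrary fixed seed (assumption)
        int checked = 0;
        for (int len = 128; len <= 4096; len += 64) {
            long expected = referenceCrc32(data, 0, len);
            long actual = libraryCrc32(data, 0, len);
            if (expected != actual) {
                throw new AssertionError("CRC32 mismatch at length " + len);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("lengths checked: " + checkTargetLengths());
    }
}
```

A single pass like this mostly runs in the interpreter. To exercise the compiled intrinsic, checkTargetLengths() would have to be called repeatedly until the JIT compiles the calling code.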
Comments
mvn instead of orn: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.02/
02-11-2017

Separate kernel_crc32_using_crc32() subroutine: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.01/
01-11-2017

http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/

On Cavium ThunderX with 'taskset -c 0-47' we observe a 1.5x improvement in CRC32Bench.calcCRC32 for size=512 bytes.

Average time by size (bytes):

Baseline
  • 64: 59 ± 1 ns/op
  • 512: 287 ± 1 ns/op
  • 4096: 2116 ± 2 ns/op

Patched
  • 64: 54 ± 1 ns/op
  • 512: 189 ± 1 ns/op
  • 4096: 1313 ± 2 ns/op

There is also a similar improvement on Raspberry Pi 3.
11-10-2017

Benchmarks used for measurements:
http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32Bench.java
http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32SlidingBench.java

1. Calculate CRC32 over the same array. This is the primary one. CRC32Bench.calcCRC32()
2. Calculate CRC32 over a copy of the next chunk from a large array. CRC32SlidingBench.calcNextFromBuffer()
3. Calculate CRC32 over the next chunk in a large array. CRC32SlidingBench.calcNextInArray()
11-10-2017
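The three access patterns listed above can be illustrated with plain Java. This is only a sketch of the idea, not the linked benchmark sources: the class Crc32AccessPatterns, its fields, and its method names are hypothetical, and it contains no timing harness.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical illustration of the three access patterns; not the linked benchmarks.
final class Crc32AccessPatterns {
    private final byte[] large;    // large array that the sliding variants walk through
    private final byte[] fixed;    // array reused unchanged by pattern 1
    private final byte[] scratch;  // destination buffer for pattern 2
    private final int size;        // chunk size in bytes
    private final CRC32 crc = new CRC32();
    private int offset;            // start of the next chunk within `large`

    Crc32AccessPatterns(int size, int chunks, long seed) {
        this.size = size;
        this.large = new byte[size * chunks];
        new Random(seed).nextBytes(large);
        this.fixed = new byte[size];
        this.scratch = new byte[size];
        System.arraycopy(large, 0, fixed, 0, size);
    }

    // Pattern 1: checksum the same array on every call.
    long calcSameArray() {
        crc.reset();
        crc.update(fixed, 0, size);
        return crc.getValue();
    }

    // Pattern 2: copy the next chunk out of the large array, then checksum the copy.
    long calcNextFromCopy() {
        int start = nextOffset();
        System.arraycopy(large, start, scratch, 0, size);
        crc.reset();
        crc.update(scratch, 0, size);
        return crc.getValue();
    }

    // Pattern 3: checksum the next chunk in place inside the large array.
    long calcNextInArray() {
        int start = nextOffset();
        crc.reset();
        crc.update(large, start, size);
        return crc.getValue();
    }

    // Advances to the following chunk and wraps around at the end of the large array.
    private int nextOffset() {
        int start = offset;
        offset += size;
        if (offset >= large.length) {
            offset = 0;
        }
        return start;
    }

    public static void main(String[] args) {
        Crc32AccessPatterns p = new Crc32AccessPatterns(512, 1024, 1L);
        System.out.println(Long.toHexString(p.calcSameArray()));
        System.out.println(Long.toHexString(p.calcNextFromCopy()));
        System.out.println(Long.toHexString(p.calcNextInArray()));
    }
}
```

Patterns 2 and 3 touch a different region of memory on each call, so they are likely to be more sensitive to cache behaviour than pattern 1. Real measurements would need a benchmark harness with warm-up, which this sketch does not provide.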