JDK-8214922 : AArch64: Add vectorization support for fmin/fmax
Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: 11, 12
Priority: P4
Status: Resolved
Resolution: Fixed
OS: linux
CPU: aarch64
Submitted: 2018-12-06
Updated: 2022-02-07
Resolved: 2019-03-12
Tests on the AArch64 platform show that the floating-point Math.min/max() intrinsics by themselves improve performance only modestly. Adding auto-vectorization support for the fmin/fmax operations, however, is very helpful.
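For illustration, the loop shapes this enhancement targets look roughly like the sketch below. The class and method names are hypothetical and are not taken from the patch or its benchmarks; whether C2 actually vectorizes these loops depends on the JVM build, the CPU, and the flags in use.

```java
public class FpMinMaxLoops {
    // Element-wise maximum of two arrays: each iteration is independent,
    // which makes the loop a candidate for auto-vectorization.
    // Assumes a and b are at least as long as out.
    static void maxElementwise(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Minimum reduction: a single running value carried across iterations.
    static float minReduce(float[] a) {
        float m = Float.POSITIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            m = Math.min(m, a[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 5.0f, -2.0f, 4.0f};
        float[] b = {3.0f, 0.5f, -1.0f, 4.5f};
        float[] out = new float[a.length];
        maxElementwise(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
        System.out.println(minReduce(a));
    }
}
```

Note that Math.min/max have stricter semantics than a plain comparison: a NaN operand yields NaN, and -0.0 is treated as smaller than +0.0. Any vectorized code has to preserve that behavior.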
Comments
Fix Request (11u):
Backporting this patch further improves min/max performance on aarch64 after the JDK-8212043 backport. The original patch does not apply cleanly and requires minor adjustments.
Testing: tier1, tier2, TestFpMinMaxIntrinsics; the assembly of hot methods in the TestSIMDFpBHMinMax[2] benchmarks shows the expected sequences, such as:
fminv s18, v19.4s
fmin s18, s18, s16
...
The change is purely additive, affects only particular methods, and was integrated in JDK 13, so the risk is low.
Performance results on Graviton2 were measured with TestSIMDFpBHMinMax[2], a modified version of the original TestSIMDFpMinMax[1] benchmarks. The modification passes the min/max result to a Blackhole; otherwise the calculation can be optimized out entirely (see below).
Benchmark Mode Cnt Score Error Units
TestSIMDFpBHMinMax.testVectFindMaxSumDouble avgt 6 2266.598 ± 0.736 us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat avgt 6 2266.362 ± 0.021 us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble avgt 6 2266.377 ± 0.018 us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat avgt 6 2266.476 ± 0.439 us/op
TestSIMDFpBHMinMax.testVectMaxDouble avgt 6 2472.559 ± 0.680 us/op
TestSIMDFpBHMinMax.testVectMaxFloat avgt 6 2476.601 ± 0.019 us/op
TestSIMDFpBHMinMax.testVectMinDouble avgt 6 2472.242 ± 0.707 us/op
TestSIMDFpBHMinMax.testVectMinFloat avgt 6 2476.607 ± 0.016 us/op
--> intrinsics
TestSIMDFpBHMinMax.testVectFindMaxSumDouble avgt 6 1645.077 ± 0.016 us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat avgt 6 1645.126 ± 0.042 us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble avgt 6 1645.080 ± 0.021 us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat avgt 6 1645.123 ± 0.048 us/op
TestSIMDFpBHMinMax.testVectMaxDouble avgt 6 1236.971 ± 0.217 us/op
TestSIMDFpBHMinMax.testVectMaxFloat avgt 6 1237.316 ± 0.023 us/op
TestSIMDFpBHMinMax.testVectMinDouble avgt 6 1237.115 ± 0.145 us/op
TestSIMDFpBHMinMax.testVectMinFloat avgt 6 1237.313 ± 0.016 us/op
--> vectorization
TestSIMDFpBHMinMax.testVectFindMaxSumDouble avgt 6 1645.698 ± 1.886 us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat avgt 6 493.632 ± 0.263 us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble avgt 6 1646.307 ± 0.040 us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat avgt 6 497.330 ± 11.436 us/op
TestSIMDFpBHMinMax.testVectMaxDouble avgt 6 671.872 ± 0.077 us/op
TestSIMDFpBHMinMax.testVectMaxFloat avgt 6 340.141 ± 0.251 us/op
TestSIMDFpBHMinMax.testVectMinDouble avgt 6 671.884 ± 0.098 us/op
TestSIMDFpBHMinMax.testVectMinFloat avgt 6 340.137 ± 0.267 us/op
The original benchmark[1] and the assembly show that this optimization allows unused min/max calculations to be removed from loops:
TestSIMDFpMinMax.testVectFindMaxSumDouble avgt 6 1636.050 ± 0.025 us/op
TestSIMDFpMinMax.testVectFindMaxSumFloat avgt 6 30.307 ± 0.117 us/op # here
TestSIMDFpMinMax.testVectFindMinSumDouble avgt 6 1636.053 ± 0.016 us/op
TestSIMDFpMinMax.testVectFindMinSumFloat avgt 6 30.299 ± 0.100 us/op # here
TestSIMDFpMinMax.testVectMaxDouble avgt 6 673.248 ± 0.637 us/op
TestSIMDFpMinMax.testVectMaxFloat avgt 6 339.733 ± 1.168 us/op
TestSIMDFpMinMax.testVectMinDouble avgt 6 673.027 ± 0.065 us/op
TestSIMDFpMinMax.testVectMinFloat avgt 6 340.072 ± 0.660 us/op
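The difference between a loop whose min/max result is unused and one whose result is consumed can be sketched in plain Java. This is a hypothetical illustration, not the code of the linked benchmarks: the class and method names are invented, and a volatile field stands in for JMH's Blackhole.

```java
public class UnusedMinMaxSketch {
    // Assumption: a volatile write is used here as a simple stand-in for
    // Blackhole.consume(), so the JIT cannot discard the stored value.
    static volatile float sink;

    // The max value is computed but never used, so the JIT compiler
    // is free to drop the Math.max call and keep only the sum.
    static float sumOnly(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float unused = Math.max(a[i], b[i]); // dead value
            sum += a[i] + b[i];
        }
        return sum;
    }

    // The max value escapes through the sink, so it has to be computed.
    static float sumAndConsumeMax(float[] a, float[] b) {
        float sum = 0f;
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.max(a[i], b[i]));
            sum += a[i] + b[i];
        }
        sink = max;
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f};
        float[] b = {3f, 4f};
        System.out.println(sumOnly(a, b));
        System.out.println(sumAndConsumeMax(a, b) + " max=" + sink);
    }
}
```

Both methods return the same sum; only the second forces the max computation to stay in the compiled loop.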
11u RFR: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-February/005037.html
https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-March/005303.html
[1] https://cr.openjdk.java.net/~pli/rfr/8214922/TestSIMDFpMinMax.java
[2] http://cr.openjdk.java.net/~dchuyko/8214922/TestSIMDFpBHMinMax.java
The patch http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.02/ was reviewed but not submitted, because this optimization brings no benefit for simple min/max reductions. We will reconsider submitting this patch after JDK-8188313 is resolved.