Bug ID: JDK-8214922 AArch64: Add vectorization support for fmin/fmax

Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: 11,12

Priority: P4
Status: Resolved
Resolution: Fixed
OS: linux
CPU: aarch64

Submitted: 2018-12-06
Updated: 2022-02-07
Resolved: 2019-03-12

JDK 11          JDK 13
11.0.12 Fixed   13 b12 Fixed

Tests on the AArch64 platform show that the floating-point Math.min/max() intrinsics do not improve performance much. Adding auto-vectorization support for the fmin/fmax operations, however, is very helpful.
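For illustration, the loops below show the element-wise min/max shape that an auto-vectorizer can map onto SIMD lanes. This is a minimal sketch, not code from the patch or its tests: the class and method names are hypothetical, and whether C2 actually vectorizes these loops depends on the JVM build and flags.

```java
import java.util.Arrays;

// Hypothetical example class; not part of the JDK or of this patch.
class FpMinMaxLoops {
    // Element-wise maximum of two float arrays. Each iteration is independent
    // of the others, which is the loop shape an auto-vectorizer can map to
    // SIMD lanes. Assumes a, b and out have the same length.
    static void maxElementwise(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Element-wise minimum of two double arrays, same assumptions as above.
    static void minElementwise(double[] a, double[] b, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -2.0f, 3.5f, 8.0f};
        float[] b = {0.5f, -1.0f, 4.0f, 7.0f};
        float[] out = new float[a.length];
        maxElementwise(a, b, out);
        System.out.println(Arrays.toString(out));
    }
}
```

Vectorized code must still honor the Math.min/max rules for NaN and signed zeros, which is why these operations need dedicated instruction support rather than a plain compare-and-select.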

Fix Request (11u):

Backporting this patch further improves min/max performance on aarch64 after the JDK-8212043 backport. The original patch does not apply cleanly and requires minor adjustments.

Testing: tier1, tier2, TestFpMinMaxIntrinsics; the hot-method assembly for the TestSIMDFpBHMinMax[2] benchmarks contains the expected sequences, such as:

    fminv s18, v19.4s
    fmin s18, s18, s16
    ...

The change is purely additive and influences particular methods, and it was integrated in JDK 13, so the risk is low.

Performance results on Graviton2 were measured with TestSIMDFpBHMinMax[2], a modified version of the original TestSIMDFpMinMax[1] benchmarks. The modification passes the min/max calculation to Blackhole; otherwise it can be entirely optimized out (see below).

Benchmark                                     Mode  Cnt     Score     Error  Units
TestSIMDFpBHMinMax.testVectFindMaxSumDouble   avgt    6  2266.598 ±   0.736  us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat    avgt    6  2266.362 ±   0.021  us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble   avgt    6  2266.377 ±   0.018  us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat    avgt    6  2266.476 ±   0.439  us/op
TestSIMDFpBHMinMax.testVectMaxDouble          avgt    6  2472.559 ±   0.680  us/op
TestSIMDFpBHMinMax.testVectMaxFloat           avgt    6  2476.601 ±   0.019  us/op
TestSIMDFpBHMinMax.testVectMinDouble          avgt    6  2472.242 ±   0.707  us/op
TestSIMDFpBHMinMax.testVectMinFloat           avgt    6  2476.607 ±   0.016  us/op

--> intrinsics

TestSIMDFpBHMinMax.testVectFindMaxSumDouble   avgt    6  1645.077 ±   0.016  us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat    avgt    6  1645.126 ±   0.042  us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble   avgt    6  1645.080 ±   0.021  us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat    avgt    6  1645.123 ±   0.048  us/op
TestSIMDFpBHMinMax.testVectMaxDouble          avgt    6  1236.971 ±   0.217  us/op
TestSIMDFpBHMinMax.testVectMaxFloat           avgt    6  1237.316 ±   0.023  us/op
TestSIMDFpBHMinMax.testVectMinDouble          avgt    6  1237.115 ±   0.145  us/op
TestSIMDFpBHMinMax.testVectMinFloat           avgt    6  1237.313 ±   0.016  us/op

--> vectorization

TestSIMDFpBHMinMax.testVectFindMaxSumDouble   avgt    6  1645.698 ±   1.886  us/op
TestSIMDFpBHMinMax.testVectFindMaxSumFloat    avgt    6   493.632 ±   0.263  us/op
TestSIMDFpBHMinMax.testVectFindMinSumDouble   avgt    6  1646.307 ±   0.040  us/op
TestSIMDFpBHMinMax.testVectFindMinSumFloat    avgt    6   497.330 ±  11.436  us/op
TestSIMDFpBHMinMax.testVectMaxDouble          avgt    6   671.872 ±   0.077  us/op
TestSIMDFpBHMinMax.testVectMaxFloat           avgt    6   340.141 ±   0.251  us/op
TestSIMDFpBHMinMax.testVectMinDouble          avgt    6   671.884 ±   0.098  us/op
TestSIMDFpBHMinMax.testVectMinFloat           avgt    6   340.137 ±   0.267  us/op

The original benchmark[1] and the assembly show that this optimization allows the unused min/max calculation to be eliminated from loops:

TestSIMDFpMinMax.testVectFindMaxSumDouble     avgt    6  1636.050 ±   0.025  us/op
TestSIMDFpMinMax.testVectFindMaxSumFloat      avgt    6    30.307 ±   0.117  us/op  # here
TestSIMDFpMinMax.testVectFindMinSumDouble     avgt    6  1636.053 ±   0.016  us/op
TestSIMDFpMinMax.testVectFindMinSumFloat      avgt    6    30.299 ±   0.100  us/op  # here
TestSIMDFpMinMax.testVectMaxDouble            avgt    6   673.248 ±   0.637  us/op
TestSIMDFpMinMax.testVectMaxFloat             avgt    6   339.733 ±   1.168  us/op
TestSIMDFpMinMax.testVectMinDouble            avgt    6   673.027 ±   0.065  us/op
TestSIMDFpMinMax.testVectMinFloat             avgt    6   340.072 ±   0.660  us/op

11u RFR:
https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-February/005037.html
https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-March/005303.html

[1] https://cr.openjdk.java.net/~pli/rfr/8214922/TestSIMDFpMinMax.java
[2] http://cr.openjdk.java.net/~dchuyko/8214922/TestSIMDFpBHMinMax.java

12-03-2021
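The Blackhole modification mentioned in the fix request can be illustrated without JMH. The sketch below is not the benchmark source: the class, the methods and the volatile `sink` field are hypothetical stand-ins, with `sink` playing the role of JMH's Blackhole. It only shows why a min/max result that is never consumed can be dropped by the JIT, while a consumed one has to be computed.

```java
// Hypothetical example class; the real benchmarks are linked as [1] and [2].
class ConsumedMinMax {
    // Assumed stand-in for JMH's Blackhole: publishing the value through a
    // volatile field makes it observable, so the work feeding it is not dead.
    static volatile float sink;

    // The per-element max is computed but never used. A JIT compiler is free
    // to remove it and keep only the sum.
    static float sumOnly(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            float unused = Math.max(a[i], b[i]); // dead value
            sum += a[i] + b[i];
        }
        return sum;
    }

    // Same loop, but the running max is published to the sink, so the max
    // computation has to stay in the compiled code.
    static float sumAndConsumeMax(float[] a, float[] b) {
        float sum = 0.0f;
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.max(a[i], b[i]));
            sum += a[i] + b[i];
        }
        sink = max;
        return sum;
    }
}
```

Both methods return the same sum; only the second forces the max to be evaluated, which is the kind of difference the "# here" rows in the original benchmark results point at.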

URL: http://hg.openjdk.java.net/jdk/jdk/rev/1dbe0c210134
User: njian
Date: 2019-03-12 04:25:10 +0000

12-03-2019

The patch http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.02/ has been reviewed but not submitted, because this optimization does not bring any benefit for simple min/max reductions. We will reconsider submitting this patch after JDK-8188313 is resolved.

21-02-2019
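As an illustration of the "simple reduction" case discussed in the comment above, the sketch below assumes the term means a loop whose body is only the reduction itself. The class and method names are hypothetical. Unlike an element-wise loop, every iteration depends on the accumulator from the previous one.

```java
// Hypothetical example class; not taken from the patch or its tests.
class SimpleMinMaxReduction {
    // Max reduction: the accumulator carries a dependency from one
    // iteration to the next.
    static float maxOf(float[] a) {
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    // Min reduction over doubles, same structure.
    static double minOf(double[] a) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            min = Math.min(min, a[i]);
        }
        return min;
    }
}
```

For an empty array these methods return the identity element of the reduction (negative or positive infinity), a choice made for this sketch only.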

Blocks :	JDK-8188313 - C2: Consider enabling auto-vectorization for simple reductions
Relates :	JDK-8212043 - AArch64: Add floating-point Math.min/max intrinsics