JDK-8221092 : UseAVX=3 has performance degradation on Skylake (X7) processors
Type:Bug
Component:hotspot
Sub-Component:compiler
Affected Version:11.0.2,12,13
Priority:P3
Status:Resolved
Resolution:Fixed
Submitted:2019-03-19
Updated:2022-05-16
Resolved:2019-10-04
13u Fix Request
Backporting this patch fixes a performance regression. The patch applies cleanly.
25-03-2020
11u Fix Request
Backporting this patch fixes a performance regression. Patch does not apply cleanly to 11u
and requires adjustments. RFR https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2019-November/002070.html
The patch at http://cr.openjdk.java.net/~vdeshpande/8221092/webrev.00/ fixes the regression with UseAVX=3 on Skylake and later processors.
We set UseAVX=2 for Skylake-based CPUs. For Skylake onwards, AVX3Threshold, a threshold on the array size in bytes, determines when the AVX512 code is executed.
The default is 4096 bytes. We determined this threshold experimentally, as AVX512 becomes beneficial for larger arrays.
For comparison-type intrinsics, we added code generation keyed on AVX3Threshold == 0, so that users who know
AVX3 performs better for their use case can opt in to the AVX512-based comparison. Otherwise, comparison-type intrinsics use AVX2-based code only.
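The selection rule described in this comment can be sketched roughly as follows. This is only an illustration of the described behaviour, not the HotSpot code: the class, enum, and method names are hypothetical, and treating the threshold as an inclusive bound is an assumption.

```java
// Hypothetical sketch; names are illustrative and not taken from the HotSpot sources.
final class Avx3ThresholdSketch {
    enum Path { AVX2, AVX512 }

    // Default threshold in bytes, as stated in the comment above.
    static final int DEFAULT_AVX3_THRESHOLD = 4096;

    // Size-based intrinsics: take the AVX512 path only for sufficiently large arrays.
    // Assumption: the boundary is inclusive (size >= threshold).
    static Path forArraySize(long sizeInBytes, int avx3Threshold) {
        return sizeInBytes >= avx3Threshold ? Path.AVX512 : Path.AVX2;
    }

    // Comparison-type intrinsics: AVX512 code only when the threshold is set to 0,
    // otherwise AVX2 code regardless of size.
    static Path forComparison(int avx3Threshold) {
        return avx3Threshold == 0 ? Path.AVX512 : Path.AVX2;
    }

    public static void main(String[] args) {
        System.out.println(forArraySize(1024, DEFAULT_AVX3_THRESHOLD)); // AVX2
        System.out.println(forArraySize(8192, DEFAULT_AVX3_THRESHOLD)); // AVX512
        System.out.println(forComparison(DEFAULT_AVX3_THRESHOLD));      // AVX2
        System.out.println(forComparison(0));                           // AVX512
    }
}
```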
26-09-2019
On the test case we developed for the bug, here are the results on our Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz:
JDK 8: 58517.5 +/- 1309 ops/s
JDK 11.0.4: 42811.9 +/- 981 ops/s
JDK 11.0.4 with UseAVX=2: 56488.6 +/- 3562 ops/s
JDK 11.0.6 (Vladimir's build): 57752.5 +/- 2028 ops/s
JDK 11.0.6 with UseAVX=3: 43028.2 +/- 1239 ops/s
So that looks great. The app-level tests that we've been able to run so far all reproduce the JDK 8 (or JDK 11 with UseAVX=2) performance as well.
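To put the figures above in relative terms, a small sketch like the following expresses each mean throughput as a fraction of the JDK 8 result. The class and method names are hypothetical, and the reported +/- ranges are ignored here, so small differences should not be over-read. With these numbers, stock JDK 11.0.4 comes out roughly 27% below JDK 8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the throughput values are the means quoted in the comment above.
final class ThroughputComparison {
    // Ratio of a measured throughput to the baseline throughput.
    static double relativeTo(double baseline, double measured) {
        return measured / baseline;
    }

    public static void main(String[] args) {
        double jdk8 = 58517.5; // ops/s, JDK 8 baseline

        Map<String, Double> runs = new LinkedHashMap<>();
        runs.put("JDK 11.0.4", 42811.9);
        runs.put("JDK 11.0.4 with UseAVX=2", 56488.6);
        runs.put("JDK 11.0.6 (patched build)", 57752.5);
        runs.put("JDK 11.0.6 with UseAVX=3", 43028.2);

        // Print each run as a percentage of the JDK 8 mean.
        for (Map.Entry<String, Double> run : runs.entrySet()) {
            double percent = 100.0 * relativeTo(jdk8, run.getValue());
            System.out.printf("%-28s %5.1f%% of JDK 8%n", run.getKey(), percent);
        }
    }
}
```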
26-09-2019
My tier1-tier3 testing of webrev.00 passed. Note that many, but not all, of the tests were run on AVX512 machines.
Now we only need confirmation that the regression is resolved.
26-09-2019
After testing in the latest JDK, we should consider backporting it to JDK 11u.
25-09-2019
[~soaks] Can you test the proposed patch?
25-09-2019
webrev for the patch: http://cr.openjdk.java.net/~vdeshpande/8221092/webrev.00/
03-09-2019
JDK-8183103 disabled PostLoopMultiversioning and UseAVX=3 by default. UseAVX=3 as default was restored with JDK-8184036.
Therefore, all builds since JDK 9 default to PostLoopMultiversioning=false and UseAVX=3 (if the machine supports it).
The only AVX=3 bug I'm aware of is JDK-8192070, but I don't think it's related.
01-04-2019
I'm not sure what the comment in JDK-8183103 means; on our systems, JDK 11 and higher builds do default to UseAVX=3 (we didn't test JDK 9 or 10, so I'm not sure when that might have changed).
The regression exists in the latest build I have (JDK 13 ea 12), and also exists when PostLoopMultiversioning is enabled.
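One way to check which values a given build actually selected for these flags is to query the running VM. The sketch below assumes a HotSpot-based JDK that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `ShowAvxFlags` is hypothetical. Whether a particular flag is visible depends on the platform and build (for example, UseAVX only exists on x86), so the lookup falls back to a message instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports the effective value and origin of a VM flag.
final class ShowAvxFlags {
    static String describe(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(flag);
            // The origin tells whether the value is a default or was set explicitly.
            return flag + "=" + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the running VM has no option with this name.
            return flag + ": not available in this VM";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("UseAVX"));
        System.out.println(describe("PostLoopMultiversioning"));
    }
}
```

Running it once without options and once with `-XX:UseAVX=2` on the same machine should show the difference between the default and the explicitly set value.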