JDK-8221092 : UseAVX=3 has performance degradation on Skylake (X7) processors
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11.0.2, 12, 13
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-03-19
  • Updated: 2022-05-16
  • Resolved: 2019-10-04
Fixed in:
  • JDK 11: 11.0.6-oracle
  • JDK 13: 13.0.3
  • JDK 14: 14 b18
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Comments
13u Fix Request: Backporting this patch fixes a performance regression. The patch applies cleanly.
25-03-2020

11u Fix Request: Backporting this patch fixes a performance regression. The patch does not apply cleanly to 11u and requires adjustments. RFR: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2019-November/002070.html
05-11-2019

URL: https://hg.openjdk.java.net/jdk/jdk/rev/c6f1226cfb72 User: vdeshpande Date: 2019-10-04 18:45:51 +0000
04-10-2019

Updated webrev: http://cr.openjdk.java.net/~vdeshpande/8221092/webrev.01/
01-10-2019

With the patch at http://cr.openjdk.java.net/~vdeshpande/8221092/webrev.00/ we fix the regression with UseAVX=3 on Skylake and later processors. For Skylake-based CPUs we set UseAVX=2. For processors after Skylake, AVX512 code is executed only for arrays whose size in bytes reaches the AVX3Threshold value. The default threshold is 4096 bytes; we determined this value experimentally, because AVX512 becomes beneficial only for larger arrays. For comparison-type intrinsics, we have added AVX512 code generation that is used only when AVX3Threshold == 0, so a user who knows that AVX512 performs better for their use case can opt in to it. Otherwise, comparison-type intrinsics use AVX2-based code only.
26-09-2019
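The dispatch policy described in the comment above can be modeled as a small decision function. This is a minimal Java sketch under stated assumptions, not HotSpot code: HotSpot makes these decisions in C++ while generating intrinsic machine code. The class and method names (Avx3DispatchSketch, defaultUseAvx, copyPath, comparePath) are hypothetical, "copy-like" stands in for the non-comparison intrinsics, the default-UseAVX rule models only the default for an AVX-512-capable CPU, and treating "reaches the threshold" as ">=" is an assumption about the exact boundary.

```java
/**
 * Hypothetical model of the AVX-512 dispatch policy from the comment.
 * Not HotSpot code; names and the ">=" boundary are assumptions.
 */
public final class Avx3DispatchSketch {

    /** Default threshold in bytes, as stated in the comment. */
    static final int DEFAULT_AVX3_THRESHOLD = 4096;

    enum Path { AVX2, AVX512 }

    /** Default UseAVX level on an AVX-512-capable CPU: Skylake is capped at 2. */
    static int defaultUseAvx(boolean isSkylake) {
        return isSkylake ? 2 : 3;
    }

    /** Copy-like intrinsics: AVX-512 only once the array size reaches the threshold. */
    static Path copyPath(int useAvx, int avx3Threshold, long arrayBytes) {
        if (useAvx < 3) {
            return Path.AVX2;
        }
        return arrayBytes >= avx3Threshold ? Path.AVX512 : Path.AVX2;
    }

    /** Comparison-type intrinsics: AVX-512 only when the threshold is set to 0. */
    static Path comparePath(int useAvx, int avx3Threshold) {
        return (useAvx >= 3 && avx3Threshold == 0) ? Path.AVX512 : Path.AVX2;
    }

    public static void main(String[] args) {
        int useAvx = defaultUseAvx(false); // a post-Skylake AVX-512 CPU
        System.out.println(copyPath(useAvx, DEFAULT_AVX3_THRESHOLD, 1024)); // AVX2
        System.out.println(copyPath(useAvx, DEFAULT_AVX3_THRESHOLD, 8192)); // AVX512
        System.out.println(comparePath(useAvx, DEFAULT_AVX3_THRESHOLD));    // AVX2
        System.out.println(comparePath(useAvx, 0));                         // AVX512
    }
}
```

The point of the split is that a single size check keeps small arrays on the AVX2 path, where AVX-512 did not pay off in the measurements, while comparison intrinsics stay on AVX2 unless the user explicitly opts in.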

On the test case we developed for the bug, here are the results on our Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz:
  • JDK 8: 58517.5 +/- 1309 ops/s
  • JDK 11.0.4: 42811.9 +/- 981 ops/s
  • JDK 11.0.4 with UseAVX=2: 56488.6 +/- 3562 ops/s
  • JDK 11.0.6 (Vladimir's build): 57752.5 +/- 2028 ops/s
  • JDK 11.0.6 with UseAVX=3: 43028.2 +/- 1239 ops/s
So that looks great. Our app-level tests that we've been able to run so far all reproduce the JDK 8 (or JDK 11 with UseAVX=2) performance as well.
26-09-2019
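To put the numbers above in relative terms, the sketch below computes each configuration's throughput change against the JDK 8 baseline. It uses only the mean ops/s values reported in the comment and ignores the +/- error margins; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public final class RegressionSummary {

    /** Percentage change of value relative to baseline (negative = slower). */
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double jdk8 = 58517.5; // mean ops/s for the JDK 8 baseline, from the comment
        Map<String, Double> results = new LinkedHashMap<>(); // keeps insertion order
        results.put("JDK 11.0.4", 42811.9);
        results.put("JDK 11.0.4 with UseAVX=2", 56488.6);
        results.put("JDK 11.0.6 (Vladimir's build)", 57752.5);
        results.put("JDK 11.0.6 with UseAVX=3", 43028.2);
        for (Map.Entry<String, Double> e : results.entrySet()) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
            System.out.printf(Locale.ROOT, "%-32s %+6.1f%%%n",
                    e.getKey(), percentChange(jdk8, e.getValue()));
        }
    }
}
```

Run as-is, this reports about -26.8% for JDK 11.0.4, -3.5% with UseAVX=2, -1.3% for the patched 11.0.6 build, and -26.5% when UseAVX=3 is forced on the patched build, which matches the reading that the patch recovers JDK 8 level throughput by avoiding the AVX-512 paths.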

My tier1-tier3 testing of webrev.00 passed. Note that many, though not all, tests were run on AVX512 machines. Now we only need confirmation that the regression is resolved.
26-09-2019

After testing in the latest JDK, we should consider backporting it to JDK 11u.
25-09-2019

[~soaks] Can you test the proposed patch?
25-09-2019

webrev for the patch: http://cr.openjdk.java.net/~vdeshpande/8221092/webrev.00/
03-09-2019

JDK-8183103 disabled PostLoopMultiversioning and UseAVX=3 by default. The UseAVX=3 default was restored by JDK-8184036. Therefore, all builds since JDK 9 default to PostLoopMultiversioning=false and UseAVX=3 (if the machine supports it). The only UseAVX=3 bug I'm aware of is JDK-8192070, but I don't think it's related.
01-04-2019

I'm not sure what the comment in JDK-8183103 means; on our systems, JDK 11 and later builds do default to UseAVX=3 (we didn't test JDK 9 or 10, so I'm not sure when that might have changed). The regression exists in the latest build I have (JDK 13 EA 12), and it also exists when PostLoopMultiversioning is enabled.
20-03-2019
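To check which defaults a given JVM actually picked on a particular machine, as discussed in the two comments above, one option is to query the flags at runtime through the HotSpotDiagnosticMXBean management interface. This is a sketch; the helper names are hypothetical, and flag availability is an assumption that varies by build: UseAVX and AVX3Threshold exist only on x86 HotSpot, AVX3Threshold only in builds that include this fix, and PostLoopMultiversioning may be absent from some builds, so a missing flag is reported rather than treated as an error.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public final class ShowAvxFlags {

    static final String NOT_AVAILABLE = "<not available>";

    /** Returns the current value of a VM option, or NOT_AVAILABLE if this VM lacks it. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return NOT_AVAILABLE; // VM does not implement this management interface
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with this name exists (e.g. non-x86 builds)
            return NOT_AVAILABLE;
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseAVX", "PostLoopMultiversioning", "AVX3Threshold"}) {
            System.out.println(flag + " = " + vmOption(flag));
        }
    }
}
```

Running it once with default options and once with -XX:UseAVX=2 is a quick way to confirm what the JIT will target before comparing benchmark runs.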