JDK-8318562 : Computational test more than 2x slower when AVX instructions are used
Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: 17, 21, 22
Priority: P4
Status: Closed
Resolution: Fixed
CPU: x86
Submitted: 2023-10-19
Updated: 2024-01-08
Resolved: 2024-01-05
The attached JMH microbenchmark runs slower when AVX instructions are used than it does with -XX:UseAVX=0.
Comments
This broke implicit null checking, see JDK-8322985.
05-01-2024
[jdk17u-fix-request] Approval Request from sviswa7
A small backport PR which fixes the performance regression with vcvt* instructions on AVX platforms.
Minor changes were required to resolve a conflict in macroAssembler_x86.cpp/hpp.
The conflict was due to a change in the locked_cmpxchgptr signature on mainline.
Please approve.
05-12-2023
A pull request was submitted for review.
URL: https://git.openjdk.org/jdk17u-dev/pull/2016
Date: 2023-12-05 18:15:21 +0000
05-12-2023
A pull request was submitted for review.
URL: https://git.openjdk.org/jdk21u/pull/381
Date: 2023-11-17 22:08:24 +0000
17-11-2023
Fix Request JDK21u:
This is a clean backport and not dependent on any other backport.
A very small patch which fixes the performance regression with vcvt* instructions on AVX platforms.
Please approve.
A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/16701
Date: 2023-11-16 23:46:53 +0000
17-11-2023
Timing with the following standalone microbenchmark:
public class test_perf {
    // Partial sum of the Leibniz series 4 * (1 - 1/3 + 1/5 - ...).
    public static double compute_pi() {
        double pi = 4.0;
        boolean sign = false;
        for (int i = 3; i < 1000; i += 2) {
            if (sign) {
                pi += 4.0 / i;
            } else {
                pi -= 4.0 / i;
            }
            sign = !sign;
        }
        return pi;
    }

    public static void main(String[] args) {
        double res = 0.0;
        // Warm-up pass so that compute_pi() is JIT-compiled before timing.
        for (int i = 0; i < 10000000; i++) {
            res += compute_pi();
        }
        // Timed pass.
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            res += compute_pi();
        }
        long t2 = System.currentTimeMillis();
        System.out.println("[time] " + (t2 - t1) + " ms [res] " + res);
    }
}
AVX1 [time] 267 ms [res] 627918.5311171506
AVX0 [time] 66 ms [res] 627918.5311171506
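The fix requests above attribute the regression to vcvt* instructions. In the microbenchmark, the only visible conversion is the implicit int-to-double conversion in `4.0 / i`. The sketch below is one way to probe that. It assumes this conversion is the hot spot, which the issue does not state explicitly. It compares the original loop with a variant that keeps the counter as a double, so no conversion is needed in the loop body. The class name `ConversionSketch`, the method names, and the call count are hypothetical and chosen for illustration. Whether the JIT actually emits a conversion instruction for the first variant is not guaranteed.

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch: names and call counts are illustrative, not taken from the JDK fix.
class ConversionSketch {
    // Same series as compute_pi(): the int counter is converted to double on every iteration.
    static double computePiIntCounter() {
        double pi = 4.0;
        boolean sign = false;
        for (int i = 3; i < 1000; i += 2) {
            if (sign) {
                pi += 4.0 / i; // implicit int -> double conversion
            } else {
                pi -= 4.0 / i;
            }
            sign = !sign;
        }
        return pi;
    }

    // Variant with a double counter: 3.0, 5.0, ..., 999.0 are exactly representable,
    // so each term, and therefore the result, matches the int-counter version.
    static double computePiDoubleCounter() {
        double pi = 4.0;
        boolean sign = false;
        for (double d = 3.0; d < 1000.0; d += 2.0) {
            if (sign) {
                pi += 4.0 / d;
            } else {
                pi -= 4.0 / d;
            }
            sign = !sign;
        }
        return pi;
    }

    // Written after each timed run so the JIT cannot discard the computation.
    static double sink;

    static long timeNanos(DoubleSupplier f, int calls) {
        double acc = 0.0;
        long start = System.nanoTime();
        for (int n = 0; n < calls; n++) {
            acc += f.getAsDouble();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        int calls = 100_000; // assumed call count, small enough to finish quickly

        // Warm-up pass for both variants.
        timeNanos(ConversionSketch::computePiIntCounter, calls);
        timeNanos(ConversionSketch::computePiDoubleCounter, calls);

        long intNanos = timeNanos(ConversionSketch::computePiIntCounter, calls);
        long dblNanos = timeNanos(ConversionSketch::computePiDoubleCounter, calls);

        System.out.println("[int counter]    " + intNanos / 1_000_000 + " ms");
        System.out.println("[double counter] " + dblNanos / 1_000_000 + " ms");
    }
}
```

Running this once with the default AVX setting and once with -XX:UseAVX=0 should show whether the gap between the two settings narrows for the double-counter variant. If it does, that is consistent with the conversion being the slow path, though a JMH run would give more reliable numbers than this hand-rolled timer.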