JDK-8318562 : Computational test more than 2x slower when AVX instructions are used
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,21,22
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • CPU: x86
  • Submitted: 2023-10-19
  • Updated: 2024-01-08
  • Resolved: 2024-01-05
  • JDK 17: 17.0.11 (Fixed)
  • JDK 21: 21.0.2 (Fixed)
  • JDK 22: 22 b25 (Fixed)
Description
The attached JMH microbenchmark is more than 2x slower when AVX instructions are used than when run with -XX:UseAVX=0.

Comments
This broke implicit null checking, see JDK-8322985.
05-01-2024

[jdk17u-fix-request] Approval Request from sviswa7: A small backport PR that fixes the performance regression with vcvt* instructions on AVX platforms. Minor changes were required to resolve a conflict in macroAssembler_x86.cpp/hpp. The conflict was due to a change in the locked_cmpxchgptr signature on mainline. Please approve.
05-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/2016 Date: 2023-12-05 18:15:21 +0000
05-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u/pull/381 Date: 2023-11-17 22:08:24 +0000
17-11-2023

Fix Request JDK21u: This is a clean backport and not dependent on any other backport. A very small patch which fixes the performance regression with vcvt* instructions on AVX platforms. Please approve.
17-11-2023

Changeset: 0881f2b0 Author: Sandhya Viswanathan <sviswanathan@openjdk.org> Date: 2023-11-17 20:10:17 +0000 URL: https://git.openjdk.org/jdk/commit/0881f2b0c43870ed10b1166d04cef9832e58629e
17-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16701 Date: 2023-11-16 23:46:53 +0000
17-11-2023

Timing with the following standalone micro:

public class test_perf {
    public static double compute_pi() {
        double pi = 4.0;
        boolean sign = false;
        for (int i = 3; i < 1000; i += 2) {
            if (sign) {
                pi += 4.0 / i;
            } else {
                pi -= 4.0 / i;
            }
            sign = !sign;
        }
        return pi;
    }

    public static void main(String [] args) {
        double res = 0.0;
        for (int i = 0; i < 10000000; i++) {
            res += compute_pi();
        }
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            res += compute_pi();
        }
        long t2 = System.currentTimeMillis();
        System.out.println("[time] " + (t2-t1) + " ms [res] " + res);
    }
}

AVX1 [time] 267 ms [res] 627918.5311171506
AVX0 [time] 66 ms [res] 627918.5311171506
27-10-2023
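The standalone micro above takes a single timed pass with System.currentTimeMillis. As a rough cross-check, the same loop can be timed over several rounds with System.nanoTime, keeping the best round. This is only a sketch: the class name PiTiming, the helper timeNanos, and the call and round counts are hypothetical choices, not part of the attached benchmark.

```java
// A minimal sketch, not the attached JMH benchmark.
// PiTiming, timeNanos, CALLS and ROUNDS are hypothetical names and values.
class PiTiming {
    // Same loop body as compute_pi in the standalone micro.
    static double computePi() {
        double pi = 4.0;
        boolean sign = false;
        for (int i = 3; i < 1000; i += 2) {
            if (sign) {
                pi += 4.0 / i;
            } else {
                pi -= 4.0 / i;
            }
            sign = !sign;
        }
        return pi;
    }

    // Runs `calls` invocations and returns the elapsed nanoseconds.
    // The sum is added to sink[0] so the JIT cannot discard the work.
    static long timeNanos(int calls, double[] sink) {
        long start = System.nanoTime();
        double acc = 0.0;
        for (int i = 0; i < calls; i++) {
            acc += computePi();
        }
        sink[0] += acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int CALLS = 1_000_000; // assumed workload size
        final int ROUNDS = 5;        // assumed number of measured rounds
        double[] sink = new double[1];

        timeNanos(CALLS, sink);      // warm-up round, result ignored

        long best = Long.MAX_VALUE;
        for (int r = 0; r < ROUNDS; r++) {
            best = Math.min(best, timeNanos(CALLS, sink));
        }
        System.out.println("[best] " + (best / 1_000_000) + " ms [res] " + sink[0]);
    }
}
```

Running it once with default flags and once with -XX:UseAVX=0 on the same machine gives two numbers to compare. Taking the best of several rounds reduces noise from compilation and scheduling, but it is no substitute for the JMH run attached to the issue.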

Hot loop JIT sequence with additional command line options:
-XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:UseAVX=0 -XX:CompileCommand=Print,test_perf::compute_pi -XX:-Inline -XX:LoopUnrollLimit=0

AVX0
=====
# {method} {0x00007f0a2f4003d8} 'compute_pi' '()D' in 'test_perf'
# [sp+0x20] (sp of caller)
0x00007f0ac0bd7a40: sub $0x18,%rsp ; {no_reloc}
0x00007f0ac0bd7a47: mov %rbp,0x10(%rsp)
0x00007f0ac0bd7a4c: cmpl $0x1,0x20(%r15)
0x00007f0ac0bd7a54: jne 0x00007f0ac0bd7ae2
0x00007f0ac0bd7a5a: mov $0x3,%r9d
0x00007f0ac0bd7a60: movsd -0x48(%rip),%xmm1 # 0x00007f0ac0bd7a20 ; {section_word}
0x00007f0ac0bd7a68: movsd -0x48(%rip),%xmm2 # 0x00007f0ac0bd7a28 ; {section_word}
0x00007f0ac0bd7a70: xor %r11d,%r11d
0x00007f0ac0bd7a73: movapd %xmm1,%xmm0
0x00007f0ac0bd7a77: jmp 0x00007f0ac0bd7aa2
0x00007f0ac0bd7a79: nopl 0x0(%rax)
0x00007f0ac0bd7a80: subsd %xmm2,%xmm0
0x00007f0ac0bd7a84: xor $0x1,%r11d
0x00007f0ac0bd7a88: add $0x2,%r9d
0x00007f0ac0bd7a8c: cmp $0x3e8,%r9d
0x00007f0ac0bd7a93: jge 0x00007f0ac0bd7aad
0x00007f0ac0bd7a95: cvtsi2sd %r9d,%xmm3
0x00007f0ac0bd7a9a: movapd %xmm1,%xmm2
0x00007f0ac0bd7a9e: divsd %xmm3,%xmm2
0x00007f0ac0bd7aa2: test %r11d,%r11d
0x00007f0ac0bd7aa5: je 0x00007f0ac0bd7a80
0x00007f0ac0bd7aa7: addsd %xmm2,%xmm0
0x00007f0ac0bd7aab: jmp 0x00007f0ac0bd7a84
0x00007f0ac0bd7aad: mov 0x460(%r15),%rcx
0x00007f0ac0bd7ab4: mov %r11d,%r8d ; ImmutableOopMap {} ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) test_perf::compute_pi@53 (line 7)
0x00007f0ac0bd7ab7: test %eax,(%rcx) ; {poll}
0x00007f0ac0bd7ab9: add $0x10,%rsp
0x00007f0ac0bd7abd: pop %rbp
0x00007f0ac0bd7abe: cmp 0x458(%r15),%rsp ; {poll_return}
0x00007f0ac0bd7ac5: ja 0x00007f0ac0bd7acc
0x00007f0ac0bd7acb: retq

AVX1
=====
[Verified Entry Point]
# {method} {0x00007ff3bb4003d8} 'compute_pi' '()D' in 'test_perf'
# [sp+0x20] (sp of caller)
0x00007ff45c315540: sub $0x18,%rsp ; {no_reloc}
0x00007ff45c315547: mov %rbp,0x10(%rsp)
0x00007ff45c31554c: cmpl $0x1,0x20(%r15)
0x00007ff45c315554: jne 0x00007ff45c3155de
0x00007ff45c31555a: mov $0x3,%r9d
0x00007ff45c315560: vmovsd -0x48(%rip),%xmm1 # 0x00007ff45c315520 ; {section_word}
0x00007ff45c315568: vmovsd -0x48(%rip),%xmm2 # 0x00007ff45c315528 ; {section_word}
0x00007ff45c315570: xor %r11d,%r11d
0x00007ff45c315573: vmovapd %xmm1,%xmm0
0x00007ff45c315577: jmp 0x00007ff45c31559e
0x00007ff45c315579: nopl 0x0(%rax)
0x00007ff45c315580: vsubsd %xmm2,%xmm0,%xmm0
0x00007ff45c315584: xor $0x1,%r11d
0x00007ff45c315588: add $0x2,%r9d
0x00007ff45c31558c: cmp $0x3e8,%r9d
0x00007ff45c315593: jge 0x00007ff45c3155a9
0x00007ff45c315595: vcvtsi2sd %r9d,%xmm2,%xmm2
0x00007ff45c31559a: vdivsd %xmm2,%xmm1,%xmm2
0x00007ff45c31559e: test %r11d,%r11d
0x00007ff45c3155a1: je 0x00007ff45c315580
0x00007ff45c3155a3: vaddsd %xmm2,%xmm0,%xmm0
0x00007ff45c3155a7: jmp 0x00007ff45c315584
0x00007ff45c3155a9: mov 0x460(%r15),%rcx
0x00007ff45c3155b0: mov %r11d,%r8d ; ImmutableOopMap {} ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) test_perf::compute_pi@53 (line 7)
0x00007ff45c3155b3: test %eax,(%rcx) ; {poll}
0x00007ff45c3155b5: add $0x10,%rsp
0x00007ff45c3155b9: pop %rbp
0x00007ff45c3155ba: cmp 0x458(%r15),%rsp ; {poll_return}
0x00007ff45c3155c1: ja 0x00007ff45c3155c8
0x00007ff45c3155c7: retq
26-10-2023
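To repeat the comparison, the test can be launched twice with the options quoted in the comment above, changing only the UseAVX level. The launcher below is a sketch under several assumptions: the class AvxCompare and its command helper are hypothetical, "AVX1" is taken to mean -XX:UseAVX=1, a `java` binary is on the PATH, and test_perf.class is in the working directory. Printing the compiled method as disassembly also needs the hsdis disassembler plugin to be available to the JVM, so the Print option is off by default here.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch; AvxCompare and command(...) are hypothetical helpers.
class AvxCompare {
    // Builds the JVM command line from the options quoted in the comment,
    // with the UseAVX level supplied by the caller.
    static List<String> command(int useAvx, boolean printAssembly) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); // assumes a JDK launcher on the PATH
        cmd.add("-XX:-UseOnStackReplacement");
        cmd.add("-XX:-TieredCompilation");
        cmd.add("-XX:UseAVX=" + useAvx);
        cmd.add("-XX:-Inline");
        cmd.add("-XX:LoopUnrollLimit=0");
        if (printAssembly) {
            cmd.add("-XX:CompileCommand=Print,test_perf::compute_pi");
        }
        cmd.add("test_perf"); // assumes test_perf.class in the working directory
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the "AVX0" and "AVX1" labels correspond to UseAVX=0 and UseAVX=1.
        for (int avx : new int[] {0, 1}) {
            System.out.println("UseAVX=" + avx);
            Process p = new ProcessBuilder(command(avx, false)).inheritIO().start();
            p.waitFor();
        }
    }
}
```

Each child JVM prints its own "[time] ... ms" line, so the two results appear one after the other for comparison.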

[~sviswanathan] please assign this to someone to look into.
19-10-2023