JDK-8279621 : x86: Arraycopy stubs should use 256-bit copies with AVX=1
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,11,17,18,19
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2022-01-07
  • Updated: 2022-01-07
JDK 19: 19 (Unresolved)
Description
While working on JDK-8150730 and looking at its performance results, I noticed a peculiarity in the current arraycopy implementation. Changing UseAVX from 0 to 1 does not appear to improve the baseline scores:
  https://cr.openjdk.java.net/~shade/8150730/i11500.png
  https://cr.openjdk.java.net/~shade/8150730/tr3970x.png

The problem is that the arraycopy generators use vmovdqu only for UseAVX >= 2:

      if (UseAVX >= 2) {
        __ vmovdqu(xmm0, Address(end_from, qword_count, Address::times_8, -56));
        ...
      } else {
        __ movdqu(xmm0, Address(end_from, qword_count, Address::times_8, -56));
        ...
      }

...while 256-bit vmovdqu is actually available with plain AVX (UseAVX=1) as well. It has a VEX.256 encoding that needs only AVX, as per the Intel SDM, and the assembler already declares it:

  // Move Unaligned 256bit Vector
  void vmovdqu(Address dst, XMMRegister src);
  void vmovdqu(XMMRegister dst, Address src);
  void vmovdqu(XMMRegister dst, XMMRegister src);

It seems to have been that way since the initial implementation in JDK-8005544.

Relaxing the requirement in that code to UseAVX >= 1 provides substantial performance improvements:
  https://github.com/openjdk/jdk/pull/6987
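
As a rough illustration of what relaxing the check means, here is a minimal standalone sketch. It is a model only, not HotSpot code: the names vector_move_bytes, moves_per_chunk and min_avx_for_256 are hypothetical, and the 64-byte chunk size used in the usage note is an assumption for the example.

```cpp
#include <cstddef>

// Hypothetical model, not HotSpot code: width in bytes of one unaligned
// vector move that a stub would emit for a given UseAVX level.
// min_avx_for_256 is 2 for the current check and 1 for the relaxed check.
constexpr std::size_t vector_move_bytes(int use_avx, int min_avx_for_256) {
  return use_avx >= min_avx_for_256
             ? 32   // 256-bit vmovdqu (VEX.256, YMM register)
             : 16;  // 128-bit movdqu (XMM register)
}

// Number of load/store pairs needed to copy one chunk of chunk_bytes.
// chunk_bytes is assumed to be a multiple of the vector width.
constexpr std::size_t moves_per_chunk(std::size_t chunk_bytes, int use_avx,
                                      int min_avx_for_256) {
  return chunk_bytes / vector_move_bytes(use_avx, min_avx_for_256);
}

// With the current check, UseAVX=1 behaves like UseAVX=0.
static_assert(vector_move_bytes(1, 2) == vector_move_bytes(0, 2),
              "UseAVX=1 gains nothing under the UseAVX >= 2 check");
// With the relaxed check, UseAVX=1 gets the same width as UseAVX=2.
static_assert(vector_move_bytes(1, 1) == vector_move_bytes(2, 2),
              "UseAVX=1 matches UseAVX=2 under the UseAVX >= 1 check");
```

In this model, a 64-byte chunk with UseAVX=1 takes moves_per_chunk(64, 1, 2), that is four 16-byte load/store pairs, under the current check, and moves_per_chunk(64, 1, 1), that is two 32-byte pairs, under the relaxed one. The model says nothing about the actual speedup, which depends on the hardware.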
Comments
Incidental, I think. That change also modified generate_fill(), which used the AVX2 instruction vpbroadcastd. I am fine with your proposal.
07-01-2022

[~kvn], do you remember why the original code required UseAVX >= 2 for these?
07-01-2022