JDK-8265263 : AArch64: Combine vneg with right shift count
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: aarch64
  • Submitted: 2021-04-15
  • Updated: 2022-10-20
  • Resolved: 2022-03-09
Fixed in: JDK 19 (build 19 b13)
Description
AArch64 does *not* have a SIMD instruction that shifts right by a register (vector) count. A right shift is therefore implemented as a left shift by the negated count, so an extra "vneg" is needed before each left shift.

By combining the "vneg" with RShiftCntV, those extra "vneg" instructions can be eliminated.
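
The reasoning above can be illustrated with a small scalar model in Java. This is only a sketch, not HotSpot code: `ushl16` is a hypothetical helper that mimics one 16-bit lane of the AArch64 `ushl` instruction, which shifts left for a non-negative count and right for a negative one. The two loops mirror negating the count for every element versus negating it once where the shift count is set up.

```java
// Scalar model of the idea; class and method names are hypothetical.
final class UshlModel {
    // Models one 16-bit lane of AArch64 USHL. The count is read as a signed
    // byte: non-negative counts shift left, negative counts shift right.
    static int ushl16(int lane, int count) {
        int value = lane & 0xFFFF;
        int shift = (byte) count;
        if (shift >= 0) {
            return shift >= 16 ? 0 : (value << shift) & 0xFFFF;
        }
        int right = -shift;
        return right >= 16 ? 0 : value >>> right;
    }

    // "Before": the count is negated again for every element.
    static char[] negateEveryIteration(char[] src, int count) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            int negated = -count; // models the repeated "neg"
            dst[i] = (char) ushl16(src[i], negated);
        }
        return dst;
    }

    // "After": the count is negated once, outside the loop.
    static char[] negateOnce(char[] src, int count) {
        char[] dst = new char[src.length];
        int negated = -count; // models folding the negate into the shift count
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) ushl16(src[i], negated);
        }
        return dst;
    }

    public static void main(String[] args) {
        char[] src = {0x8000, 0x1234, 0xFFFF, 0x0001};
        char[] before = negateEveryIteration(src, 3);
        char[] after = negateOnce(src, 3);
        for (int i = 0; i < src.length; i++) {
            System.out.printf("%04x >>> 3 = %04x / %04x%n",
                    (int) src[i], (int) before[i], (int) after[i]);
        }
    }
}
```

Both variants compute the same values; only the number of negate operations differs, which is the saving visible between the two listings.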

Before:
  0x0000ffffa9106c68:   ldr     q17, [x15, #16]
  0x0000ffffa9106c6c:   add     x14, x10, x14
  0x0000ffffa9106c70:   neg     v18.16b, v16.16b
  0x0000ffffa9106c74:   ushl    v17.8h, v17.8h, v18.8h
  0x0000ffffa9106c78:   str     q17, [x14, #16]
  0x0000ffffa9106c7c:   ldr     q17, [x15, #32]
  0x0000ffffa9106c80:   neg     v18.16b, v16.16b
  0x0000ffffa9106c84:   ushl    v17.8h, v17.8h, v18.8h
  0x0000ffffa9106c88:   str     q17, [x14, #32]
  0x0000ffffa9106c8c:   ldr     q17, [x15, #48]
  0x0000ffffa9106c90:   neg     v18.16b, v16.16b
  0x0000ffffa9106c94:   ushl    v17.8h, v17.8h, v18.8h
  0x0000ffffa9106c98:   str     q17, [x14, #48]
  0x0000ffffa9106c9c:   ldr     q17, [x15, #64]
  0x0000ffffa9106ca0:   neg     v18.16b, v16.16b
  0x0000ffffa9106ca4:   ushl    v17.8h, v17.8h, v18.8h
  0x0000ffffa9106ca8:   str     q17, [x14, #64]
  0x0000ffffa9106cac:   ldr     q17, [x15, #80]
  0x0000ffffa9106cb0:   neg     v18.16b, v16.16b
  0x0000ffffa9106cb4:   ushl    v17.8h, v17.8h, v18.8h

After:
  0x0000ffff81106af8:   ldr     q17, [x15, #16]
  0x0000ffff81106afc:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b00:   add     x14, x10, x14
  0x0000ffff81106b04:   str     q17, [x14, #16]
  0x0000ffff81106b08:   ldr     q17, [x15, #32]
  0x0000ffff81106b0c:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b10:   str     q17, [x14, #32]
  0x0000ffff81106b14:   ldr     q17, [x15, #48]
  0x0000ffff81106b18:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b1c:   str     q17, [x14, #48]
  0x0000ffff81106b20:   ldr     q17, [x15, #64]
  0x0000ffff81106b24:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b28:   str     q17, [x14, #64]
  0x0000ffff81106b2c:   ldr     q17, [x15, #80]
  0x0000ffff81106b30:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b34:   str     q17, [x14, #80]
  0x0000ffff81106b38:   ldr     q17, [x15, #96]
  0x0000ffff81106b3c:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b40:   str     q17, [x14, #96]
  0x0000ffff81106b44:   ldr     q17, [x15, #112]
  0x0000ffff81106b48:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b4c:   str     q17, [x14, #112]
  0x0000ffff81106b50:   ldr     q17, [x15, #128]
  0x0000ffff81106b54:   ushl    v17.8h, v17.8h, v16.8h
  0x0000ffff81106b58:   str     q17, [x14, #128] 

AArch32 also benefits from this approach.
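
The report does not include the Java source behind these listings. The `.8h` lanes and the unsigned `ushl` suggest a loop over 16-bit unsigned elements with a loop-invariant shift count, so a loop of roughly the following shape is a plausible candidate. This is an assumption, as are the class and method names; whether C2 vectorizes it depends on the JDK build, the flags and the warm-up.

```java
// Hypothetical reproducer shape; not taken from the report.
final class CharShiftLoop {
    // Unsigned right shift of every element by a loop-invariant count.
    static void shiftRight(char[] src, char[] dst, int count) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] >>> count);
        }
    }

    public static void main(String[] args) {
        char[] src = new char[1024];
        char[] dst = new char[1024];
        for (int i = 0; i < src.length; i++) {
            src[i] = (char) (i * 37);
        }
        // Repeat the call so the JIT gets a chance to compile the loop.
        for (int iter = 0; iter < 20_000; iter++) {
            shiftRight(src, dst, 3);
        }
        System.out.println((int) dst[100]);
    }
}
```

With the hsdis disassembler plugin installed, running such a program with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` prints the generated code, which can then be compared against the listings.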
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/811 Date: 2022-10-20 08:14:59 +0000
20-10-2022

Changeset: 49245131 Author: Hao Sun <haosun@openjdk.org> Committer: Pengfei Li <pli@openjdk.org> Date: 2022-03-09 00:52:01 +0000 URL: https://git.openjdk.java.net/jdk/commit/49245131e98c1c72c447536e5527acecb3311add
09-03-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/7724 Date: 2022-03-07 08:46:12 +0000
07-03-2022

[~dlong] Thanks for your comments. As we discussed in https://github.com/openjdk/jdk18/pull/41, we may want to improve right shifts for aarch64 in the following two respects: 1. Use is_var_shift to decide where to put the vneg operation, in either the ShiftCntV or the RShiftV. 2. Introduce either new operand types or a platform-specific negate IR, as suggested by Dean Long. The idea is to reuse the same negated values, so some performance benefit is expected as well. In addition, we should take care of the impact on the arm32 platform.
05-01-2022

To do this right, you need to be careful of breaking the Vector API (see JDK-8278267) and JDK-8277239. To simplify things, it might make sense to do the platform-specific negate in the IR instead of the backend. We already have precedents for this, masking the shift value in the bytecodes rather than in the backend: https://github.com/openjdk/jdk/blob/0113322ac15e2441def3dec599199b98cbd02961/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java#L658
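
One way to picture the suggestion above is to normalize the count before it reaches the backend, so that the backend only ever sees a value it can pass straight to a left-shift instruction. The sketch below is a scalar model with hypothetical names, not the Vector API code at the linked line. It masks the count the way Java `int` shifts do and then negates it; `ushl32` stands in for one 32-bit lane of `ushl`.

```java
// Scalar model; class and method names are hypothetical.
final class NormalizedShiftCount {
    // Done once, ahead of the backend: mask the count, then negate it.
    static int rightShiftCountForBackend(int count) {
        int masked = count & (Integer.SIZE - 1); // Java int shifts use the low 5 bits
        return -masked;
    }

    // Models one 32-bit lane of USHL: the count is a signed byte,
    // non-negative shifts left, negative shifts right.
    static int ushl32(int lane, int count) {
        long value = lane & 0xFFFF_FFFFL;
        int shift = (byte) count;
        if (shift >= 0) {
            return shift >= 32 ? 0 : (int) (value << shift);
        }
        int right = -shift;
        return right >= 32 ? 0 : (int) (value >>> right);
    }

    // The "backend" needs no extra negate: it applies the left shift directly.
    static int unsignedShiftRight(int lane, int count) {
        return ushl32(lane, rightShiftCountForBackend(count));
    }

    public static void main(String[] args) {
        int x = 0x8000_0001;
        for (int count : new int[] {0, 1, 31, 32, 33, -1}) {
            System.out.printf("count=%d model=%08x java=%08x%n",
                    count, unsignedShiftRight(x, count), x >>> count);
        }
    }
}
```

Because the normalized count is computed in one place, the same negated value can be reused by every shift that consumes it.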
23-12-2021