AArch64 does *not* have a SIMD right-shift instruction that takes its shift count from a register. Because of this, an extra "vneg" is needed before each left shift so that the negated count produces a right shift.
By combining the "vneg" with RShiftCntV, those extra "vneg" instructions can be saved.
Before:
0x0000ffffa9106c68: ldr q17, [x15, #16]
0x0000ffffa9106c6c: add x14, x10, x14
0x0000ffffa9106c70: neg v18.16b, v16.16b
0x0000ffffa9106c74: ushl v17.8h, v17.8h, v18.8h
0x0000ffffa9106c78: str q17, [x14, #16]
0x0000ffffa9106c7c: ldr q17, [x15, #32]
0x0000ffffa9106c80: neg v18.16b, v16.16b
0x0000ffffa9106c84: ushl v17.8h, v17.8h, v18.8h
0x0000ffffa9106c88: str q17, [x14, #32]
0x0000ffffa9106c8c: ldr q17, [x15, #48]
0x0000ffffa9106c90: neg v18.16b, v16.16b
0x0000ffffa9106c94: ushl v17.8h, v17.8h, v18.8h
0x0000ffffa9106c98: str q17, [x14, #48]
0x0000ffffa9106c9c: ldr q17, [x15, #64]
0x0000ffffa9106ca0: neg v18.16b, v16.16b
0x0000ffffa9106ca4: ushl v17.8h, v17.8h, v18.8h
0x0000ffffa9106ca8: str q17, [x14, #64]
0x0000ffffa9106cac: ldr q17, [x15, #80]
0x0000ffffa9106cb0: neg v18.16b, v16.16b
0x0000ffffa9106cb4: ushl v17.8h, v17.8h, v18.8h
After:
0x0000ffff81106af8: ldr q17, [x15, #16]
0x0000ffff81106afc: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b00: add x14, x10, x14
0x0000ffff81106b04: str q17, [x14, #16]
0x0000ffff81106b08: ldr q17, [x15, #32]
0x0000ffff81106b0c: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b10: str q17, [x14, #32]
0x0000ffff81106b14: ldr q17, [x15, #48]
0x0000ffff81106b18: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b1c: str q17, [x14, #48]
0x0000ffff81106b20: ldr q17, [x15, #64]
0x0000ffff81106b24: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b28: str q17, [x14, #64]
0x0000ffff81106b2c: ldr q17, [x15, #80]
0x0000ffff81106b30: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b34: str q17, [x14, #80]
0x0000ffff81106b38: ldr q17, [x15, #96]
0x0000ffff81106b3c: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b40: str q17, [x14, #96]
0x0000ffff81106b44: ldr q17, [x15, #112]
0x0000ffff81106b48: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b4c: str q17, [x14, #112]
0x0000ffff81106b50: ldr q17, [x15, #128]
0x0000ffff81106b54: ushl v17.8h, v17.8h, v16.8h
0x0000ffff81106b58: str q17, [x14, #128]
AArch32 benefits from the same approach.
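The effect of the change can be illustrated with a small scalar model. This is only a sketch: the class and method names below are hypothetical and are not part of HotSpot. It models one 16-bit lane of "ushl" (signed count taken from the low byte, negative count shifts right) and compares negating the count on every use with negating it once when the count is created, which is what folding the "vneg" into RShiftCntV amounts to.

```java
// Hypothetical scalar model, not HotSpot code.
final class UshlShiftModel {

    // One 16-bit lane of AArch64 USHL: the count is the signed low byte of
    // the shift operand; a positive count shifts left, a negative count
    // performs a logical right shift. Shifts of 16 or more yield 0.
    static int ushlLane(int value, int count) {
        int v = value & 0xFFFF;
        int c = (byte) count; // sign-extend the low byte
        if (c >= 16 || c <= -16) {
            return 0;
        }
        return c >= 0 ? (v << c) & 0xFFFF : v >>> -c;
    }

    // "Before": the count is negated again in front of every shift,
    // mirroring the repeated neg + ushl pairs in the first listing.
    static char[] shiftRightNegEachTime(char[] src, int shift) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            int negated = -shift; // models the per-shift "neg"
            dst[i] = (char) ushlLane(src[i], negated);
        }
        return dst;
    }

    // "After": the count is negated once, when it is materialized,
    // and every shift reuses the already negated value.
    static char[] shiftRightNegOnce(char[] src, int shift) {
        int negated = -shift; // models the negation folded into the count
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) ushlLane(src[i], negated);
        }
        return dst;
    }
}
```

For shift counts in the range 0..15, both methods return the same values as the Java expression (char) (src[i] >>> shift); only the number of negations differs.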