JDK-8262356 : Optimize existing masked operation support for AVX-512.
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86
  • Submitted: 2021-02-25
  • Updated: 2021-11-25
  • Fix Version: tbd (Unresolved)
Sub Tasks
JDK-8270349
Description
- Currently, a vector masked operation performs the operation over all vector lanes and then applies a blend operation, which selectively updates the result vector under the control of the mask vector.

- Prior to AVX-512, blending the newly computed result with the older value was the only way to implement masked (predicated) vector operations.

- A non-AVX-512 vector blend instruction probes the most significant bit (MSB) of each mask vector lane to choose between the corresponding lanes of the two source vectors.
 
- With AVX-512, a masked operation can be performed in two ways:

Method 1 (operation followed by blend):
       vmask = vector_cmp(mask, ALL_ONES)
       vres = vector_operation vsrc1, vsrc2
       vector_blend(vdst, vres, vmask)

Method 2 (predicated operation):
       opmask = vector_cmp(mask, ALL_ONES)
       vres = vector_operation vsrc1, vsrc2, opmask

Emitting a predicated vector operation (Method 2) is clearly preferable: it reduces emitted code size, since no separate blend instruction is needed, and it is more energy efficient, since the operation acts only on the selected vector lanes.
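
The two methods can be compared with a scalar emulation in plain Java. This is only an illustrative sketch, not HotSpot code: the class and method names are hypothetical, arrays stand in for vector registers, an int stands in for an opmask register, and lane-wise addition is an arbitrary choice of vector_operation. It assumes merge semantics, i.e. unselected lanes keep the old destination value.

```java
import java.util.Arrays;

// Hypothetical scalar emulation; names are illustrative, not JDK or HotSpot APIs.
class MaskedAddSketch {
    // Method 1: run the operation on every lane, then blend.
    // vmask mimics a vector mask: each lane is all ones (-1) or zero,
    // and the blend inspects only the lane's most significant (sign) bit.
    static int[] addThenBlend(int[] vdst, int[] vsrc1, int[] vsrc2, int[] vmask) {
        int[] vres = new int[vdst.length];
        for (int i = 0; i < vres.length; i++) {
            vres[i] = vsrc1[i] + vsrc2[i];               // computed for all lanes
        }
        int[] out = new int[vdst.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (vmask[i] < 0) ? vres[i] : vdst[i]; // MSB set: take new value
        }
        return out;
    }

    // Collapse the per-lane MSBs into one bit per lane (at most 32 lanes here).
    static int toOpmask(int[] vmask) {
        int opmask = 0;
        for (int i = 0; i < vmask.length; i++) {
            opmask |= (vmask[i] >>> 31) << i;
        }
        return opmask;
    }

    // Method 2: one predicated operation; unselected lanes keep vdst.
    static int[] predicatedAdd(int[] vdst, int[] vsrc1, int[] vsrc2, int opmask) {
        int[] out = vdst.clone();
        for (int i = 0; i < out.length; i++) {
            if (((opmask >>> i) & 1) != 0) {
                out[i] = vsrc1[i] + vsrc2[i];            // only selected lanes are touched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] vdst = {9, 9, 9, 9};
        int[] vsrc1 = {1, 2, 3, 4};
        int[] vsrc2 = {10, 20, 30, 40};
        int[] vmask = {-1, 0, -1, 0};
        System.out.println(Arrays.toString(addThenBlend(vdst, vsrc1, vsrc2, vmask)));
        System.out.println(Arrays.toString(predicatedAdd(vdst, vsrc1, vsrc2, toOpmask(vmask))));
    }
}
```

Both methods produce the same lanes. The difference the issue targets is that Method 1 needs a full-width operation plus a blend, while Method 2 needs a single masked instruction.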

- The Vector API has significantly extended the scope of masked operations; it also offers APIs for direct mask manipulation, e.g. VectorMask.or/and/not. Operating directly on an opmask register will therefore enable more efficient code generation.

- Using opmask registers, we can further optimize the existing implementations of VectorMask query operations such as VectorMask.firstTrue/lastTrue/anyTrue/allTrue/trueCount.
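
To illustrate why an opmask representation makes mask manipulation and queries cheap, the sketch below models a mask as one bit per lane in a long and expresses each operation as simple bit arithmetic. It is an assumption-laden emulation, not the HotSpot implementation: the class and helper names are hypothetical, and the conventions for an empty mask (firstTrue returns the lane count, lastTrue returns -1) follow the documented VectorMask behaviour.

```java
// Hypothetical bitmask model of an opmask register; not a JDK API.
class OpmaskQuerySketch {
    // Bits that correspond to valid lanes (lanes must be in 1..64).
    static long laneBits(int lanes) {
        return lanes == 64 ? -1L : (1L << lanes) - 1;
    }

    // Direct mask manipulation: one bitwise operation each.
    static long and(long a, long b) { return a & b; }
    static long or(long a, long b)  { return a | b; }
    static long not(long m, int lanes) { return ~m & laneBits(lanes); }

    // Mask queries: each maps onto a single bit-counting or compare step.
    static int trueCount(long m) { return Long.bitCount(m); }

    static int firstTrue(long m, int lanes) {
        return m == 0 ? lanes : Long.numberOfTrailingZeros(m);
    }

    static int lastTrue(long m) {
        return m == 0 ? -1 : 63 - Long.numberOfLeadingZeros(m);
    }

    static boolean anyTrue(long m) { return m != 0; }

    static boolean allTrue(long m, int lanes) { return m == laneBits(lanes); }

    public static void main(String[] args) {
        long m = 0b0110;   // lanes 1 and 2 set, out of 8 lanes
        System.out.println(trueCount(m) + " " + firstTrue(m, 8) + " " + lastTrue(m));
        System.out.println(anyTrue(m) + " " + allTrue(m, 8));
    }
}
```

With a vector-register mask, each of these queries would first need the lane MSBs extracted into a scalar; with the mask already in bit-per-lane form that step disappears.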