JDK-8340093 : C2 SuperWord: implement cost model
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 24
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-09-13
  • Updated: 2025-08-29
  • Release (Other): tbd, Unresolved
Sub Tasks
  • JDK-8346993
  • JDK-8366357
  • JDK-8366361
  • JDK-8366427
Description
A cost model will let C2 SuperWord decide whether vectorizing a loop is actually profitable.
This matters most for reductions and for cases where vectorization has to add shuffle/pack/unpack operations, because vectorization is not always profitable, especially if we add more operations to the loop.
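
As a rough illustration of the profitability question, here is a minimal C++ sketch. The struct, the function names and all cost numbers are hypothetical; nothing here is taken from HotSpot or from the proof-of-concept patch. It normalizes the cost of each loop body to a cost per processed element and only calls vectorization profitable if the vector loop is strictly cheaper.

```cpp
#include <cassert>

// All cost numbers are hypothetical abstract units, not values from HotSpot.
struct LoopBodyCost {
    int cost_per_iteration;      // summed cost of all operations in one iteration
    int elements_per_iteration;  // 1 for the scalar loop, vlen for the vector loop
};

// Normalize to cost per processed element so both loops are comparable.
double cost_per_element(const LoopBodyCost& body) {
    assert(body.elements_per_iteration > 0);
    return static_cast<double>(body.cost_per_iteration) / body.elements_per_iteration;
}

// Vectorize only if the vector loop is strictly cheaper per element.
bool is_vectorization_profitable(const LoopBodyCost& scalar, const LoopBodyCost& vector) {
    return cost_per_element(vector) < cost_per_element(scalar);
}
```

With these made-up numbers, a scalar body costing 3 units per element loses to a 4-lane vector body costing 3 units per iteration (0.75 per element). If the vector body needs 10 more units of pack/unpack work, it costs 13 units per iteration (3.25 per element) and is no longer profitable.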

Subword conversions may also carry extra cost, see:
https://github.com/openjdk/jdk/pull/23413

--------------------------- PLAN ----------------------

I have a proof-of-concept patch here:
https://github.com/openjdk/jdk/pull/20964

Instead of pushing it as a whole (quite unreviewable), I'll split it up into subtasks.

Here is a rough schedule towards cost modeling:
0. Smaller refactorings
1. Scalar node refactoring
  - Finer resolution: mem, phi, data, cfg
  - These will be needed when modeling the whole loop instead of just the basic block (step 3)
2. Vector node refactoring
  - Remove reliance on _nodes, so that it will be easier to model the whole loop (step 3)
  - Instead, capture all relevant information in some sort of VTransformNodePrototype: opcode, vlen, basic_type, etc.
3. Model whole loop instead of only basic block (allows VTransform optimizations like moving reduction out of loop)
  - Instead of using VTransformGraph::apply_memops_reordering_with_schedule to reorder the old graph, I want to build the new loop body from the VTransform directly.
  - That means we are less constrained by the old shape of the loop.
4. Optimize: e.g. move reduction out of loop
  - Refactor move_unordered_reduction_out_of_loop
  - Moving the reduction out of the loop means it is no longer counted in the loop cost, so vectorization becomes more profitable (see step 5)
5. Cost-model
  - count scalar loop cost (via scalar opcodes)
  - count vector loop cost (via scalar opcodes, and vector opcodes + vlen)
    - keep track of live nodes (optimization might kill some)
    - keep track of nodes inside loop (optimizations might float some nodes out of the loop, don't count their cost)
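
Steps 4 and 5 can be sketched together as a toy model in C++. This is only an illustration under loud assumptions: SketchNode, Op, node_cost, loop_cost and is_profitable are hypothetical names, and the unit costs are made up (1 unit per element-wise operation, 2 units per lane for a cross-lane add reduction). It is not the design of the proof-of-concept patch. Only nodes that are still live and still inside the loop contribute to the loop cost.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes for a sum-reduction loop; not HotSpot's real opcodes.
enum class Op { Load, Store, Add, AddReduction };

// Hypothetical node description, loosely following "opcode, vlen" from step 2.
struct SketchNode {
    Op op;
    int vlen;      // 1 for scalar nodes, number of lanes for vector nodes
    bool live;     // false if an optimization killed the node
    bool in_loop;  // false if an optimization floated the node out of the loop
};

// Made-up unit costs: element-wise operations cost 1 unit, a cross-lane
// add reduction costs 2 units per lane (think: one shuffle plus one add).
int node_cost(const SketchNode& n) {
    assert(n.vlen >= 1);
    switch (n.op) {
        case Op::AddReduction: return 2 * n.vlen;
        default:               return 1;
    }
}

// Sum the cost of all nodes that are live and remain inside the loop.
int loop_cost(const std::vector<SketchNode>& nodes) {
    int cost = 0;
    for (const SketchNode& n : nodes) {
        if (n.live && n.in_loop) {
            cost += node_cost(n);
        }
    }
    return cost;
}

// One vector iteration replaces vlen scalar iterations, so compare
// the vector loop cost against vlen times the scalar loop cost.
bool is_profitable(const std::vector<SketchNode>& scalar_loop,
                   const std::vector<SketchNode>& vector_loop,
                   int vlen) {
    return loop_cost(vector_loop) < vlen * loop_cost(scalar_loop);
}
```

For a 4-lane sum reduction under these assumptions, the scalar loop (load, add) costs 2 units per iteration, so 8 units per 4 elements. A vector loop that keeps the reduction inside the loop costs 1 + 8 = 9 units and is rejected. Once the reduction is marked as outside the loop and replaced in the loop by an element-wise vector add, the loop cost drops to 2 units and vectorization is accepted.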