Other |
---|
tbd | Unresolved |
JDK-8346993 |
JDK-8366357 |
JDK-8366361 |
JDK-8366427 |
This will improve the profitability of vectorizing reductions, and of adding shuffle/pack/unpack operations. This matters because vectorization is not always profitable, especially if we add more operations to the loop. There may also be extra cost to subword conversion, see: https://github.com/openjdk/jdk/pull/23413

--------------------------- PLAN ----------------------

I have a proof-of-concept patch here: https://github.com/openjdk/jdk/pull/20964
Instead of pushing it as a whole (quite unreviewable), I'll split it up into subtasks. Here is a rough schedule towards Cost-Modeling:

0. Smaller refactorings

1. Scalar node refactoring
- Finer resolution: mem, phi, data, cfg
- These will be needed when modeling the whole loop instead of just the basic block (step 3)

2. Vector node refactoring
- Remove reliance on _nodes, so that it will be easier to model the whole loop (step 3)
- Instead, capture all relevant information in some sort of VTransformNodePrototype: opcode, vlen, basic_type, etc.

3. Model whole loop instead of only basic block (allows VTransform optimizations like moving reduction out of loop)
- Instead of using VTransformGraph::apply_memops_reordering_with_schedule, which reorders the old graph, I want to build the new loop body from the VTransform directly.
- That means we are less constrained by the old shape of the loop.

4. Optimize: e.g. move reduction out of loop
- Refactor move_unordered_reduction_out_of_loop
- Moving the reduction out of the loop means it is no longer counted in the loop cost, so vectorization becomes more profitable (see step 5)

5. Cost-model
- Count scalar loop cost (via scalar opcodes)
- Count vector loop cost (via scalar opcodes, and vector opcodes + vlen)
- Keep track of live nodes (optimizations might kill some)
- Keep track of nodes inside the loop (optimizations might float some nodes out of the loop; don't count their cost)
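
To make step 5 more concrete, here is a minimal standalone sketch of the comparison such a cost model could make. It is not HotSpot code: SketchNode, loop_body_cost, is_profitable and the per-node unit costs are hypothetical names and numbers chosen for illustration. It also assumes that one vector iteration replaces vlen scalar iterations, so the vector loop cost is compared against vlen times the scalar loop cost.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a node of the transformed loop.
// This is NOT HotSpot's VTransformNode, only an illustration.
struct SketchNode {
  int  unit_cost;  // assumed cost of executing this node once
  bool is_live;    // false if an optimization killed the node
  bool in_loop;    // false if the node was floated out of the loop
};

// Sum the cost of all nodes that are still live and still inside the loop.
// Dead nodes and nodes moved out of the loop contribute nothing.
inline int loop_body_cost(const std::vector<SketchNode>& nodes) {
  int cost = 0;
  for (const SketchNode& n : nodes) {
    if (n.is_live && n.in_loop) {
      cost += n.unit_cost;
    }
  }
  return cost;
}

// Assumption: one vector iteration does the work of vlen scalar iterations,
// so vectorization pays off if it is cheaper than vlen scalar iterations.
inline bool is_profitable(const std::vector<SketchNode>& scalar_loop,
                          const std::vector<SketchNode>& vector_loop,
                          int vlen) {
  assert(vlen > 0);
  return loop_body_cost(vector_loop) < loop_body_cost(scalar_loop) * vlen;
}
```

With made-up costs, a scalar reduction loop of a load and an add (cost 1 each) and vlen = 4 gives a budget of 8. A vector loop with a load (1) and an in-loop reduction (assumed cost 10) is rejected, while the same loop with a cheap element-wise vector add (1) in the loop and the expensive reduction marked in_loop = false is accepted. That is the effect step 4 aims for.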