Currently (after the integration of JDK-8287087), reduction analysis for superword vectorization works by searching for reduction chains that follow the same input index, modulo edges that are swapped when canonicalizing commutative nodes (see swap_edges(1, 2) calls in addnode.cpp and mulnode.cpp). This is relatively fast and covers most of the reductions found in practice, but requires keeping track of whether the input edges of a commutative node have been swapped (using the Node::Flag_has_swapped_edges flag). Forgetting to update this flag when the inputs of a commutative node are swapped can result in missing vectorization opportunities.
A simpler alternative, to be explored in this RFE, is to further generalize reduction analysis into a complete depth- or breadth-first search that explores both inputs of every potential reduction node. This has the advantage of being conceptually simpler, more modular (not relying on swapped edge state), and slightly more complete (being able to find reductions on partially unrolled loops with user-swapped inputs), at the possible expense of higher computational cost (see qualitative comparison in https://github.com/openjdk/jdk/pull/13120#issue-1634092373 and further discussion in the review of JDK-8287087). A specific aspect that should be evaluated is the cost of searching during the x64 matching phase through long chains of floating-point Min/Max nodes, since reduction analysis is performed for each individual node in the chain.