JDK-8351409 : Avoid scalar cmov in extreme long min/max loop scenarios
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 25
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-03-07
  • Updated: 2025-03-14
Other: tbd (Unresolved)
Description
The fix for JDK-8307513 in PR https://github.com/openjdk/jdk/pull/20098 has edge cases where performance degradation can be observed. These performance regressions can be summarised as follows:

Regression 1: Given a loop with a long min/max reduction pattern where one side of the branch is taken near 100% of the time, when Superword finds the pattern not profitable, HotSpot will use scalar instructions (cmov) and performance will regress.

Regression 2: Given a loop with a long min/max reduction pattern where one side of the branch is taken near 100% of the time, when the platform does not support the vector instructions needed for this (e.g. AVX-512 quad word vpmax/vpmin), HotSpot will use scalar instructions (cmov) and performance will regress.

Regression 3: Given a loop with a long min/max non-reduction pattern (e.g. longLoopMax) where one side of the branch is taken near 100% of the time, when the platform does not vectorize it (either because CPU instruction support is missing or because Superword finds it not profitable), HotSpot will use scalar instructions (cmov) and performance will regress.
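For illustration only, the two loop shapes referred to above might look like the following sketch. The class and method names are hypothetical; longLoopMax is named in this report, but its body is not shown here, so loopMax below is only an assumed shape of it. The increasing() helper is one assumed way to produce input where the same side of the comparison wins on every iteration.

```java
final class LongMinMaxPatterns {
    // Reduction pattern: one accumulator is carried across all iterations.
    static long reductionMax(long[] a) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.max(result, a[i]);
        }
        return result;
    }

    // Non-reduction pattern: every output element is computed independently.
    static void loopMax(long[] a, long[] b, long[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = Math.max(a[i], b[i]);
        }
    }

    // Strictly increasing input (hypothetical data set): in reductionMax the
    // new element is larger than the accumulator on every iteration, so one
    // side of the comparison is taken 100% of the time.
    static long[] increasing(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }
}
```

Whether either loop is vectorized or compiled to scalar cmov depends on the platform and on Superword's profitability decision, as the regressions above describe; the source code alone does not determine it.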

What all these regressions have in common is that in these extreme scenarios the compiler emits scalar cmov instructions. So, the idea for a fix would be to detect these extreme scenarios and use branching code (e.g. cmp + mov) instead.
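As a rough source-level analogy of the two scalar code shapes being contrasted, consider the sketch below. The class and method names are hypothetical, and the mapping from Java source to machine code is an assumption: the JIT decides between cmov and a branch based on profiling and flags, so neither method guarantees a particular instruction sequence.

```java
final class ScalarMaxShapes {
    // Shape A: Math.max. In the scenarios described in this report the
    // compiler may emit this as a compare followed by a conditional move.
    static long maxViaMathMax(long a, long b) {
        return Math.max(a, b);
    }

    // Shape B: explicit compare and branch. When one side is taken almost
    // always, the branch is highly predictable, which is the motivation for
    // preferring branching code (cmp + mov) in these extreme cases.
    static long maxViaBranch(long a, long b) {
        if (a < b) {
            a = b;
        }
        return a;
    }
}
```

Both methods return the same value for every pair of long inputs; they differ only in the code shape they suggest to the compiler.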
Comments
I've created JDK-8352082 for the int case.
14-03-2025

Re: int case: I will try to compile a similar list of regressions and see how likely the circumstances are.
13-03-2025

Re: int case: I think it should be a separate RFE because the circumstances under which it can happen are narrower. For example, vectorized int min/max support is available on a wider set of architectures because AVX-512 is not needed, and IIRC it still works with smaller register sizes such as the 128-bit registers on aarch64. I think such an RFE could do with listing the situations, just as is done for long above.
13-03-2025

ILW = Minor performance regression, in edge cases with CMOV, lower ConditionalMoveLimit (?) = MLM = P4
10-03-2025

[~galder] Would you also tackle the int case, or should we file a separate RFE for that? See: https://github.com/openjdk/jdk/pull/20098#issuecomment-2662706564
07-03-2025