Bug ID: JDK-8365178 Regression in Crypto-CC20P1305Bench.decrypt on win-x64

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 26

Priority: P4
Status: Closed
Resolution: Not an Issue
OS: windows
CPU: x86_64

Submitted: 2025-08-08
Updated: 2025-09-12
Resolved: 2025-09-12

JDK 26: Resolved

The sub-benchmark:
    openjdk.bench.javax.crypto.small.CC20P1305Bench.decrypt-dataSize:1024-provider
shows a regression in 26-b8:

~ -2% on win-x64

The regression was isolated to jdk-26+8-774, which only contains JDK-8342692.

Crypto-CC20P1305Bench.decrypt benchmark is available from the JDK repo.

Ran with:
-jar benchmarks.jar "javax.crypto.small.CC20P1305Bench.decrypt"
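
For orientation, here is a minimal sketch of the kind of operation this benchmark exercises: a ChaCha20-Poly1305 decrypt of a 1024-byte payload through javax.crypto.Cipher. This is not the benchmark source; the class name, method name, and random key/nonce setup are hypothetical and only illustrate the Cipher.init / Cipher.doFinal path.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CC20P1305DecryptSketch {
    // Hypothetical helper: encrypts `size` random bytes, decrypts them again,
    // and reports whether the plaintext was recovered and the tag length is 16.
    static boolean roundTrip(int size) {
        try {
            SecureRandom rnd = new SecureRandom();
            byte[] keyBytes = new byte[32]; // 256-bit ChaCha20 key
            byte[] nonce = new byte[12];    // 96-bit nonce
            byte[] data = new byte[size];
            rnd.nextBytes(keyBytes);
            rnd.nextBytes(nonce);
            rnd.nextBytes(data);

            SecretKeySpec key = new SecretKeySpec(keyBytes, "ChaCha20");
            IvParameterSpec iv = new IvParameterSpec(nonce);

            // Produce a ciphertext (payload followed by the 16-byte Poly1305 tag).
            Cipher enc = Cipher.getInstance("ChaCha20-Poly1305");
            enc.init(Cipher.ENCRYPT_MODE, key, iv);
            byte[] ciphertext = enc.doFinal(data);

            // The decrypt side: init followed by doFinal.
            Cipher dec = Cipher.getInstance("ChaCha20-Poly1305");
            dec.init(Cipher.DECRYPT_MODE, key, iv);
            byte[] plaintext = dec.doFinal(ciphertext);

            return ciphertext.length == size + 16 && Arrays.equals(data, plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1024));
    }
}
```

The sketch assumes a JDK that ships the ChaCha20-Poly1305 transformation (JDK 11 or later). It is not suitable for measuring performance; use the JMH benchmark for that.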

[~ecaspole] yes, ok with me.
12-09-2025
I agree: this benchmark spends so much time in stubs that it seems more likely this change is causing a side effect of slightly more cache misses or alignment/inlining differences. Further, the perf mostly or completely recovered in later builds. We are fine to close this with no change; ok with you, [~roland]?
28-08-2025
I don't follow your statement... "For 3-, the previous logic was kept to cross check the result of the new logic for int loops. So if there was a bug there, I would expect it would have been caught by now." That seems like a functional cross check... not a performance cross-check.
27-08-2025
When I run that benchmark with perfasm I see this:

....[Hottest Methods (after inlining)]..............................................................
 28.26%  runtime stub  StubRoutines::Stub Generator chacha20Block_stub
 20.59%  c2, level 4   javax.crypto.Cipher::doFinal, version 2, compile id 1106
 13.41%  runtime stub  StubRoutines::Stub Generator jlong_disjoint_arraycopy_stub
 12.35%  c2, level 4   javax.crypto.Cipher::init, version 2, compile id 1096
  7.59%  runtime stub  StubRoutines::Stub Generator poly1305_processBlocks_stub
  7.22%  c2, level 4   sun.security.util.math.intpoly.IntegerPolynomial::addLimbsModPowerTwo, version 2, compile id 1089
  2.96%  c2, level 4   com.sun.crypto.provider.ChaCha20Cipher$EngineAEADDec::doUpdate, version 2, compile id 1085

A lot of time is spent in stubs (so a regression in compiled code seems unlikely to be what we measure here). Is it the case when run on the system where the regression is observed as well?

There are 3 main things that are changed by JDK-8342692:
1- It changes handling of long counted loops/long range checks
2- there's some logic for CastLL that was added
3- the trip count computation was modified to work for long loops as well as int loops

1- only has an impact if the test indeed uses long counted loops/long range checks. I added a ShouldNotReachHere(); in PhaseIdealLoop::try_make_short_running_loop()

2- could have an impact even without long counted loops/long range checks (but CastLL nodes are rare so that seems unlikely). Anyway, I added a ShouldNotReachHere(); above the returns for true in CastLLNode::used_at_inner_loop_exit_test()

I then ran the benchmark again and the ShouldNotReachHere()s were not hit.

For 3-, the previous logic was kept to cross check the result of the new logic for int loops. So if there was a bug there, I would expect it would have been caught by now. I also ran the benchmark with a fastdebug build and it runs fine.
So, AFAICT, it doesn't seem possible for JDK-8342692 to create a regression with that benchmark.
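
To make the terminology in 1- concrete, here is a rough sketch of the two loop shapes involved: an int counted loop and a long counted loop with a long range check. The class and method names are hypothetical, and how C2 actually compiles either loop depends on the build and flags; this only shows the source-level patterns.

```java
import java.util.Objects;

public class LongLoopSketch {
    // Int counted loop: the induction variable is an int and the
    // array access carries an implicit int range check.
    static long sumInt(byte[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Long counted loop: the induction variable is a long, and
    // Objects.checkIndex(long, long) acts as an explicit long range check.
    static long sumLong(byte[] a, long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            long idx = Objects.checkIndex(i, (long) a.length);
            sum += a[(int) idx];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        System.out.println(sumInt(a));        // sums all four elements
        System.out.println(sumLong(a, 1, 3)); // sums elements at index 1 and 2
    }
}
```

Code that only contains loops of the first shape would not be expected to reach the long counted loop handling, which is consistent with the ShouldNotReachHere() checks not being hit.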
27-08-2025
[~roland] could you have a look?
11-08-2025
ILW = Small performance regression, single benchmark, no workaround = MLH = P4
11-08-2025