JDK-8356216 : Regressions ~5% in MonteCarlo and AESBench
Type:Bug
Component:hotspot
Sub-Component:compiler
Affected Version:25
Priority:P3
Status:Open
Resolution:Unresolved
CPU:x86_64
Submitted:2025-05-05
Updated:2025-05-19
MonteCarlo regressed on MacOSX x64 and AESBench on linux-x64. Both seem to be related to JDK-8349139.
We are running scimark.monte_carlo -ikv with only -server -XX:+UseG1GC
Comments
Further down the profile, with CI 2394 (which includes JDK-8349139), I see this one:
3.59% c2, level 4 com.sun.crypto.provider.CipherCore::fillOutputBuffer, version 2, compile id 683
but that does not appear in 2393. So maybe JDK-8349139 coincidentally affects compilation order? I can see it in the +LogCompilation output:
3.417: 643 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 8168)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4240 nodes: 1441 live: 1370)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1438 live: 1368)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4190 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4190 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4240 nodes: 1435 live: 1366)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1432 live: 1364)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4210 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4220 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer succeed: inline (hot) (end time: 3.4240 nodes: 1191 live: 1135)
and with 2394 it does not get inlined:
3.429: 660 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 4664)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4330 nodes: 994 live: 948)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 991 live: 946)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4300 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4310 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4330 nodes: 988 live: 944)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 985 live: 942)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4310 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4320 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer fail: already compiled into a big method (end time: 0.0000)
What do you think? If JDK-8349139 causes better or different inlining in some other function, can we do anything about this?
AFAICT doing -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=inline,com.sun.crypto.provider.CipherCore::fillOutputBuffer" recovers the performance.
19-05-2025
When I run AESBench with -prof perfasm, I get this:
53.34% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
20.87% c2, level 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub, version 5, compile id 1007
11.81% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
4.75% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
So most of the time is spent in a stub that's not affected by JDK-8349139, and little time (about 20%) is spent in C2-compiled code. Is that also the case where the regression is observed? Or is the stub not available on that platform? Is the regression 5% here as well? For a 5% overall regression to come from compiled code (the only code affected by JDK-8349139) where only 20% of the time is spent, that compiled code would have to have slowed down massively.
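To put a rough number on "massively": a minimal sketch, assuming the 5% is an increase in total run time and that only the roughly 20% spent in C2-compiled code can change. The class and method names are made up for illustration.

```java
public class SlowdownEstimate {
    /**
     * If a fraction f of the run time is spent in affected code and the
     * total run time grows by the factor overallTimeRatio, the affected
     * code's slowdown s satisfies f * s + (1 - f) = overallTimeRatio.
     */
    static double requiredSlowdown(double affectedFraction, double overallTimeRatio) {
        return 1.0 + (overallTimeRatio - 1.0) / affectedFraction;
    }

    public static void main(String[] args) {
        // Assumed inputs: 20% of the time in compiled code, 5% more total time.
        double s = requiredSlowdown(0.20, 1.05);
        System.out.println("compiled code slowdown factor: " + s);
    }
}
```

Under those assumptions the compiled code alone would have to run about 1.25x slower, which is consistent with something coarse like a lost inlining rather than a small change in loop code.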
16-05-2025
Hi [~roland], try it like this with the repo micros:
numactl --cpunodebind=0 --membind=0 -- ./jdk/bin/java -jar micro/benchmarks.jar crypto.small.AESBench.decrypt -p algorithm="AES/CBC/NoPadding" -f 4 -wi 6
On 1 node of the OCI BM O3.36, I get
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 9206646.420 ± 39732.389 ops/s
before the change and this after it:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 8624396.109 ± 34777.724 ops/s
Let me know if you can repro that. Those systems have this cpu:
model name : Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
13-05-2025
It seems unlikely JDK-8349139 causes the regression on MonteCarlo. The main loop where all time is spent according to perf on linux (but that shouldn't be platform dependent) is in `integrate()`:
```
for (int count = 0; count < numSamples; count++) {
    double x = R.nextDouble();
    double y = R.nextDouble();
    if (x * x + y * y <= 1.0) {
        underCurve++;
    }
}
```
C2 doesn't create pre/main/post loops for this one. JDK-8349139 should only affect c2's behavior once pre/main/post loops are created.
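For anyone who wants to look at this loop in isolation, here is a self-contained sketch built around it. It is an approximation, not the benchmark: it assumes `java.util.Random` in place of SciMark's own generator `R`, and the class name, seed parameter and return expression are made up for illustration, so C2 may compile it differently from the real `integrate()`.

```java
import java.util.Random;

public class MonteCarloLoop {
    // Estimates pi by sampling points in the unit square and counting
    // those that fall inside the quarter circle of radius 1.
    static double integrate(int numSamples, long seed) {
        Random R = new Random(seed); // assumption: stands in for SciMark's generator
        int underCurve = 0;
        for (int count = 0; count < numSamples; count++) {
            double x = R.nextDouble();
            double y = R.nextDouble();
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        return ((double) underCurve / numSamples) * 4.0;
    }

    public static void main(String[] args) {
        double sum = 0.0;
        int runs = 20;
        // Repeat the call so the loop gets hot enough to reach C2.
        for (int i = 0; i < runs; i++) {
            sum += integrate(1_000_000, i);
        }
        System.out.println("mean estimate of pi: " + (sum / runs));
    }
}
```

Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` gives a compilation log to compare against the one from the real benchmark.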
For AESBench: what are the values of the parameters for which a regression is observed?
13-05-2025
[~roland] could you have a look?
06-05-2025
ILW = Performance regressions in two benchmarks, two benchmarks on linux and macos, no workaround known = MMH = P3