MonteCarlo regressed on macosx-x64 and AESBench on linux-x64. Both seem to be related to JDK-8349139.
We are running scimark.monte_carlo -ikv with only -server -XX:+UseG1GC
Comments
Yes, since the problem is understood, it's fine with me to close it.
02-12-2025
Given we agree that the regression is an accident and not directly caused by JDK-8349139, [~ecaspole], do you agree we can close it as Won't Fix?
26-11-2025
So it seems like JDK-8349139 has a side effect on the inlining heuristic, which is probably not something we are going to fix in JDK 25. I'm tentatively setting the fix version to JDK 26 for this.
> The actual fix for this would be to get rid of this particular heuristic (because it is known to make performance fluctuate quite a bit in some cases if not lucky with timing or compiled code size) and replace it with another one that would be more robust.
Any ideas for improvement of the heuristic?
03-06-2025
The "already compiled into a big method" failure happens because the method was already compiled and the code for that method (in number of bytes of actual instructions) is above some threshold.
That method being inlined in 2393 implies that either:
1) it wasn't compiled by the time the inlining decision was made, or
2) it was compiled but its code was small enough
So it could be a timing issue (1). If that's the case, it could be JDK-8349139 has some impact on timings of compilations.
Or it could be a compiled code size issue (2).
Either one is hard to act on (do we really want to penalize correctness or performance just to push code size below some threshold?), and both feel more like bad luck than like something really going wrong.
The actual fix for this would be to get rid of this particular heuristic (because it is known to make performance fluctuate quite a bit in some cases if not lucky with timing or compiled code size) and replace it with another one that would be more robust.
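One quick way to probe hypothesis (2), assuming the failure really does come from the compiled-code-size check against the InlineSmallCode threshold, would be a diagnostic run with the threshold raised and inlining traced, something like
-jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:InlineSmallCode=4000 -XX:+PrintInlining"
(4000 is just an arbitrary value above the default, to see whether the inlining decision and the score change; that would be an experiment, not a fix).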
21-05-2025
Further down the profile, with CI 2394 (including JDK-8349139) I see this one:
3.59% c2, level 4 com.sun.crypto.provider.CipherCore::fillOutputBuffer, version 2, compile id 683
but that does not appear in 2393. So maybe JDK-8349139 coincidentally affects compilation order? I can see the difference in the +LogCompilation output; in 2393 it gets inlined:
3.417: 643 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 8168)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4240 nodes: 1441 live: 1370)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1438 live: 1368)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4190 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4190 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4240 nodes: 1435 live: 1366)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1432 live: 1364)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4210 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4220 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer succeed: inline (hot) (end time: 3.4240 nodes: 1191 live: 1135)
and with 2394 it does not get inlined:
3.429: 660 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 4664)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4330 nodes: 994 live: 948)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 991 live: 946)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4300 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4310 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4330 nodes: 988 live: 944)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 985 live: 942)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4310 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4320 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer fail: already compiled into a big method (end time: 0.0000)
What do you think? If JDK-8349139 causes better/different inlining in some other function, can we do anything with this?
AFAICT doing -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=inline,com.sun.crypto.provider.CipherCore::fillOutputBuffer" recovers the performance.
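If we ever wanted the workaround to travel with the benchmark instead of the command line, JMH's @Fork(jvmArgsAppend = ...) could carry the same CompileCommand; the sketch below is only illustrative (the class and method names are made up, and I'm not suggesting we actually patch the benchmark):
```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;

// Hypothetical sketch, not part of the repo micros.
public class AESBenchInlineSketch {

    // Append the same CompileCommand used above so C2 force-inlines
    // CipherCore::fillOutputBuffer in the forked benchmark JVM.
    @Fork(jvmArgsAppend = {
            "-XX:+UnlockDiagnosticVMOptions",
            "-XX:CompileCommand=inline,com.sun.crypto.provider.CipherCore::fillOutputBuffer"})
    @Benchmark
    public void decryptWithForcedInline() {
        // placeholder body; the real AESBench.decrypt work would go here
    }
}
```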
19-05-2025
When I run AESBench with -prof perfasm, I get this:
53.34% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
20.87% c2, level 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub, version 5, compile id 1007
11.81% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
4.75% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
So most of the time is spent in a stub that's not affected by JDK-8349139, and only about 20% is spent in C2-compiled code. Is that also the case on the platform where the regression is observed? Or is the stub not available there? And is the regression 5% there as well? For a 5% overall regression to come from compiled code where only ~20% of the time is spent (the only code affected by JDK-8349139), that compiled code would have to be massively slowed down, roughly a 25% slowdown of the compiled code alone.
16-05-2025
Hi [~roland], try it like this with the repo micros:
numactl --cpunodebind=0 --membind=0 -- ./jdk/bin/java -jar micro/benchmarks.jar crypto.small.AESBench.decrypt -p algorithm="AES/CBC/NoPadding" -f 4 -wi 6
On 1 node of the OCI BM O3.36, I get
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 9206646.420 ± 39732.389 ops/s
before the change and this after it:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 8624396.109 ± 34777.724 ops/s
Let me know if you can repro that. Those systems have this cpu:
model name : Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
13-05-2025
It seems unlikely that JDK-8349139 causes the regression on MonteCarlo. The main loop, where all the time is spent according to perf on linux (though that shouldn't be platform dependent), is in `integrate()`:
```
for (int count = 0; count < numSamples; count++) {
double x = R.nextDouble();
double y = R.nextDouble();
if (x * x + y * y <= 1.0) {
underCurve++;
}
}
```
C2 doesn't create pre/main/post loops for this one, and JDK-8349139 should only affect C2's behavior once pre/main/post loops are created.
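For contrast, a call-free counted loop like the hypothetical one below is the kind C2 typically does unroll and split into pre/main/post loops, which is where JDK-8349139 could come into play:
```java
// Hypothetical contrast case, not from the benchmark: a simple counted loop with a
// call-free body that C2 typically unrolls, creating pre/main/post loops in the process.
static int sum(int[] a) {
    int s = 0;
    for (int i = 0; i < a.length; i++) {
        s += a[i];
    }
    return s;
}
```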
For AESBench: what are the values of the parameters for which a regression is observed?
13-05-2025
[~roland] could you have a look?
06-05-2025
ILW = Performance regressions in two benchmarks, two benchmarks on linux and macos, no workaround known = MMH = P3