MonteCarlo regressed on macosx-x64 and AESBench on linux-x64. Both seem to be related to JDK-8349139.
We are running scimark.monte_carlo -ikv with only -server -XX:+UseG1GC
Comments
Yes, since the problem is understood, it's fine with me to close it.
02-12-2025
Given we agree that the regression is an accident and not directly caused by JDK-8349139, [~ecaspole], do you agree we can close it as Won't Fix?
26-11-2025
So it seems like JDK-8349139 has a side effect on the inlining heuristic, which is probably not something we are going to fix in JDK 25. I'm tentatively setting the fix version to JDK 26 for this.
> The actual fix for this would be to get rid of this particular heuristic (because it is known to make performance fluctuate quite a bit in some cases if not lucky with timing or compiled code size) and replace it with another one that would be more robust.
Any ideas for improvement of the heuristic?
03-06-2025
The "already compiled into a big method" failure happens because the method was already compiled and the code for that method (in number of bytes of actual instructions) is above some threshold.
That method being inlined in 2393 implies that either:
1) it wasn't compiled by the time the inlining decision was made, or
2) it was compiled but its code was small enough
So it could be a timing issue (1). If that's the case, it could be JDK-8349139 has some impact on timings of compilations.
Or it could be a compiled code size issue (2).
Either one is hard to act on (do we really want to penalize correctness or performance just to push code size below some threshold?), and both feel more like bad luck than like something really going wrong.
The actual fix for this would be to get rid of this particular heuristic (because it is known to make performance fluctuate quite a bit in some cases if not lucky with timing or compiled code size) and replace it with another one that would be more robust.
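One quick way to probe hypothesis (2), assuming the failure really does come from the compiled-code-size check against the InlineSmallCode threshold, would be a diagnostic run with the threshold raised and inlining traced, something like
-jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:InlineSmallCode=4000 -XX:+PrintInlining"
(4000 is just an arbitrary value above the default, to see whether the inlining decision and the score change; that would be an experiment, not a fix).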
21-05-2025
Further down the profile, with CI 2394 (including JDK-8349139) I see this one:
3.59% c2, level 4 com.sun.crypto.provider.CipherCore::fillOutputBuffer, version 2, compile id 683
but that does not appear in 2393. So maybe JDK-8349139 coincidentally affects compilation order? I can see the difference in the +LogCompilation output; in 2393 it gets inlined:
3.417: 643 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 8168)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4240 nodes: 1441 live: 1370)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1438 live: 1368)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4190 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4190 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4240 nodes: 1435 live: 1366)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4240 nodes: 1432 live: 1364)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4210 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4220 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer succeed: inline (hot) (end time: 3.4240 nodes: 1191 live: 1135)
and with 2394 it does not get inlined:
3.429: 660 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub (57 bytes)(code size: 4664)
@ 7 java.lang.System::nanoTime (0 bytes) (intrinsic: _nanoTime) (end time: 0.0000)
@ 17 org.openjdk.bench.javax.crypto.full.AESBench::decrypt succeed: force inline by CompileCommand (end time: 3.4330 nodes: 994 live: 948)
@ 29 javax.crypto.Cipher::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 991 live: 946)
@ 1 javax.crypto.Cipher::checkCipherState succeed: inline (hot) (end time: 3.4300 nodes: 294 live: 283)
@ 20 javax.crypto.Cipher::chooseFirstProvider succeed: inline (hot) (end time: 3.4310 nodes: 342 live: 328)
@ 31 com.sun.crypto.provider.AESCipher::engineDoFinal succeed: inline (hot) (end time: 3.4330 nodes: 988 live: 944)
@ 7 com.sun.crypto.provider.CipherCore::doFinal succeed: inline (hot) (end time: 3.4330 nodes: 985 live: 942)
@ 3 com.sun.crypto.provider.CipherCore::getOutputSizeByOperation succeed: inline (hot) (end time: 3.4310 nodes: 467 live: 448)
@ 7 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 17 com.sun.crypto.provider.CipherCore::prepareInputBuffer succeed: inline (hot) (end time: 3.4320 nodes: 678 live: 649)
@ 5 java.lang.Math::addExact (26 bytes) (intrinsic: _addExactI) (end time: 0.0000)
@ 61 com.sun.crypto.provider.CipherCore::fillOutputBuffer fail: already compiled into a big method (end time: 0.0000)
What do you think? If JDK-8349139 causes better/different inlining in some other function, can we do anything with this?
AFAICT doing -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=inline,com.sun.crypto.provider.CipherCore::fillOutputBuffer" recovers the performance.
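If we ever wanted the workaround to travel with the benchmark instead of the command line, JMH's @Fork(jvmArgsAppend = ...) could carry the same CompileCommand; the sketch below is only illustrative (the class and method names are made up, and I'm not suggesting we actually patch the benchmark):
```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;

// Hypothetical sketch, not part of the repo micros.
public class AESBenchInlineSketch {

    // Append the same CompileCommand used above so C2 force-inlines
    // CipherCore::fillOutputBuffer in the forked benchmark JVM.
    @Fork(jvmArgsAppend = {
            "-XX:+UnlockDiagnosticVMOptions",
            "-XX:CompileCommand=inline,com.sun.crypto.provider.CipherCore::fillOutputBuffer"})
    @Benchmark
    public void decryptWithForcedInline() {
        // placeholder body; the real AESBench.decrypt work would go here
    }
}
```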
19-05-2025
When I run AESBench with -prof perfasm, I get this:
53.34% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
20.87% c2, level 4 org.openjdk.bench.javax.crypto.small.jmh_generated.AESBench_decrypt_jmhTest::decrypt_thrpt_jmhStub, version 5, compile id 1007
11.81% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
4.75% runtime stub StubRoutines::cipherBlockChaining_decryptAESCrypt
So most of the time is spent in a stub that's not affected by JDK-8349139, and only about 20% is spent in C2-compiled code. Is that also the case on the platform where the regression is observed? Or is the stub not available there? And is the regression 5% there as well? For a 5% overall regression to come from compiled code where only ~20% of the time is spent (the only code affected by JDK-8349139), that compiled code would have to be massively slowed down, roughly a 25% slowdown of the compiled code alone.
16-05-2025
Hi [~roland], try it like this with the repo micros:
numactl --cpunodebind=0 --membind=0 -- ./jdk/bin/java -jar micro/benchmarks.jar crypto.small.AESBench.decrypt -p algorithm="AES/CBC/NoPadding" -f 4 -wi 6
On 1 node of the OCI BM O3.36, I get
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 9206646.420 ± 39732.389 ops/s
before the change and this after it:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 1024 128 thrpt 32 8624396.109 ± 34777.724 ops/s
Let me know if you can repro that. Those systems have this cpu:
model name : Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
13-05-2025
It seems unlikely that JDK-8349139 causes the regression on MonteCarlo. The main loop, where all the time is spent according to perf on linux (though that shouldn't be platform dependent), is in `integrate()`:
```
for (int count = 0; count < numSamples; count++) {
double x = R.nextDouble();
double y = R.nextDouble();
if (x * x + y * y <= 1.0) {
underCurve++;
}
}
```
C2 doesn't create pre/main/post loops for this one, and JDK-8349139 should only affect C2's behavior once pre/main/post loops are created.
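For contrast, a call-free counted loop like the hypothetical one below is the kind C2 typically does unroll and split into pre/main/post loops, which is where JDK-8349139 could come into play:
```java
// Hypothetical contrast case, not from the benchmark: a simple counted loop with a
// call-free body that C2 typically unrolls, creating pre/main/post loops in the process.
static int sum(int[] a) {
    int s = 0;
    for (int i = 0; i < a.length; i++) {
        s += a[i];
    }
    return s;
}
```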
For AESBench: what are the values of the parameters for which a regression is observed?
13-05-2025
[~roland] could you have a look?
06-05-2025
ILW = Performance regressions in two benchmarks, two benchmarks on linux and macos, no workaround known = MMH = P3