Bug ID: JDK-8294215 CallSiteTargetSelf microbenchmarks regression due changes in nmethod unloading behavior

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 11,17,19,20

Priority: P3
Status: Open
Resolution: Unresolved

Submitted: 2022-09-22
Updated: 2022-09-23

Versions (Unresolved/Resolved/Fixed)

Other: tbd (Unresolved)

Related Reports

Relates :

JDK-8233873 - final field values should be trusted as constant

Description

Scores on the CallSiteTargetSelf.test* microbenchmarks have regressed compared to JDK 8, and the score gets progressively worse the longer the benchmark runs:

# Benchmark: org.openjdk.bench.java.lang.invoke.CallSiteSetTargetSelf.testMutable

11.0.16:

# Run progress: 0,00% complete, ETA 00:16:40
# Fork: 1 of 5
# Warmup Iteration   1: 1943,091 ns/op
# Warmup Iteration   2: 4190,466 ns/op
# Warmup Iteration   3: 5090,832 ns/op
# Warmup Iteration   4: 6211,476 ns/op
# Warmup Iteration   5: 6803,869 ns/op
Iteration   1: 7338,019 ns/op
Iteration   2: 8029,986 ns/op
Iteration   3: 8704,329 ns/op
Iteration   4: 9223,633 ns/op
Iteration   5: 10498,911 ns/op

8.0.345:

# Run progress: 0,00% complete, ETA 00:16:40
# Fork: 1 of 5
# Warmup Iteration   1: 326,132 ns/op
# Warmup Iteration   2: 317,761 ns/op
# Warmup Iteration   3: 315,248 ns/op
# Warmup Iteration   4: 319,478 ns/op
# Warmup Iteration   5: 317,688 ns/op
Iteration   1: 324,402 ns/op
Iteration   2: 314,060 ns/op
Iteration   3: 327,899 ns/op
Iteration   4: 324,006 ns/op
Iteration   5: 319,762 ns/op

[~vlivanov] did some analysis and showed that this is due to a growing number of dependent nmethods that have to be checked every time the CallSite target changes. JDK 8 unloads nmethods more aggressively, and similar behavior can be provoked on 8 by disabling method unloading (-XX:-MethodFlushing):

# Run progress: 0,00% complete, ETA 00:16:40
# Fork: 1 of 5
# Warmup Iteration   1: 2509,419 ns/op
# Warmup Iteration   2: 3808,516 ns/op
# Warmup Iteration   3: 4496,784 ns/op
# Warmup Iteration   4: 5301,225 ns/op
# Warmup Iteration   5: 5607,454 ns/op

Conversely, forcing unloads to happen more often in the microbenchmarks by limiting the code cache makes 11+ exhibit better behavior on the micro:

11.0.16 with -XX:ReservedCodeCacheSize=3m:

# Run progress: 0,00% complete, ETA 00:16:40
# Fork: 1 of 5
# Warmup Iteration   1: 262,584 ns/op
# Warmup Iteration   2: 262,319 ns/op
# Warmup Iteration   3: 257,795 ns/op
# Warmup Iteration   4: 257,002 ns/op
# Warmup Iteration   5: 258,168 ns/op

It's unclear how much of a problem this is in practice, but applications relying on mutable CallSites are susceptible. This would be an annoying performance issue in production, since application performance might depend heavily on whether nmethod unloading happens in a timely manner. [~vlivanov] has suggested we might be able to clean dependency contexts proactively before relevant nmethods get unloaded.
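
For readers unfamiliar with the pattern, the kind of code that is exposed to this issue can be sketched as follows. This is a minimal illustration and not the benchmark source: the class name MutableCallSiteSketch and the helper methods one() and two() are made up for this example. It only shows a MutableCallSite whose target is changed repeatedly while it is invoked through its dynamic invoker.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Illustrative only: names in this class are hypothetical, not taken from the benchmark.
class MutableCallSiteSketch {
    static int one() { return 1; }
    static int two() { return 2; }

    // Flips the call site target on every iteration and calls through the site.
    static long run(int iterations) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class);
            MethodHandle a = lookup.findStatic(MutableCallSiteSketch.class, "one", type);
            MethodHandle b = lookup.findStatic(MutableCallSiteSketch.class, "two", type);

            MutableCallSite site = new MutableCallSite(a);
            MethodHandle invoker = site.dynamicInvoker();

            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                // Each setTarget() may force the JVM to invalidate compiled code
                // that inlined the previous target.
                site.setTarget((i & 1) == 0 ? a : b);
                sum += (int) invoker.invokeExact();
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Timing a loop like this by hand is not a substitute for the JMH benchmark; it is only meant to show which API calls are involved.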

Comments

There's no point in keeping a separate list per MethodHandle instance: once CallSite.setTarget() successfully returns there should be no nmethods left which have the previous target inlined. In that sense, a change in CallSite.target completely wipes the dependency context. The performance problem comes from the fact that nmethodBuckets are kept in place for stale nmethod dependencies until the relevant nmethod goes away. It can be improved by wiping the whole dependency context associated with the CallSite instance all at once. CallSite.target case does look like a special case of "truly final" except that the JVM has to keep it "optimizable" irrespective of how many times it failed. That was an explicit requirement from multiple implementors of JVM language runtimes when it comes to mutable CallSites.
23-09-2022
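
The cost growth described in the comment above can be illustrated with a toy model. This is not HotSpot code: DependencyContextModel, Entry, and simulate() are hypothetical names, and the model assumes one new dependent nmethod is recorded per target change. It counts how many entries are traversed when stale entries stay in the context versus when the whole context is wiped on each change.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; it does not reflect the actual JVM data structures.
class DependencyContextModel {
    // Hypothetical stand-in for an nmethod recorded in a dependency context.
    static final class Entry { boolean invalidated; }

    private final List<Entry> entries = new ArrayList<>();
    private final boolean wipeOnChange;
    long visited; // entries traversed across all target changes

    DependencyContextModel(boolean wipeOnChange) { this.wipeOnChange = wipeOnChange; }

    // A newly compiled method that inlined the current target registers itself.
    void addDependent() { entries.add(new Entry()); }

    // A target change visits every recorded entry, stale or not.
    void targetChanged() {
        for (Entry e : entries) {
            visited++;
            e.invalidated = true;
        }
        if (wipeOnChange) {
            entries.clear(); // drop the whole context at once
        }
    }

    static long simulate(int changes, boolean wipeOnChange) {
        DependencyContextModel ctx = new DependencyContextModel(wipeOnChange);
        for (int i = 0; i < changes; i++) {
            ctx.addDependent();  // assumed: one recompilation per target change
            ctx.targetChanged(); // the next setTarget() walks the context
        }
        return ctx.visited;
    }

    public static void main(String[] args) {
        System.out.println("stale entries kept: " + simulate(100, false));
        System.out.println("context wiped:      " + simulate(100, true));
    }
}
```

Under these assumptions the traversal work is quadratic in the number of target changes when stale entries are kept and linear when the context is wiped, which matches the shape of the slowdown in the benchmark logs.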
It occurs to me that this speculation on the value of CallSite.target is a special case of the more general idea of speculation on the semi-stable field of a constant object. So any implementation of "truly final" fields could potentially be adapted for CallSites, or perhaps even an improved and generalized solution for CallSites could be used for final fields.
23-09-2022
Some observations:
1. It seems we invalidate all nmethods using the CallSite, regardless of the target. Having a dependency list per (CallSite, MethodHandle) pair might help here. We would need to be careful to read the old target reliably in CallSite.setTarget() using something like compare-and-swap.
2. The CallSite target could be changed from A to B and back to A. In that case, nmethods with a dependency on A would only need to be invalidated if they ran while the CallSite was set to B. An entry barrier might help here.
3. When we invalidate nmethods, we make them not_entrant, but we leave them on the dependency list. Removing them from the list (barring concurrency issues) might help.
23-09-2022
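
Observation 1 above can be sketched as a toy model. Everything here is hypothetical: PerTargetDependencies is not a JVM class, targets and nmethods are represented by plain strings, and an atomic exchange (AtomicReference.getAndSet) stands in for the compare-and-swap the comment mentions. The sketch keeps one dependency list per target and, on a target change, returns only the entries recorded against the old target.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: strings stand in for MethodHandle targets and nmethods.
class PerTargetDependencies {
    private final AtomicReference<String> target;
    private final Map<String, List<String>> dependents = new HashMap<>();

    PerTargetDependencies(String initialTarget) {
        target = new AtomicReference<>(initialTarget);
    }

    // Records that the given nmethod inlined the given target.
    synchronized void addDependent(String targetName, String nmethod) {
        dependents.computeIfAbsent(targetName, k -> new ArrayList<>()).add(nmethod);
    }

    // Swaps the target and returns the nmethods that must be invalidated:
    // only those recorded against the old target.
    synchronized List<String> setTarget(String newTarget) {
        String old = target.getAndSet(newTarget); // reliable read of the old value
        if (old.equals(newTarget)) {
            return new ArrayList<>(); // nothing changed, nothing to invalidate
        }
        List<String> stale = dependents.remove(old);
        return stale == null ? new ArrayList<>() : stale;
    }

    public static void main(String[] args) {
        PerTargetDependencies site = new PerTargetDependencies("A");
        site.addDependent("A", "nmethod-1");
        site.addDependent("B", "nmethod-2");
        System.out.println(site.setTarget("B"));
    }
}
```

Whether a per-target list is worthwhile is disputed in the comments; the sketch only shows what the proposal would mean, not that it is the right fix.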
ILW = perf regression; microbenchmark; no workaround = MMH = P3
23-09-2022
Though java.lang.invoke.CallSite case is probably the most affected one, the problem is not specific to java.lang.invoke. Reclassifying as hotspot/compiler.
23-09-2022
The regression is caused by changes in nmethod sweeping behavior. If I disable nmethod unloading, jdk8 demonstrates the very same problem as the mainline:

$ jdk1.8.0_331/bin/java -jar benchmarks.jar -f 1 -jvmArgs "-XX:+MethodFlushing" CallSiteSetTargetSelf.testMutable
CallSiteSetTargetSelf.testMutable avgt 5 657.847 ± 34.332 ns/op

$ jdk1.8.0_331/bin/java -jar benchmarks.jar -f 1 -jvmArgs "-XX:-MethodFlushing" CallSiteSetTargetSelf.testMutable
CallSiteSetTargetSelf.testMutable avgt 5 9996.021 ± 251.603 ns/op

The benchmark is sensitive to the number of nmethod dependencies: on every call site target change, the JVM has to traverse the relevant dependency context. If invalidated nmethods aren't promptly unloaded, the context grows and it becomes more and more expensive to change the call site target.
23-09-2022