JDK-8318893 : Bad performance of VarHandles
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 21.0.1
  • Priority: P4
  • Status: Resolved
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2023-10-18
  • Updated: 2023-11-06
  • Resolved: 2023-10-31
Related Reports
Duplicate :  
Description
A DESCRIPTION OF THE PROBLEM :
Here is a simple JMH benchmark:

```
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@Fork(value = 1 /*, jvmArgsAppend = {"-XX:+UnlockDiagnosticVMOptions", "-Xlog:class+load=info", "-XX:+LogCompilation", "-XX:+PrintAssembly", "-XX:+DebugNonSafepoints"}*/)
@Warmup(iterations = 2, time = 6)
@Measurement(iterations = 2, time = 10)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class Bench {
    @State(Scope.Benchmark)
    public static class BenchState {
        public static final VarHandle AGE_DIRECT_VAR_HANDLE;
        public static final IntVarHandle AGE_INDIRECT_VAR_HANDLE;
        static {
            try {
                AGE_DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Person.class, MethodHandles.lookup())
                    .findVarHandle(Person.class, "age", int.class);
                AGE_INDIRECT_VAR_HANDLE = new IntVarHandle(AGE_DIRECT_VAR_HANDLE);
            } catch (Throwable e) {
                throw new RuntimeException(e);
            }
        }
        Person[] data;
        @Setup
        public void setup() {
            var random = ThreadLocalRandom.current();
            data = Stream.generate(Person::new)
                .limit(4096)
                .peek(p -> p.age = random.nextInt())
                .toArray(Person[]::new);
        }
    }
    @Benchmark
    public long benchmark_direct_var_handle(BenchState state) {
        long acc = 0;
        for (int i = 0; i < state.data.length; i++) {
            acc += (int) BenchState.AGE_DIRECT_VAR_HANDLE.get(state.data[i]);
        }
        return acc;
    }
    @Benchmark
    public long benchmark_indirect_var_handle(BenchState state) {
        long acc = 0;
        for (int i = 0; i < state.data.length; i++) {
            acc += BenchState.AGE_INDIRECT_VAR_HANDLE.getInt(state.data[i]);
        }
        return acc;
    }
    public final static class Person {
        public int age;
    }
    public abstract static class TypedVarHandle {
    }
    public static class IntVarHandle extends TypedVarHandle {
        private final VarHandle varHandle;
        public IntVarHandle(VarHandle varHandle) {
            this.varHandle = varHandle;
        }
        public int getInt(Person o) {
            return (int) varHandle.get(o);
        }
    }
}
```

The results:
```
Bench.benchmark_direct_var_handle    thrpt    2  653482.980          ops/s
Bench.benchmark_indirect_var_handle  thrpt    2   36149.181          ops/s
```

From what I saw so far the reason is that the FieldInstanceReadOnly::get is not inlined in the indirect version.

Here are inlining logs for both cases
direct:
```
[6,254s][debug][jit,compilation]  121 %           pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,254s][debug][jit,inlining   ]                             @ 19   pl.ks.Bench::benchmark_direct_var_handle (40 bytes)   force inline by CompileCommand
[6,254s][debug][jit,inlining   ]                               @ 26   java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes)   force inline by annotation
[6,254s][debug][jit,inlining   ]                                 @ 29   java.lang.invoke.VarForm::getMemberName (21 bytes)   force inline by annotation
[6,254s][debug][jit,inlining   ]                                 @ 32   java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get (22 bytes)   force inline by annotation
[6,254s][debug][jit,inlining   ]                                   @ 8   java.lang.Class::cast (27 bytes)   (intrinsic)
[6,254s][debug][jit,inlining   ]                                   @ 11   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
[6,254s][debug][jit,inlining   ]                                   @ 18   jdk.internal.misc.Unsafe::getInt (0 bytes)   (intrinsic)
[6,254s][debug][jit,inlining   ]                             @ 22   org.openjdk.jmh.infra.Blackhole::consume (19 bytes)   force inline by CompileCommand
[6,254s][debug][jit,inlining   ]                               @ 15   org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes)   disallowed by CompileCommand
[6,255s][debug][jit,inlining   ]                             @ 39   java.lang.System::nanoTime (0 bytes)   (intrinsic)
```

indirect:
```
[6,278s][debug][jit,compilation]  122 %           pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,278s][debug][jit,inlining   ]                             @ 19   pl.ks.Bench::benchmark_indirect_var_handle (40 bytes)   force inline by CompileCommand
[6,279s][debug][jit,inlining   ]                               @ 26   pl.ks.Bench$IntVarHandle::getInt (9 bytes)   inline (hot)
[6,279s][debug][jit,inlining   ]                                 @ 5   java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes)   force inline by annotation
[6,279s][debug][jit,inlining   ]                                   @ 29   java.lang.invoke.VarForm::getMemberName (21 bytes)   force inline by annotation
[6,279s][debug][jit,inlining   ]                             @ 22   org.openjdk.jmh.infra.Blackhole::consume (19 bytes)   force inline by CompileCommand
[6,279s][debug][jit,inlining   ]                               @ 15   org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes)   disallowed by CompileCommand
[6,279s][debug][jit,inlining   ]                             @ 39   java.lang.System::nanoTime (0 bytes)   (intrinsic)
```

The perfasm profiler also suggest that there is not a single compilation in indirect case. Hottest methods for direct:
```
....[Hottest Methods (after inlining)]..............................................................
  98,42%         c2, level 4  pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub, version 4, compile id 551 
```

For indirect:
```
....[Hottest Regions]...............................................................................
  44,31%         c2, level 4  pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub, version 4, compile id 532 
  41,97%         c2, level 4  java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get, version 2, compile id 499 
```

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run JMH benchamark

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The indirect version shouldn't be so slow.

FREQUENCY : always



Comments
This is a known constraint on the optimization of non-static finals. Close as a duplicate of JDK-8233873.
31-10-2023

See also this: https://shipilev.net/jvm/anatomy-quarks/17-trust-nonstatic-final-fields/
31-10-2023

This is more or less expected. JIT compilers can optimize var handles away, but that requires the var handle instance to be a constant. In the direct case that is true, but in the indirect case it is not. The underlying issue is that the JIT does not trust the stability of arbitrary final instance fields. So, while the `AGE_INDIRECT_VAR_HANDLE` field is seen as a constant, its `varHandle` field is not. A possible workaround is to use a record to wrap the var handle, since record fields are trusted as constant. See also: https://bugs.openjdk.org/browse/JDK-8233873
26-10-2023

Issue is reproduced. There is a huge difference in throughput for direct and indirect var_handles. OS: Windows 11 JDK 17: Fail Output: Benchmark Mode Cnt Score Error Units Bench.benchmark_direct_var_handle thrpt 2 546599.792 ops/s Bench.benchmark_indirect_var_handle thrpt 2 30740.876 ops/s JDK 21.0.1: Fail Output: Benchmark Mode Cnt Score Error Units Bench.benchmark_direct_var_handle thrpt 2 547345.990 ops/s Bench.benchmark_indirect_var_handle thrpt 2 30413.330 ops/s Moving it to dev team for further analysis.
26-10-2023