JDK-8237077 : C2 fails to optimize certain code shapes with memory access indexed var handles
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2020-01-14
  • Updated: 2021-12-09
  • Resolved: 2021-12-09
Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:

static final int ELEM_SIZE = 1_000_000;
static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();

static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

    public void segment_loop() {
        try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {            
            for (int i = 0; i < ELEM_SIZE; i++) {
                MemoryAddress address = segment.baseAddress();
                if (i % 2 == 0) {
                    VH_int.set(address, (long)i, i + 1);
                } else {
                    VH_int.set(address, (long)i, i - 1);
                }
            }
        }
    }

This gives good performance, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark to this:

    public void segment_loop() {
        try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
            for (int i = 0; i < ELEM_SIZE; i++) {
                if (i % 2 == 0) {
                    VH_int.set(segment.baseAddress(), (long)i, i + 1);
                } else {
                    VH_int.set(segment.baseAddress(), (long)i, i - 1);
                }
            }
        }
    }

The loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which yields much slower performance. I suspect some failure in escape analysis or scalarization.
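
To make "hoisting the checks out of the loop" concrete, here is a minimal sketch that does not use the Panama API at all. The Segment and View classes are hypothetical stand-ins (assumptions made for illustration only): View plays the role of the per-access address object, and its set method carries an explicit bounds check. The hoisted method is a hand-written version of the loop shape one would hope the compiler arrives at; it is not a description of what C2 actually emits.

```java
import java.util.Arrays;

public class HoistSketch {
    // Hypothetical stand-in for a bounds-checked memory segment (not the Panama API).
    static final class Segment {
        final int[] data;
        Segment(int n) { data = new int[n]; }

        // Returns a fresh view object on every call, like calling baseAddress() per access.
        View base() { return new View(this); }
    }

    // Hypothetical stand-in for the address object used to dereference the segment.
    static final class View {
        final Segment owner;
        View(Segment owner) { this.owner = owner; }

        void set(long index, int value) {
            // Explicit per-access check; ideally the compiler proves it redundant in the loop.
            if (index < 0 || index >= owner.data.length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            owner.data[(int) index] = value;
        }
    }

    // Shape from the report: a new view is requested inside each branch.
    static int[] perAccess(int n) {
        Segment s = new Segment(n);
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                s.base().set(i, i + 1);
            } else {
                s.base().set(i, i - 1);
            }
        }
        return s.data;
    }

    // Hand-hoisted shape: one explicit range check before the loop,
    // no view object and no explicit check per iteration.
    static int[] hoisted(int n) {
        Segment s = new Segment(n);
        int[] data = s.data;
        if (n > data.length) {
            throw new IndexOutOfBoundsException("loop bound " + n);
        }
        for (int i = 0; i < n; i++) {
            data[i] = (i % 2 == 0) ? i + 1 : i - 1;
        }
        return data;
    }

    public static void main(String[] args) {
        // Both shapes compute the same contents; only the work per iteration differs.
        System.out.println(Arrays.equals(perAccess(1_000), hoisted(1_000)));
    }
}
```

Both methods produce the same array, so any performance difference between such shapes comes from whether the compiler manages to remove the per-iteration object and check.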
This is no longer reproducible since Java 16 (the dereference API no longer uses MemoryAddress).

Allocation rate is the same in all cases - which seems to suggest EA is not the issue?
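
For anyone wanting to double-check an allocation claim like this outside a benchmark harness, one possible approach is to read the per-thread allocation counter that HotSpot exposes through com.sun.management.ThreadMXBean. The sketch below is an assumption about how such a check could be done, not the method used for the measurement above; the AllocProbe class and the allocatedBytes helper are hypothetical names.

```java
import java.lang.management.ManagementFactory;

public class AllocProbe {
    // Returns the bytes allocated by the current thread while running the task,
    // or -1 if this JVM does not support the measurement.
    static long allocatedBytes(Runnable task) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean mx = (com.sun.management.ThreadMXBean) bean;
        if (!mx.isThreadAllocatedMemorySupported() || !mx.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(id);
        task.run();
        return mx.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        // Example task: allocate one large array so the counter visibly moves.
        long bytes = allocatedBytes(() -> {
            int[] scratch = new int[1 << 20];
            scratch[0] = 1;
        });
        System.out.println("allocated bytes: " + bytes);
    }
}
```

Running each loop shape through such a probe after warm-up gives a rough per-invocation allocation figure; equal figures for both shapes would be consistent with the observation above.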