JDK-8237077 : C2 fails to optimize certain code shapes with memory access indexed var handles
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2020-01-14
  • Updated: 2021-12-09
  • Resolved: 2021-12-09
  • Version: Other, tbd (Resolved)
Description
Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:

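import java.lang.invoke.VarHandle;
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.MemorySegment;
import org.openjdk.jmh.annotations.Benchmark;

import static jdk.incubator.foreign.MemoryLayout.PathElement.sequenceElement;
import static jdk.incubator.foreign.MemoryLayouts.JAVA_INT;

static final int ELEM_SIZE = 1_000_000;
static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

// Indexed var handle over a sequence of ints; coordinates are (MemoryAddress, long index).
static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            // Base address obtained once per iteration, before the branch.
            MemoryAddress address = segment.baseAddress();
            if (i % 2 == 0) {
                VH_int.set(address, (long)i, i + 1);
            } else {
                VH_int.set(address, (long)i, i - 1);
            }
        }
    }
}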

This gives good performance, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark so that segment.baseAddress() is called separately inside each branch:

@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            // Base address obtained inside each branch instead of once before it.
            if (i % 2 == 0) {
                VH_int.set(segment.baseAddress(), (long)i, i + 1);
            } else {
                VH_int.set(segment.baseAddress(), (long)i, i - 1);
            }
        }
    }
}

then the loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which makes it much slower. I suspect a failure in escape analysis or scalar replacement.
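The two benchmarks perform exactly the same stores; they differ only in where the base address is obtained relative to the branch. The plain-Java sketch below reproduces just that code shape without the incubator API, so the two variants can be compared on any JDK. `Segment`, `Address` and `setInt` are hypothetical stand-ins for `MemorySegment`, `MemoryAddress` and the var handle (backed here by a direct `ByteBuffer`); they only mimic a fresh address object per call plus a per-access bounds check, not the real implementation, and there is no guarantee this analogue triggers the same C2 behaviour.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CodeShapes {
    // Hypothetical stand-in for MemoryAddress: a fresh object per baseAddress() call.
    static final class Address {
        final Segment segment;
        final long offset;

        Address(Segment segment, long offset) {
            this.segment = segment;
            this.offset = offset;
        }
    }

    // Hypothetical stand-in for MemorySegment, backed by a direct buffer.
    static final class Segment {
        final ByteBuffer buffer;

        Segment(int byteSize) {
            buffer = ByteBuffer.allocateDirect(byteSize).order(ByteOrder.nativeOrder());
        }

        Address baseAddress() {
            return new Address(this, 0L);
        }
    }

    // Hypothetical stand-in for VH_int.set(address, index, value):
    // a bounds check on every access, then an absolute 4-byte store.
    static void setInt(Address address, long index, int value) {
        long byteOffset = address.offset + index * Integer.BYTES;
        if (byteOffset < 0 || byteOffset > address.segment.buffer.capacity() - Integer.BYTES) {
            throw new IndexOutOfBoundsException("offset " + byteOffset);
        }
        address.segment.buffer.putInt((int) byteOffset, value);
    }

    // Shape 1: one baseAddress() call per iteration, before the branch.
    static void hoisted(Segment segment, int elems) {
        for (int i = 0; i < elems; i++) {
            Address address = segment.baseAddress();
            if (i % 2 == 0) {
                setInt(address, i, i + 1);
            } else {
                setInt(address, i, i - 1);
            }
        }
    }

    // Shape 2: baseAddress() evaluated separately inside each branch.
    static void perBranch(Segment segment, int elems) {
        for (int i = 0; i < elems; i++) {
            if (i % 2 == 0) {
                setInt(segment.baseAddress(), i, i + 1);
            } else {
                setInt(segment.baseAddress(), i, i - 1);
            }
        }
    }

    public static void main(String[] args) {
        int elems = 1_000_000;
        Segment a = new Segment(elems * Integer.BYTES);
        Segment b = new Segment(elems * Integer.BYTES);
        hoisted(a, elems);
        perBranch(b, elems);
        // Both shapes write the same values (1, 0, 3, 2, 5, 4, ...).
        System.out.println(a.buffer.equals(b.buffer)); // prints "true"
    }
}
```

Because the stores are identical, any performance gap between `hoisted` and `perBranch` can only come from how the JIT treats the code shape, which is what the report is about.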
Comments
This is no longer reproducible as of Java 16 (the dereference API no longer uses MemoryAddress).
09-12-2021

The allocation rate is the same in all cases, which seems to suggest that EA is not the issue?
14-01-2020
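JMH's `-prof gc` profiler reports the allocation rate directly; outside JMH, a rough check is to read the per-thread allocation counter exposed by HotSpot's `com.sun.management.ThreadMXBean`. The sketch below assumes a HotSpot JVM where that extension interface is available. The `Address` class and the two loop bodies are hypothetical plain-Java analogues of the two benchmark shapes, not the incubator API, and the counts depend on JIT warm-up, so they are only meaningful after the loops have been compiled.

```java
import java.lang.management.ManagementFactory;

public class AllocProbe {
    // Hypothetical stand-in for a short-lived MemoryAddress object.
    static final class Address {
        final long base;

        Address(long base) {
            this.base = base;
        }
    }

    // Keeps the results observable so the loops are not eliminated as dead code.
    static long sink;

    // Shape 1: one allocation per iteration, before the branch.
    static void hoisted(int elems) {
        long acc = 0;
        for (int i = 0; i < elems; i++) {
            Address address = new Address(i);
            acc += (i % 2 == 0) ? address.base + 1 : address.base - 1;
        }
        sink += acc;
    }

    // Shape 2: a separate allocation inside each branch.
    static void perBranch(int elems) {
        long acc = 0;
        for (int i = 0; i < elems; i++) {
            if (i % 2 == 0) {
                acc += new Address(i).base + 1;
            } else {
                acc += new Address(i).base - 1;
            }
        }
        sink += acc;
    }

    // Bytes allocated so far by the current thread (HotSpot-specific extension;
    // returns -1 if thread allocation measurement is disabled).
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return bean.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    // Approximate bytes allocated while running body (includes small probe overhead).
    static long measure(Runnable body) {
        long before = allocatedBytes();
        body.run();
        return allocatedBytes() - before;
    }

    public static void main(String[] args) {
        int elems = 1_000_000;
        // Warm up so C2 has a chance to compile (and possibly scalar-replace) both loops.
        for (int round = 0; round < 20; round++) {
            hoisted(elems);
            perBranch(elems);
        }
        System.out.println("hoisted:    " + measure(() -> hoisted(elems)) + " bytes");
        System.out.println("per-branch: " + measure(() -> perBranch(elems)) + " bytes");
    }
}
```

If C2 scalar-replaces the `Address` objects in both shapes, both counts stay close to zero; matching counts would be consistent with the observation above, but this analogue cannot by itself say anything definitive about the memory access var handles.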