Bug ID: JDK-8237077 C2 fails to optimize certain code shapes with memory access indexed var handles

Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: 15

Priority: P3
Status: Closed
Resolution: Not an Issue

Submitted: 2020-01-14
Updated: 2021-12-09
Resolved: 2021-12-09

Other: tbd (Resolved)

Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:

static final int ELEM_SIZE = 1_000_000;
static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            MemoryAddress address = segment.baseAddress();
            if (i % 2 == 0) {
                VH_int.set(address, (long)i, i + 1);
            } else {
                VH_int.set(address, (long)i, i - 1);
            }
        }
    }
}

This gives good performance, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark so that segment.baseAddress() is called separately inside each branch:

@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            if (i % 2 == 0) {
                VH_int.set(segment.baseAddress(), (long)i, i + 1);
            } else {
                VH_int.set(segment.baseAddress(), (long)i, i - 1);
            }
        }
    }
}
                
The loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which yields much worse performance. I suspect some failure in escape analysis or scalarization.
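
For readers without a build of the Panama branch, the two code shapes can be approximated with standard JDK APIs. The following is only a rough sketch under stated assumptions: it swaps the memory access var handle for a ByteBuffer view VarHandle (MethodHandles.byteBufferViewVarHandle, indexed by byte offset rather than element index) and uses ByteBuffer.duplicate() as a stand-in for segment.baseAddress(), since both return a fresh object on each call. The names CodeShapes, VH_INT, hoisted and perBranch are hypothetical. It shows the difference in shape only; it is not known whether it triggers the same C2 behavior.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CodeShapes {
    static final int ELEM_SIZE = 1_000_000;
    static final int CARRIER_SIZE = Integer.BYTES;

    // Stand-in for VH_int: coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle VH_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Shape 1: the temporary object is created once per iteration, before the branch.
    static void hoisted(ByteBuffer buffer) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            ByteBuffer view = buffer.duplicate(); // stand-in for segment.baseAddress()
            if (i % 2 == 0) {
                VH_INT.set(view, i * CARRIER_SIZE, i + 1);
            } else {
                VH_INT.set(view, i * CARRIER_SIZE, i - 1);
            }
        }
    }

    // Shape 2: the temporary object is created separately inside each branch.
    static void perBranch(ByteBuffer buffer) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            if (i % 2 == 0) {
                VH_INT.set(buffer.duplicate(), i * CARRIER_SIZE, i + 1);
            } else {
                VH_INT.set(buffer.duplicate(), i * CARRIER_SIZE, i - 1);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(ELEM_SIZE * CARRIER_SIZE);
        ByteBuffer b = ByteBuffer.allocateDirect(ELEM_SIZE * CARRIER_SIZE);
        hoisted(a);
        perBranch(b);
        // Both shapes write the same values; only the generated code may differ.
        System.out.println("same contents: " + a.equals(b));
    }
}
```

Both methods are semantically identical, so any difference observed under a harness such as JMH would come from how the JIT compiles each shape.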

This is no longer reproducible since Java 16 (the dereference API no longer uses MemoryAddress).

09-12-2021

The allocation rate is the same in all cases, which seems to suggest that escape analysis (EA) is not the issue?

14-01-2020