JDK-8257837 : Performance regression in heap byte buffer views
  • Type: Bug
  • Component: core-libs
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-12-07
  • Updated: 2020-12-15
  • Resolved: 2020-12-10
  • Fixed in: JDK 16 (16 b00)
Description
This benchmark:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;
import sun.misc.Unsafe;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.concurrent.TimeUnit;

import static jdk.incubator.foreign.MemoryLayouts.JAVA_INT;

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@State(org.openjdk.jmh.annotations.Scope.Thread)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 3, jvmArgsAppend = { "--add-modules=jdk.incubator.foreign" })
public class LoopOverPolluted {

    static final int ELEM_SIZE = 1_000_000;
    static final int CARRIER_SIZE = (int) JAVA_INT.byteSize();
    static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

    // Utils is a helper class that is not included in this report.
    static final Unsafe unsafe = Utils.unsafe;

    ByteBuffer bb = ByteBuffer.allocateDirect(ALLOC_SIZE).order(ByteOrder.nativeOrder());
    byte[] arr = new byte[ALLOC_SIZE];
    FloatBuffer fb = ByteBuffer.wrap(arr).order(ByteOrder.nativeOrder()).asFloatBuffer();

    @Setup
    public void setup() {
        for (int i = 0; i < ELEM_SIZE; i++) {
            bb.putFloat(i * 4, i);
        }
        for (int i = 0; i < ELEM_SIZE; i++) {
            fb.put(i, i);
        }
    }

    @TearDown
    public void tearDown() {
        unsafe.invokeCleaner(bb);
        arr = null;
        fb = null;
    }

    @Benchmark
    public int byte_buffer_get_float() {
        int sum = 0;
        for (int k = 0; k < ELEM_SIZE; k++) {
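            // Note: the write uses byte index k while the read uses k * 4,
            // so for k > 0 they touch different byte offsets.
            bb.putFloat(k, (float)k + 1);
            float v = bb.getFloat(k * 4);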
            sum += (int)v;
        }
        return sum;
    }

    @Benchmark
    public int float_buffer_get() {
        int sum = 0;
        for (int k = 0; k < ELEM_SIZE; k ++) {
            fb.put(k, k + 1);
            float v = fb.get(k);
            sum += (int)v;
        }
        return sum;
    }

    @Benchmark
    public int unsafe_get_float() {
        int sum = 0;
        for (int k = 0; k < ALLOC_SIZE; k += 4) {
            unsafe.putFloat(arr, k + Unsafe.ARRAY_BYTE_BASE_OFFSET, k + 1);
            float v = unsafe.getFloat(arr, k + Unsafe.ARRAY_BYTE_BASE_OFFSET);
            sum += (int)v;
        }
        return sum;
    }
}
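
The benchmark reads its Unsafe instance from Utils.unsafe, but the Utils class is not part of this report. A minimal sketch of what such a helper could look like is given here, assuming the usual reflective lookup of the private theUnsafe field in sun.misc.Unsafe. The class layout and the lookupUnsafe name are hypothetical and are not taken from the JDK benchmark sources.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical stand-in for the Utils class the benchmark refers to.
final class Utils {

    // Single shared instance, looked up once when the class is initialized.
    static final Unsafe unsafe = lookupUnsafe();

    private Utils() { }

    private static Unsafe lookupUnsafe() {
        try {
            // sun.misc.Unsafe keeps its singleton in the private static field "theUnsafe".
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            // Fail class initialization if the field is missing or inaccessible.
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

With a helper like this in the same package, the benchmark's `static final Unsafe unsafe = Utils.unsafe;` line compiles as written. The compiler warns that sun.misc.Unsafe is an internal API.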



The benchmark above reveals a performance regression between Java 15 and Java 16. Here are the results on Java 15:

Benchmark                               Mode  Cnt  Score   Error  Units
LoopOverPolluted.byte_buffer_get_float  avgt   30  0.802 ± 0.011  ms/op
LoopOverPolluted.float_buffer_get       avgt   30  0.789 ± 0.009  ms/op
LoopOverPolluted.unsafe_get_float       avgt   30  0.494 ± 0.006  ms/op


On Java 16 we get this:

Benchmark                               Mode  Cnt  Score   Error  Units
LoopOverPolluted.byte_buffer_get_float  avgt   30  0.590 ± 0.012  ms/op
LoopOverPolluted.float_buffer_get       avgt   30  2.432 ± 0.060  ms/op
LoopOverPolluted.unsafe_get_float       avgt   30  0.504 ± 0.008  ms/op


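The regression is confined to float_buffer_get, which slows from 0.789 ms/op to 2.432 ms/op (roughly 3x), while byte_buffer_get_float gets faster and unsafe_get_float is essentially unchanged. This is likely caused by profile pollution in ScopedMemoryAccess, which the ByteBuffer API now uses to access memory (at least in the heap views).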
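
To illustrate what profile pollution means in general, the following sketch funnels three different FloatBuffer implementations through one shared method. It is only an analogy for the suspected cause: it does not show the actual ScopedMemoryAccess code, it is not expected to reproduce the numbers above, and the class and method names (ProfilePollutionSketch, putAndSum) are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ProfilePollutionSketch {

    // One shared loop body for every caller. The type profile the JIT gathers
    // for `buf` in this method covers all buffer classes ever passed in.
    static int putAndSum(FloatBuffer buf, int n) {
        int sum = 0;
        for (int k = 0; k < n; k++) {
            buf.put(k, k + 1);
            sum += (int) buf.get(k);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000;
        // Float view over a heap byte array, as in the benchmark's fb field.
        FloatBuffer heapView = ByteBuffer.wrap(new byte[n * Float.BYTES])
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        // Float view over off-heap memory.
        FloatBuffer directView = ByteBuffer.allocateDirect(n * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        // Buffer backed directly by a float[].
        FloatBuffer plainHeap = FloatBuffer.allocate(n);

        // Passing three different implementations through the same method
        // mixes their types in that method's profile.
        int a = putAndSum(heapView, n);
        int b = putAndSum(directView, n);
        int c = putAndSum(plainHeap, n);
        System.out.println(a + " " + b + " " + c);
    }
}
```

When a single piece of code serves several receiver or argument types, the JIT has less specific type information to work with than it would if each caller had its own copy, which can make the compiled code slower for all of them.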

Comments
Changeset: 37043b05
Author: Maurizio Cimadamore <mcimadamore@openjdk.org>
Date: 2020-12-10 15:32:36 +0000
URL: https://git.openjdk.java.net/jdk/commit/37043b05
10-12-2020

Assigning to myself as a placeholder for now.
07-12-2020