JDK-8016539 : Optimization / long/int/short/double/float serialization
  • Type: Enhancement
  • Component: performance
  • Affected Version: 7
  • Priority: P3
  • Status: Resolved
  • Resolution: Duplicate
  • OS: os_x
  • Submitted: 2013-04-18
  • Updated: 2013-10-08
  • Resolved: 2013-10-08
Related Reports
Duplicate :  
Description
A DESCRIPTION OF THE REQUEST :
I recently benchmarked several different implementations for serializing and deserializing long, int, short, double and float primitives to and from byte[] arrays.

The first one (plain, portable Java code) took the primitive, performed some bit shifting and transferred the value as individual bytes to/from the byte array.
The second one used sun.misc.Unsafe functionality.
The third one used (direct) ByteBuffers.
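For reference, here is a minimal sketch of the first ("portable Java") approach. It assumes big-endian byte order, because the report does not show which order its serialize/deserialize helpers use. The class ShiftCodec and its method names are hypothetical and are not the submitter's code. short and float would follow the same pattern, using 16-bit shifts and Float.floatToRawIntBits respectively.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not the submitter's code): big-endian encoding of
// primitives into a byte[] using only shifts, masks and indexed array access.
public final class ShiftCodec {

    private ShiftCodec() {}

    /** Writes v into arr[off..off+7], most significant byte first. */
    static void putLong(long v, byte[] arr, int off) {
        for (int i = 7; i >= 0; i--) {
            arr[off + i] = (byte) v; // keep the low 8 bits
            v >>>= 8;                // move the next byte into place
        }
    }

    /** Reads arr[off..off+7] as a big-endian long. */
    static long getLong(byte[] arr, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (arr[off + i] & 0xFFL); // mask avoids sign extension
        }
        return v;
    }

    static void putInt(int v, byte[] arr, int off) {
        arr[off]     = (byte) (v >>> 24);
        arr[off + 1] = (byte) (v >>> 16);
        arr[off + 2] = (byte) (v >>> 8);
        arr[off + 3] = (byte) v;
    }

    static int getInt(byte[] arr, int off) {
        return ((arr[off] & 0xFF) << 24)
             | ((arr[off + 1] & 0xFF) << 16)
             | ((arr[off + 2] & 0xFF) << 8)
             |  (arr[off + 3] & 0xFF);
    }

    /** A double is serialized through its raw IEEE 754 bit pattern. */
    static void putDouble(double v, byte[] arr, int off) {
        putLong(Double.doubleToRawLongBits(v), arr, off);
    }

    static double getDouble(byte[] arr, int off) {
        return Double.longBitsToDouble(getLong(arr, off));
    }

    public static void main(String[] args) {
        byte[] arr = new byte[24];
        putLong(0x1234567812345678L, arr, 3);
        // ByteBuffer is big-endian by default, so it decodes the same value.
        System.out.println(getLong(arr, 3) == ByteBuffer.wrap(arr).getLong(3)); // true
        putDouble(11.21378923d, arr, 8);
        System.out.println(getDouble(arr, 8) == 11.21378923d); // true
    }
}
```

Each byte is written and read separately here, which is the per-byte access pattern the later comments contrast with a single full-width Unsafe read.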

The test begins with a warmup phase that lets the JVM compile the bytecode to native code (JIT), using the JVM option -XX:CompileThreshold=10.
Next, each implementation is called 10,000,000 times for each primitive type, for both get and set operations.

The performance results are really interesting (although we are talking about fractions of seconds):

Serializing using portable Java code is faster than sun.misc.Unsafe#putXYZ (when JIT compiled).
Deserializing using Unsafe is faster than portable Java code (even when JIT compiled).
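Deserializing with Unsafe can be faster because it reads all eight bytes of a long with a single memory access, whereas the portable version assembles them one byte at a time. The report's Unsafes helper is not shown, so the sketch below swaps in a different API that also performs one multi-byte access: the byte-array view VarHandle. Note that this API exists only in Java 9 and later, not in the Java 6/7/8 releases discussed here, and that ViewCodec is a hypothetical name. Unlike a raw Unsafe access, it keeps bounds checks and lets you choose the byte order.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Java 9+ sketch (hypothetical class): a byte[] view VarHandle reads or writes
// a whole long in one access, similar in spirit to Unsafe.getLong/putLong but
// bounds-checked.
public final class ViewCodec {

    // Big-endian to match ByteBuffer's default. A raw Unsafe access instead
    // uses the platform's native order (little-endian on x86).
    static final VarHandle LONG_BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    private ViewCodec() {}

    static void putLong(long v, byte[] arr, int off) {
        // off is a byte index; misaligned offsets are allowed for plain get/set.
        LONG_BE.set(arr, off, v);
    }

    static long getLong(byte[] arr, int off) {
        return (long) LONG_BE.get(arr, off);
    }

    public static void main(String[] args) {
        byte[] arr = new byte[24];
        putLong(0x1234567812345678L, arr, 3);
        System.out.println(Long.toHexString(getLong(arr, 3))); // 1234567812345678
        System.out.println(arr[3]); // 18 (0x12, most significant byte first)
    }
}
```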

Serializing and deserializing using ByteBuffer.get/putXYZ is 3 to 15 times(!) slower than plain Java or Unsafe.
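Part of the ByteBuffer cost in the benchmark source comes from the limit()/position() calls inside each timed loop. Below is a sketch of the absolute accessors, assuming a heap buffer from ByteBuffer.wrap as in the source. These accessors take an explicit index and leave position and limit untouched. The sketch also shows that the buffer's byte order is configurable. BufferCodec is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical wrapper: absolute ByteBuffer accessors take an index and do not
// modify position or limit, so no per-access bookkeeping calls are needed.
public final class BufferCodec {

    private BufferCodec() {}

    static void putLong(ByteBuffer bb, int off, long v) {
        bb.putLong(off, v); // IndexOutOfBoundsException if off + 8 > limit
    }

    static long getLong(ByteBuffer bb, int off) {
        return bb.getLong(off);
    }

    public static void main(String[] args) {
        byte[] arr = new byte[24];
        ByteBuffer be = ByteBuffer.wrap(arr);                  // BIG_ENDIAN by default
        putLong(be, 3, 0x1234567812345678L);
        System.out.println(be.position() + " " + be.limit());  // 0 24 (untouched)
        System.out.println(arr[3]);                             // 18 (0x12)

        // A second view of the same array that reads in little-endian order.
        ByteBuffer le = ByteBuffer.wrap(arr).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(Long.toHexString(getLong(le, 3)));  // 7856341278563412
    }
}
```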

The benchmark results were taken using Java 8, but they are (nearly) the same with Java 6 and Java 7.

JUSTIFICATION :
Please investigate this issue.
I do not understand why an atomic Unsafe.putXYZ is slower than bit shifting and indexed byte[] access.
It may point to an opportunity for JVM performance optimization.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I would expect Unsafe.putXYZ() to be faster than "portable Java code".
And I would expect ByteBuffer.get/putXYZ not to be that slow.

---------- BEGIN SOURCE ----------
import org.junit.FixMethodOrder;
import org.junit.Test;
import org.junit.runners.MethodSorters;
import sun.misc.Unsafe;

import java.nio.ByteBuffer;

/**
 * @author Robert Stupp (last modified by $Author$)
 * @version $Revision$ $Date$
 */
@FixMethodOrder(MethodSorters.NAME_ASCENDING)
@SuppressWarnings("UseOfSystemOutOrSystemErr")
public class UnsafesPerformanceDeveloperTest {

    public static final int LOOPS = 10000000;

    static String nanosToTime(long nanos) {
        StringBuilder sb = new StringBuilder(10);

        long ns = nanos % THOUSAND;
        nanos /= THOUSAND;
        long mi = nanos % THOUSAND;
        nanos /= THOUSAND;
        long ms = nanos % THOUSAND;
        nanos /= THOUSAND;
        long seconds = nanos % SECONDS_PER_MINUTE;
        nanos /= SECONDS_PER_MINUTE;
        long minutes = nanos % MINUTES_PER_HOUR;
        nanos /= MINUTES_PER_HOUR;
        long hours = nanos % HOURS_PER_DAY;
        nanos /= HOURS_PER_DAY;
        long days = nanos;

        boolean print = days > 0;
        if (print)
            sb.append(days).append('d');

        boolean hadPrint = print;
        print |= hours > 0;
        if (print) {
            if (hadPrint && hours < 10)
                sb.append('0');
            sb.append(hours).append('h');
        }

        hadPrint = print;
        print |= minutes > 0;
        if (print) {
            if (hadPrint && minutes < 10)
                sb.append('0');
            sb.append(minutes).append('m');
        }

        hadPrint = print;
        print |= seconds > 0;
        if (print) {
            if (hadPrint && seconds < 10)
                sb.append('0');
            sb.append(seconds).append('s');
        }

        hadPrint = print;
        print |= ms > 0 || (mi == 0 && ns == 0);
        if (print) {
            if (hadPrint && ms < 100)
                sb.append('0');
            if (hadPrint && ms < 10)
                sb.append('0');
            sb.append(ms).append("ms");
        }

        if (mi > 0 || ns > 0) {
            hadPrint = print;
            print |= mi > 0;
            if (print) {
                if (hadPrint && mi < 100)
                    sb.append('0');
                if (hadPrint && mi < 10)
                    sb.append('0');
                sb.append(mi).append("us");
            }
        }

        if (ns > 0) {
            hadPrint = print;
            //        print |= ns > 0;
            //        if (print) {
            if (hadPrint && ns < 100)
                sb.append('0');
            if (hadPrint && ns < 10)
                sb.append('0');
            sb.append(ns).append("ns");
            //        }
        }

        return sb.toString();
    }

    @Test
    public void aWarmup() throws InterruptedException {
        byte[] arr = new byte[24];
        ByteBuffer bb = ByteBuffer.wrap(arr);

        System.out.println("Warmup...");
        for (int warmup = 0; warmup < 20; warmup++) {
            for (int n = 0; n < 10000000; n++) {
                serialize(0x1234567812345678L, arr, 0);
                Unsafes.fastSet(0x1234567812345678L, arr, 0);
                //                Unsafes.Really.fastSet(0x1234567812345678L, arr, 0);
                bb.clear();
                bb.putLong(0x1234567812345678L);
                deserializeLong(arr, 0);
                Unsafes.fastGetLong(arr, 0);
                //                Unsafes.Really.fastGetLong(arr, 0);
                bb.flip();
                bb.getLong();

                serialize(0x12345678, arr, 0);
                Unsafes.fastSet(0x12345678, arr, 0);
                //                Unsafes.Really.fastSet(0x12345678, arr, 0);
                bb.clear();
                bb.putInt(0x12345678);
                deserializeInt(arr, 0);
                Unsafes.fastGetInt(arr, 0);
                //                Unsafes.Really.fastGetInt(arr, 0);
                bb.flip();
                bb.getInt();

                serialize((short) 0x1234, arr, 0);
                Unsafes.fastSet((short) 0x1234, arr, 0);
                //                Unsafes.Really.fastSet((short)0x1234, arr, 0);
                bb.clear();
                bb.putShort((short) 0x1234);
                deserializeShort(arr, 0);
                Unsafes.fastGetShort(arr, 0);
                //                Unsafes.Really.fastGetShort(arr, 0);
                bb.flip();
                bb.getShort();

                serialize(11.21378923f, arr, 0);
                Unsafes.fastSet(11.21378923f, arr, 0);
                //                Unsafes.Really.fastSet(11.21378923f, arr, 0);
                bb.clear();
                bb.putFloat(11.21378923f);
                deserializeFloat(arr, 0);
                Unsafes.fastGetFloat(arr, 0);
                //                Unsafes.Really.fastGetFloat(arr, 0);
                bb.flip();
                bb.getFloat();

                serialize(11.21378923d, arr, 0);
                Unsafes.fastSet(11.21378923d, arr, 0);
                //                Unsafes.Really.fastSet(11.21378923d, arr, 0);
                bb.clear();
                bb.putDouble(11.21378923d);
                deserializeDouble(arr, 0);
                Unsafes.fastGetDouble(arr, 0);
                //                Unsafes.Really.fastGetDouble(arr, 0);
                bb.flip();
                bb.getDouble();
            }
            Thread.sleep(100);
        }

    }

    @Test
    public void performanceLong() throws Exception {
        byte[] arr = new byte[24];
        ByteBuffer bb = ByteBuffer.wrap(arr);

        for (int off = 0; off < 16; off++) {

            long t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                serialize(0x1234567812345678L, arr, off);
            long tSetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastSet(0x1234567812345678L, arr, off);
            long tSetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastSet(0x1234567812345678L, arr, off);
            //            long tSetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.limit(off + 8);
                bb.putLong(off, 0x1234567812345678L);
            }
            long tSetBb = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                deserializeLong(arr, off);
            long tGetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastGetLong(arr, off);
            long tGetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastGetLong(arr, off);
            //            long tGetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.position(off);
                bb.getLong();
            }
            long tGetBb = System.nanoTime() - t0;

            System.out.println("long, offset=" + off + ":");
            System.out.println("    set/Java:           " + nanosToTime(tSetJava));
            System.out.println("    set/Unsafes:        " + nanosToTime(tSetUnsafes));
            //            System.out.println("    set/Unsafes.Really: " + nanosToTime(tSetUnsafesReally));
            System.out.println("    set/ByteBuffer:     " + nanosToTime(tSetBb));
            System.out.println("    get/Java:           " + nanosToTime(tGetJava));
            System.out.println("    get/Unsafes:        " + nanosToTime(tGetUnsafes));
            //            System.out.println("    get/Unsafes.Really: " + nanosToTime(tGetUnsafesReally));
            System.out.println("    get/ByteBuffer:     " + nanosToTime(tGetBb));
        }

    }

    @Test
    public void performanceInt() throws Exception {
        byte[] arr = new byte[24];
        ByteBuffer bb = ByteBuffer.wrap(arr);

        for (int off = 0; off < 16; off++) {

            long t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                serialize(0x12345678, arr, off);
            long tSetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastSet(0x12345678, arr, off);
            long tSetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastSet(0x12345678, arr, off);
            //            long tSetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.limit(off + 8);
                bb.putInt(off, 0x12345678);
            }
            long tSetBb = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                deserializeInt(arr, off);
            long tGetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastGetInt(arr, off);
            long tGetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastGetInt(arr, off);
            //            long tGetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.position(off);
                bb.getInt();
            }
            long tGetBb = System.nanoTime() - t0;

            System.out.println("int, offset=" + off + ":");
            System.out.println("    set/Java:           " + nanosToTime(tSetJava));
            System.out.println("    set/Unsafes:        " + nanosToTime(tSetUnsafes));
            //            System.out.println("    set/Unsafes.Really: " + nanosToTime(tSetUnsafesReally));
            System.out.println("    set/ByteBuffer:     " + nanosToTime(tSetBb));
            System.out.println("    get/Java:           " + nanosToTime(tGetJava));
            System.out.println("    get/Unsafes:        " + nanosToTime(tGetUnsafes));
            //            System.out.println("    get/Unsafes.Really: " + nanosToTime(tGetUnsafesReally));
            System.out.println("    get/ByteBuffer:     " + nanosToTime(tGetBb));
        }

    }

    @Test
    public void performanceShort() throws Exception {
        byte[] arr = new byte[24];
        ByteBuffer bb = ByteBuffer.wrap(arr);

        for (int off = 0; off < 16; off++) {

            long t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                serialize((short) 0x1234, arr, off);
            long tSetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastSet((short) 0x1234, arr, off);
            long tSetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastSet((short)0x1234, arr, off);
            //            long tSetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.limit(off + 8);
                bb.putShort(off, (short)0x1234);
            }
            long tSetBb = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                deserializeShort(arr, off);
            long tGetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastGetShort(arr, off);
            long tGetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastGetShort(arr, off);
            //            long tGetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++) {
                bb.position(off);
                bb.getShort();
            }
            long tGetBb = System.nanoTime() - t0;

            System.out.println("short, offset=" + off + ":");
            System.out.println("    set/Java:           " + nanosToTime(tSetJava));
            System.out.println("    set/Unsafes:        " + nanosToTime(tSetUnsafes));
            //            System.out.println("    set/Unsafes.Really: " + nanosToTime(tSetUnsafesReally));
            System.out.println("    set/ByteBuffer:     " + nanosToTime(tSetBb));
            System.out.println("    get/Java:           " + nanosToTime(tGetJava));
            System.out.println("    get/Unsafes:        " + nanosToTime(tGetUnsafes));
            //            System.out.println("    get/Unsafes.Really: " + nanosToTime(tGetUnsafesReally));
            System.out.println("    get/ByteBuffer:     " + nanosToTime(tGetBb));
        }

    }

    @Test
    public void performanceDouble() throws Exception {
        byte[] arr = new byte[24];
        ByteBuffer bb = ByteBuffer.wrap(arr);

        for (int off = 0; off < 16; off++) {

            long t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                serialize(43.584d, arr, off);
            long tSetJava = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n < LOOPS; n++)
                Unsafes.fastSet(43.584d, arr, off);
            long tSetUnsafes = System.nanoTime() - t0;

            //            t0 = System.nanoTime();
            //            for (int n = 0; n < LOOPS; n++)
            //                Unsafes.Really.fastSet(43.584d, arr, off);
            //            long tSetUnsafesReally = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int n = 0; n 


(This report has more than 16,000 characters and has been truncated.)
Comments
I have moved the only validated claim to JDK-8026049. I am closing this issue as a duplicate.
08-10-2013

I have also contacted the original submitter and described the pitfalls in the submitted benchmark. The submitter is invited to follow up on the issue on core-libs-dev@openjdk. Leaving this open for now, until I hear back from the original submitter.
19-06-2013

OK, there are several claims in the original report. Most of them seem to be misinterpretations of a faulty benchmark. A non-exhaustive list of the benchmark's problems:
1) Warmup in a distinct method does not help: the JVM will inline everything in aWarmup(), and the warmup effects will not carry over to the measurement methods.
2) The getters do not store their results anywhere, which sets them up for dead-code elimination.
3) Multiple code paths are measured in a single JVM and in a single method; the profile mix-up and the depleted inlining budget for the later cases can significantly skew the measurement.
After validation with JMH for the "long" case:
A) "Serializing using portable Java code is faster than sun.misc.Unsafe#getXYZ (when JIT compiled)." <--- NOT VALIDATED, the hand-written serialization/deserialization code is actually slower.
B) "Deserializing using Unsafe is faster than portable Java code (even when JIT compiled)." <--- VALIDATED, and this is understandable behavior, since Unsafe can do a single full read instead of per-byte reads. It is an open question whether this is a missing optimization in the JIT.
C) "Serializing using ByteBuffer.get/putXYZ is 3 to 15 times(!) slower than Java/Unsafe alone." <--- NOT VALIDATED, the ByteBuffer code is only about twice as slow as the hand-written serialization/deserialization code. Again, this is faulty benchmarking: the original benchmark includes the auxiliary ByteBuffer operations (limit/position) in the measured time.
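JMH, as used in the validation, is the proper tool for this kind of measurement. As a rough illustration only, the following hand-rolled sketch shows how pitfalls 1) and 2) can be avoided. Every result is folded into a sink that is published through a volatile field, so the work cannot be discarded as dead code. The warmup also runs through the same method that is later timed. Pitfall 3) is not solved by this code: timing several different operations through the same measure() call site would mix their profiles there, so each operation should run in its own JVM, which JMH achieves by forking. SinkHarness, measure and warmThenMeasure are hypothetical names.

```java
import java.util.function.LongSupplier;

// Hypothetical minimal harness (not JMH): consumes every result and warms up
// through the same code path that is measured afterwards.
public final class SinkHarness {

    // Written once per run so the JIT cannot treat the loop's work as unused.
    static volatile long sink;

    /** Runs op `loops` times and returns the average nanoseconds per call. */
    static double measure(LongSupplier op, int loops) {
        long acc = 0;
        long t0 = System.nanoTime();
        for (int n = 0; n < loops; n++) {
            acc += op.getAsLong(); // every result feeds into acc
        }
        long elapsed = System.nanoTime() - t0;
        sink = acc;
        return (double) elapsed / loops;
    }

    /** Warmup rounds call measure() itself, so the warmed-up code is the timed code. */
    static double warmThenMeasure(LongSupplier op, int warmupRounds, int loops) {
        for (int i = 0; i < warmupRounds; i++) {
            measure(op, loops);
        }
        return measure(op, loops);
    }

    public static void main(String[] args) {
        byte[] arr = new byte[24];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = (byte) i;
        }

        // Operation under test (illustrative): shift-based big-endian read at offset 3.
        LongSupplier shiftGet = () -> {
            long v = 0;
            for (int k = 0; k < 8; k++) {
                v = (v << 8) | (arr[3 + k] & 0xFFL);
            }
            return v;
        };

        double ns = warmThenMeasure(shiftGet, 5, 1_000_000);
        System.out.printf("shift get: %.2f ns/op (sink=%d)%n", ns, sink);
    }
}
```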
The JMH benchmark is here: http://cr.openjdk.java.net/~shade/8016539/8016539-bench.zip

On 8b94, Linux x86_64, 2x2 i5, 2.0 GHz:

Benchmark                                         Mode  Thr  Cnt  Sec    Mean  Mean error  Units
o.s.UnsafesPerformanceBench.long_get_bb           avgt    1    5    1  11.535       0.064  nsec/op
o.s.UnsafesPerformanceBench.long_get_deserialize  avgt    1    5    1   6.032       0.083  nsec/op
o.s.UnsafesPerformanceBench.long_get_fastGet      avgt    1    5    1   1.913       0.066  nsec/op
o.s.UnsafesPerformanceBench.long_get_uberfastGet  avgt    1    5    1   1.514       0.007  nsec/op
o.s.UnsafesPerformanceBench.long_put_bb           avgt    1    5    1   8.382       0.098  nsec/op
o.s.UnsafesPerformanceBench.long_put_fastSet      avgt    1    5    1   5.567       0.018  nsec/op
o.s.UnsafesPerformanceBench.long_put_serialize    avgt    1    5    1   5.575       0.006  nsec/op
o.s.UnsafesPerformanceBench.long_put_uberfastSet  avgt    1    5    1   5.640       0.003  nsec/op

Note that the differences are not drastic.
19-06-2013

Longer term, and not for JDK 8. This appears to be a request from an external source. Aleksey, we need to validate that this is viable and appropriate from a performance perspective. If it is, you can reassign it to core-libs/java.io:serialization.
18-06-2013