JDK-7147227 : Performance Regression in ByteBuffers
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux
  • CPU: x86
  • Submitted: 2012-02-20
  • Updated: 2019-09-13
  • Resolved: 2014-04-30
Description
FULL PRODUCT VERSION :
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
2.6.32-37-generic #81-Ubuntu SMP Fri Dec 2 20:32:42 UTC 2011 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
I noticed a 20% reduction in performance of some benchmarks.  One shared property of the regressed benchmarks was that they both used ByteBuffers.

I then created some more targeted tests of ByteBuffers, simply putting and getting into them in a tight loop.  The test was essentially:

      byte[] data = new byte[512 * 1024];
      final ByteBuffer buffer = ByteBuffer.wrap(data);

      while (buffer.position() < buffer.limit())
      {
         buffer.get();
      }

And for puts:

      final ByteBuffer buffer = ByteBuffer.wrap(data);

      while (buffer.position() < buffer.limit())
      {
         buffer.put((byte) 0);
      }

I then measured the average execution time for these to complete in both Java 7 and a version of Java 6:

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

The difference in performance was striking:

Java 7 get() rate: 14.45 GiB / second
Java 6 get() rate: 21.78 GiB / second

Java 7 put() rate: 5.02 GiB / second
Java 6 put() rate: 9.69 GiB / second
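The report does not say how these rates were derived from the measured times. A minimal sketch of one plausible conversion follows, assuming the workload of the attached reproducer (10,000 passes over a 512 KiB array). The class name, the method name, and the elapsed time used in main are hypothetical and are not measurements from this report.

```java
final class ThroughputSketch {
    // Number of bytes in one GiB (2^30).
    static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    // Converts a byte count and an elapsed time in nanoseconds to GiB per second.
    static double gibPerSecond(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1e9;
        return (bytes / BYTES_PER_GIB) / seconds;
    }

    public static void main(String[] args) {
        // Assumption: 10,000 passes over a 512 KiB array, as in the reproducer.
        long bytesProcessed = 10_000L * 512 * 1024;
        // Made-up elapsed time, for illustration only.
        long hypotheticalNanos = 500_000_000L;
        System.out.println("Rate: " + gibPerSecond(bytesProcessed, hypotheticalNanos) + " GiB/s");

        // Ratios of the rates quoted in this report (Java 7 relative to Java 6).
        System.out.println("get() ratio: " + (14.45 / 21.78));
        System.out.println("put() ratio: " + (5.02 / 9.69));
    }
}
```

The two ratios come out at roughly 0.66 for get() and 0.52 for put(), which agrees with the "2/3rds" and "1/2" figures given under ACTUAL.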

REGRESSION.  Last worked in version 6u29

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a test as described in the description, execute it using Java 7, then execute it using Java 6.  Note the difference in runtime.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Performance on par or better in Java 7.
ACTUAL -
Java 7 ran at roughly 2/3 the speed of Java 6 for gets, and roughly 1/2 the speed for puts.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------

package org.cleversafe.benchmarks.processing;

import java.nio.ByteBuffer;

public class BugReproduce
{

   public static void main(final String args[])
   {
      final byte[] data = new byte[512 * 1024];

      long start = System.nanoTime();

      for (int i = 0; i < 10000; i++)
      {
         final ByteBuffer buffer = ByteBuffer.wrap(data);
         while (buffer.position() < buffer.limit())
         {
            buffer.get();
         }
      }

      long end = System.nanoTime();

      System.out.println("Execution took: " + (end - start) + " nanoseconds");

      start = System.nanoTime();

      for (int i = 0; i < 10000; i++)
      {
         final ByteBuffer buffer = ByteBuffer.wrap(data);
         while (buffer.position() < buffer.limit())
         {
            buffer.put((byte) 0);
         }
      }

      end = System.nanoTime();

      System.out.println("Execution took: " + (end - start) + " nanoseconds");
   }
}

---------- END SOURCE ----------

Comments
$ /export/twisti/aot/build/linux-x86_64-normal-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.compiler=graal -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 6194005008 nanoseconds
Execution took: 5443412362 nanoseconds
07-03-2016

$ /java/re/jdk/6u115/latest/binaries/linux-amd64/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 475278178 nanoseconds
Execution took: 2771124903 nanoseconds
$ /java/re/jdk/7u80/latest/binaries/linux-x64/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 18827525370 nanoseconds
Execution took: 16933517562 nanoseconds
$ /java/re/jdk/8u65/latest/binaries/linux-x64/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 19002119805 nanoseconds
Execution took: 16679141508 nanoseconds
$ /java/re/jdk/9/promoted/all/108/binaries/linux-x64/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 19921354012 nanoseconds
Execution took: 20236486218 nanoseconds
07-03-2016

No, it's a Mac :-) I have to redo it on another box.
01-05-2014

Christian, can you run latest jdk6u on the same machine? The regression was between 6u and 7.
30-04-2014

Seems to be the same for 8 and 9:
$ /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 12541521000 nanoseconds
Execution took: 10830044000 nanoseconds
$ /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 12570789000 nanoseconds
Execution took: 10754521000 nanoseconds
$ /Library/Java/JavaVirtualMachines/jdk1.9.0.jdk/Contents/Home/bin/java -XX:-TieredCompilation -XX:CICompilerCount=1 -Xbatch BugReproduce
Execution took: 12614811000 nanoseconds
Execution took: 10779250000 nanoseconds
29-04-2014

I am not sure; I need to test. Here is what I said in the related bug 7147987: It will be difficult to optimize this in the libraries. I can take this bug back to C2. The class check is loop invariant, so C2 should apply its loop-splitting optimization, but for some reason it doesn't (maybe because the other branch of the check is an uncommon trap). I will see what happens with the latest HotSpot sources and will update this bug report.
29-04-2014
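
For readers unfamiliar with the optimization named in the comment above, here is a hand-written sketch of splitting a loop around a loop-invariant class check. It only illustrates the general idea at source level; it is not C2's implementation and not the actual ByteBuffer code path. All names here (LoopUnswitchSketch, Reader, ArrayReader) are hypothetical.

```java
final class LoopUnswitchSketch {
    // Hypothetical abstraction standing in for a buffer-like type.
    interface Reader {
        int next();
    }

    // Hypothetical concrete type that the fast path is specialized for.
    static final class ArrayReader implements Reader {
        private final byte[] data;
        private int pos;

        ArrayReader(byte[] data) {
            this.data = data;
        }

        @Override
        public int next() {
            return data[pos++];
        }
    }

    // Original shape: the class check runs on every iteration,
    // although `r` never changes inside the loop.
    static long sumCheckedEachIteration(Reader r, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (r instanceof ArrayReader) {
                sum += ((ArrayReader) r).next(); // fast path, exact type known
            } else {
                sum += r.next();                 // slow path, interface call
            }
        }
        return sum;
    }

    // Split shape: the invariant check is hoisted out and each branch
    // gets its own copy of the loop with no per-iteration check.
    static long sumUnswitched(Reader r, int n) {
        long sum = 0;
        if (r instanceof ArrayReader) {
            ArrayReader a = (ArrayReader) r;
            for (int i = 0; i < n; i++) {
                sum += a.next();
            }
        } else {
            for (int i = 0; i < n; i++) {
                sum += r.next();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        System.out.println(sumCheckedEachIteration(new ArrayReader(data), data.length));
        System.out.println(sumUnswitched(new ArrayReader(data), data.length));
    }
}
```

Both methods compute the same result; the second form simply pays for the check once instead of once per element.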

Does the regression still exist in 9?
29-04-2014

Regression between 6 and 7. Closing as WNF (Won't Fix); no performance fixes will be done for 7.
29-04-2014

EVALUATION I am keeping this bug for the VM regression in HS21-b12, and I created a new JDK bug, 7147987, to track the JDK regression (after jdk7-b69).
22-02-2012

EVALUATION There aren't any changes between 6u29 and 7 to explain this. This is more likely to be something in the HotSpot server compiler.
21-02-2012