JDK-7078565 : prefer mfence on processors prior to Nehalem
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs22
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: solaris_10
  • CPU: x86
  • Submitted: 2011-08-12
  • Updated: 2013-08-12
  • Resolved: 2013-08-12
Description
I believe this was caused by the switch to using lock addl [esp], 0 instead of mfence for volatile membars (6822204). In my review request for that change, http://cr.openjdk.java.net/~never/6822204, I said that I didn't measure any performance change on Intel at the time. On your microbenchmark I can measure the difference, though, so I'm going to remeasure derby, which previously showed the big difference. We may want to make the lock addl AMD-specific.

tom
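
For context, here is a minimal sketch of the kind of code that exercises the volatile membar discussed above. It is not part of the original report: the class name VolatileStoreSketch and its methods are hypothetical, and it assumes that HotSpot on x86 follows each volatile store with a full barrier (either mfence or lock addl, depending on the build). What the JIT actually emits depends on the JVM version and the CPU.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical microbenchmark: times a loop of volatile stores. */
final class VolatileStoreSketch {
    // Assumption: each store to this field is followed by a StoreLoad barrier on x86.
    static volatile int sink;

    /** Performs n volatile stores and returns the last value written. */
    static int storeLoop(int n) {
        for (int i = 1; i <= n; i++) {
            sink = i;
        }
        return sink;
    }

    public static void main(String[] args) {
        storeLoop(20_000); // warm up so the timed pass runs compiled code
        long start = System.nanoTime();
        int last = storeLoop(1_000_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("1M volatile stores: " + elapsedMs + " ms (last=" + last + ")");
    }
}
```

Running it on the same machine under JVM builds that differ only in the barrier instruction would be one way to isolate the cost of the barrier from the rest of ReentrantLock.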

On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:

Hi Vitaly,

I tried this bench on 6u23, and if I first run that code in a 10k iteration loop and then time the 1M iteration loop, I get about a 10 ms speedup.  The first loop would trigger JIT compilation (10k is the default threshold, I believe) and the second should run without compilation interruption.

Can you try the same? Also might be interesting to time it under the interpreter (-Xint).

I changed the testcase a bit so that it no longer relies on OSR, since lockBench() will surely hit the compilation threshold after a few runs.

I get the following timings for 1m runs:

jdk7-server: 53ms
jdk7-client: 62ms
jdk7-xint  : 955ms

jdk6-xint  : 1000ms
jdk6-client: 68ms
jdk6-server: 52ms

jdk5-server: 40ms
jdk5-client: 61ms
jdk5-xint  : 832ms

So JDK7 is slower than JDK5 in every case; the regression seems to have landed in JDK6 (I was using openjdk6).

Should I file a bug-report about this behaviour?

Thanks, Clemens


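import java.util.concurrent.locks.ReentrantLock;

public class LockPerf {
    static ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) {
        // Runs until killed; each pass times 1000 x 1000 = 1M lock/unlock pairs.
        while (true) {
            long start2 = System.nanoTime();
            for (int i = 0; i < 1000; i++) {
                lockBench();
            }
            // Elapsed time in milliseconds.
            System.out.println("Lock bench: " + (System.nanoTime() - start2) / 1000000);
        }
    }

    // Separate method so it is compiled normally rather than via OSR.
    private static void lockBench() {
        for (int i = 0; i < 1000; i++) {
            lock.lock();
            lock.unlock();
        }
    }
}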


On Aug 11, 2011 11:38 AM, "Clemens Eisserer" <###@###.###> wrote:
Hi Vitaly,

Which OS are you using?

Linux-3.0 (Fedora 15)


Also, you should use System.nanoTime() for this type of timing as it gives
you a more precise timer.

I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5.
I was using the server compiler both times.

Thanks, Clemens
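
The warm-up approach suggested earlier in the thread (run about 10k iterations first, then time the 1M iteration loop) could look roughly like the sketch below. The class name WarmupLockBench and the lockLoop helper are hypothetical and not from the original messages. The 10k warm-up count is taken from the thread's stated belief about the default compile threshold, which may differ between JVM versions and compilers.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical variant of the lock benchmark with an explicit warm-up pass. */
final class WarmupLockBench {
    static final ReentrantLock lock = new ReentrantLock();

    /** Acquires and releases the lock the given number of times; returns the count. */
    static int lockLoop(int iterations) {
        int acquired = 0;
        for (int i = 0; i < iterations; i++) {
            lock.lock();
            try {
                acquired++;
            } finally {
                lock.unlock();
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        lockLoop(10_000); // warm-up pass, intended to trigger JIT compilation
        long start = System.nanoTime();
        int n = lockLoop(1_000_000); // timed pass
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " lock/unlock pairs: " + elapsedMs + " ms");
    }
}
```

Passing -Xint on the java command line runs the same class under the interpreter, as suggested in the thread.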