JDK-5108893 : Math.abs() is slow
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2004-09-29
  • Updated: 2004-11-17
  • Resolved: 2004-11-17
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6
6 b13Fixed
Related Reports
Relates :  
Relates :  
Description
Name: gm110360			Date: 09/29/2004


A DESCRIPTION OF THE REQUEST :
Current JVMs implement Math.abs() using compares and branches even though there are far more efficient code sequences for computing the absolute value of  a floating point value that do not require a branches.  These non-branching code sequences are both much shorter and cheaper than the compare and branch code currently emitted by hotspot (in both the client and server JVMs).

For example, x87 has a fabs instruction which is slightly cheaper than an add and SSE uses a boolean & instructions (andsd for doubles andss for floats) which has the same cost as an add instruction.  However as the timing results below show, currently Math.abs() is much more expensive than an add.

JUSTIFICATION :
Math.abs() is currently much slower and more expensive than it should be.  It costs around six times more than an add when it should cost the same or even be slightly cheaper.  The cost of Math.abs() can be significant in some codes (including some of ours) that make heavy use of it.  Using the more efficient code sequences would signifcantly speed up such programs.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected behaviour is that Math.abs(double) and Math.abs(float) should cost about the same as an add.
ACTUAL -
Actual behaviour is that Math.abs() is much more expensive than an add as show in the timing results for the server and client 1.5.0rc JVMs below.

java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)

avg    2.3 ns   total 4.68E-1 s  for assign             (~ 4.0 cycles)
avg    2.5 ns   total 5.00E-1 s  for add                (~ 4.2 cycles)
avg   17.0 ns   total 3.41E0 s   for Math.abs()         (~ 29.0 cycles)
avg    2.4 ns   total 4.85E-1 s  for assign             (~ 4.1 cycles)
avg    2.5 ns   total 5.00E-1 s  for add                (~ 4.2 cycles)
avg   17.0 ns   total 3.39E0 s   for Math.abs()         (~ 28.8 cycles)

java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Client VM (build 1.5.0-rc-b63, mixed mode, sharing)

avg    5.0 ns   total 1.00E0 s   for assign             (~ 8.5 cycles)
avg    5.3 ns   total 1.06E0 s   for add                (~ 9.0 cycles)
avg   30.8 ns   total 6.16E0 s   for Math.abs()         (~ 52.3 cycles)
avg    5.0 ns   total 1.00E0 s   for assign             (~ 8.5 cycles)
avg    5.4 ns   total 1.08E0 s   for add                (~ 9.2 cycles)
avg   30.6 ns   total 6.11E0 s   for Math.abs()         (~ 51.9 cycles)


---------- BEGIN SOURCE ----------

import java.text.DecimalFormat;
import java.util.Random;

/** Test that show that Math.abs is much slower than a floating point add
 * even though it should have about the same cost
 *
 * @author  bjw  Bruce Walter, Cornell Program of Computer Graphics 2004
 */
public class AbsTest {
  //target total number of repetitions of the operation
  public static final int opTargetRepetitions = 200000000;
  //size of arrays that are operated on
  public static final int arraySize = 10000;
  //number of times we need to process each array to reach total target count
  public static final int reps = opTargetRepetitions/arraySize;
  //pretty print the output numbers to make them easier to read
  public static final DecimalFormat decForm = new DecimalFormat("###0.0");
  public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
  //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
  public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
    
  public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
    long time = System.currentTimeMillis();
    for(int i=0;i<reps;i++) {
      op.performOp(result,src);
    }
    time = System.currentTimeMillis() - time;
    double denom = 1000000.0/(reps*src.length);
    if (print) {
      String ps = decForm.format(time*denom);
      while (ps.length()<6) ps = " "+ps;
      ps = "avg "+ps+" ns   total "+sciForm.format(time/1000.0)+" s";
      while (ps.length()<32) ps += " ";
      ps = ps+" for "+op.toString();
      while (ps.length()<50) ps += " ";
      System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
    }
  }
    
  public static void main(String[] args) throws InterruptedException {
    double src[] = new double[arraySize];
    double result[] = new double[arraySize];
    Random ran = new Random(5232482349538L);
    //set the src array to be random values between -1 and 1 (but excluding zero)
    for(int i=0;i<src.length;i++) {
      do {
        src[i] = 2*ran.nextDouble() - 1.0;
      } while (src[i] == 0);
    }
    TestOp tests[] = { new AssignOp(), new AddOp(), new AbsOp()};
    //warm up hotspot
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,false);
    }
    //now run the real tests and print the timings
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
    //do it again to show the timings are reasonably consistent
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
  }
  
  public abstract static class TestOp {
    public abstract void performOp(double result[], double src[]);
  }
  
  public static class AssignOp extends TestOp {
    public String toString() { return "assign"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = src[i];
      }
    }
  }
  
  public static class AddOp extends TestOp {
    public String toString() { return "add"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = 0.143+src[i];
      }
    }
  }
  
  public static class AbsOp extends TestOp {
    public String toString() { return "Math.abs()"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = Math.abs(src[i]);
      }
    }
  }
 
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None.  Attempts to force the more efficient code sequence such as:

private static double myAbs(double a) {
   return Double.longBitsToDouble(Double.doubleToRawLongBits(a)&0x7fffffffffffffffL);
}

are not any faster due to the overheads of  calling these methods on Double.
(Incident Review ID: 315868) 
======================================================================
###@###.### 10/14/04 23:50 GMT

Comments
SUGGESTED FIX http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2004/20041027101346.azeem.abs/workspace/webrevs/webrev-2004.10.27/index.html ###@###.### 10/28/04 16:22 GMT
28-10-2004

CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: mustang
30-09-2004

EVALUATION Fix this in Mustang ###@###.### 2004-09-29 I'm seeing performance of ABS that is roughly 2.5x slower, not 6x as the submitter claims: @prt-solx86-q1-2$ $jp/bin/java -server AbsTest avg 2.7 ns total 5.31E-1 s for assign (~ 4.0 cycles) avg 2.9 ns total 5.74E-1 s for add (~ 4.3 cycles) avg 7.6 ns total 1.53E0 s for Math.abs() (~ 11.4 cycles) avg 2.7 ns total 5.31E-1 s for assign (~ 4.0 cycles) avg 2.8 ns total 5.50E-1 s for add (~ 4.1 cycles) avg 7.6 ns total 1.53E0 s for Math.abs() (~ 11.5 cycles) My machine is a dual 1.5Ghz, but even on a P3-700Mhz performance is still roughly 2.5x slower @foundation$ $jp/bin/java -server AbsTest avg 5.8 ns total 1.17E0 s for assign (~ 4.1 cycles) avg 6.4 ns total 1.29E0 s for add (~ 4.5 cycles) avg 16.4 ns total 3.28E0 s for Math.abs() (~ 11.5 cycles) avg 5.8 ns total 1.17E0 s for assign (~ 4.1 cycles) avg 6.4 ns total 1.29E0 s for add (~ 4.5 cycles) avg 16.4 ns total 3.27E0 s for Math.abs() (~ 11.5 cycles) Looking into this... ###@###.### 10/14/04 23:50 GMT It seems the SSE2 version of AbsD is SLOW. If I enable the flags -XX:UseSSE=0 or -XX:UseSSE=1 then ABS is fast again (because we are using FABS). Looking into a faster instruction for SSE2 machines ###@###.### 10/20/04 16:38 GMT Got ABS intrinsified... 4cycles on a PC, 7 cycles on a SPARC (same as an ADD) and working on C1 now. ###@###.### 10/25/04 19:41 GMT
20-10-0004