JDK-6506405 : Math.abs(float) is slow
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 6
  • Priority: P5
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2006-12-19
  • Updated: 2021-07-19
  • Resolved: 2021-07-14
Fix Version: JDK 18 (b06, Fixed)
Description
A DESCRIPTION OF THE REQUEST :
The routine Math.abs(float) is unnecessarily slow. This is related to my prior bug 5108893, which reported that Math.abs was slow. That bug was fixed, but apparently only for doubles, as Math.abs(float) is still much slower than it should be. At this point it is actually faster to convert a float to a double, call Math.abs(double), and then convert the result back to a float than it is to call Math.abs(float) directly.

The fix should be simple: just apply the same optimization that was used to fix bug 5108893 to similarly intrinsify Math.abs(float).
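
For illustration only (this is an assumption about the shape of such an optimization, not the actual HotSpot or library change), intrinsifying abs essentially amounts to clearing the sign bit of the IEEE 754 representation instead of branching. A minimal Java sketch of that idea for float:

// Hypothetical sketch: branch-free absolute value for a float by clearing the
// IEEE 754 sign bit (0x7FFFFFFF keeps every bit except the sign bit).
// Illustrates the kind of optimization being requested; not actual JDK code.
static float absBitMask(float a) {
    int bits = Float.floatToRawIntBits(a);          // raw IEEE 754 bit pattern
    return Float.intBitsToFloat(bits & 0x7FFFFFFF); // sign bit cleared
}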

JUSTIFICATION :
Math.abs is very commonly used and there is no reason Math.abs(float) should be significantly slower than Math.abs(double). Speeding it up will help any code that uses Math.abs(float).

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The cost of calling Math.abs(float) should be about the same as that of an add, as it now is for Math.abs(double). Instead it is much more expensive, and calling

float a, b;

b = (float) Math.abs((double)a);

is actually faster than

b = Math.abs(a);

as shown in the timings below
ACTUAL -
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

avg    1.5 ns   total 6.10E-1 s  for assign             (~ 2.6 cycles)
avg    3.8 ns   total 1.50E0 s   for add                (~ 6.4 cycles)
avg   15.9 ns   total 6.36E0 s   for Math.abs()         (~ 27.0 cycles)
avg    8.8 ns   total 3.50E0 s   for (float)Math.abs((double))  (~ 14.9 cycles)
avg    1.5 ns   total 5.94E-1 s  for assign             (~ 2.5 cycles)
avg    1.7 ns   total 6.71E-1 s  for add                (~ 2.9 cycles)
avg   16.0 ns   total 6.39E0 s   for Math.abs()         (~ 27.2 cycles)
avg    8.7 ns   total 3.50E0 s   for (float)Math.abs((double))  (~ 14.9 cycles)

---------- BEGIN SOURCE ----------

import java.text.DecimalFormat;
import java.util.Random;

/** Test that show that Math.abs on floats is much slower than a floating point add
 * even though it should have about the same cost
 *
 * @author  bjw  Bruce Walter, Cornell Program of Computer Graphics 2004
 */
public class AbsFloatTest {
  //target total number of repetitions of the operation
  public static final int opTargetRepetitions = 400000000;
  //size of arrays that are operated on
  public static final int arraySize = 10000;
  //number of times we need to process each array to reach total target count
  public static final int reps = opTargetRepetitions/arraySize;
  //pretty print the output numbers to make them easier to read
  public static final DecimalFormat decForm = new DecimalFormat("###0.0");
  public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
  //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
  public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
    
  public static void runTimingTest(TestOp op, float result[], float src[], boolean print) {
    long time = System.currentTimeMillis();
    for(int i=0;i<reps;i++) {
      op.performOp(result,src);
    }
    time = System.currentTimeMillis() - time;
    double denom = 1000000.0/(reps*src.length);
    if (print) {
      String ps = decForm.format(time*denom);
      while (ps.length()<6) ps = " "+ps;
      ps = "avg "+ps+" ns   total "+sciForm.format(time/1000.0)+" s";
      while (ps.length()<32) ps += " ";
      ps = ps+" for "+op.toString();
      while (ps.length()<50) ps += " ";
      System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
    }
  }
    
  public static void main(String[] args) throws InterruptedException {
    float src[] = new float[arraySize];
    float result[] = new float[arraySize];
    Random ran = new Random(5232482349538L);
    //set the src array to be random values between -1 and 1 (but excluding zero)
    for(int i=0;i<src.length;i++) {
      do {
        src[i] = 2*ran.nextFloat() - 1.0f;
      } while (src[i] == 0);
    }
    TestOp tests[] = { new AssignOp(), new AddOp(), new AbsOp(), new AbsViaDoubleOp()};
    //warm up hotspot
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,false);
    }
    //now run the real tests and print the timings
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
    //do it again to show the timings are reasonably consistent
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
  }
  
  public abstract static class TestOp {
    public abstract void performOp(float result[], float src[]);
  }
  
  public static class AssignOp extends TestOp {
    public String toString() { return "assign"; }
    public void performOp(float result[], float src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = src[i];
      }
    }
  }
  
  public static class AddOp extends TestOp {
    public String toString() { return "add"; }
    public void performOp(float result[], float src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = 0.143f+src[i];
      }
    }
  }
  
  public static class AbsOp extends TestOp {
    public String toString() { return "Math.abs()"; }
    public void performOp(float result[], float src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = Math.abs(src[i]);
      }
    }
  }
    
  public static class AbsViaDoubleOp extends TestOp {
    public String toString() { return "(float)Math.abs((double))"; }
    public void performOp(float result[], float src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = (float)Math.abs((double)src[i]);
      }
    }
  }
 
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Casting to double, calling Math.abs(double), and then casting the result back to float is currently somewhat faster than calling Math.abs(float) directly, but it makes for ugly code.
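
As a sketch only (a hypothetical helper, not part of the report's test program), the cast-based workaround can be hidden behind a small utility method so call sites stay readable:

// Hypothetical helper wrapping the cast-via-double workaround described above.
static float absViaDouble(float a) {
    return (float) Math.abs((double) a);
}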

Comments
Changeset: c0d4efff
Author: Brian Burkhalter <bpb@openjdk.org>
Date: 2021-07-14 15:50:51 +0000
URL: https://git.openjdk.java.net/jdk/commit/c0d4efff3c7b853cd663726b668d49d01e0f8ee0
14-07-2021

Math.abs(float) is now marked for an intrinsic: @IntrinsicCandidate public static float abs(float a) {}
07-07-2021

The attached JMH class AbsBench gives the following results on JDK10-internal:

OEL 7
Benchmark                      Mode  Cnt   Score   Error  Units
AbsBench.absFloat              avgt   10  35.334 ± 0.051  ns/op
AbsBench.absFloatViaAbsDouble  avgt   10  28.582 ± 0.013  ns/op

Ubuntu 16.04 VM
Benchmark                      Mode  Cnt   Score   Error  Units
AbsBench.absFloat              avgt   10   3.264 ± 0.057  ns/op
AbsBench.absFloatViaAbsDouble  avgt   10   3.327 ± 0.032  ns/op

macOS
Benchmark                      Mode  Cnt   Score   Error  Units
AbsBench.absFloat              avgt   10   3.285 ± 0.048  ns/op
AbsBench.absFloatViaAbsDouble  avgt   10   3.378 ± 0.040  ns/op

Windows VM
Benchmark                      Mode  Cnt   Score   Error  Units
AbsBench.absFloat              avgt   10   3.258 ± 0.248  ns/op
AbsBench.absFloatViaAbsDouble  avgt   10   3.333 ± 0.052  ns/op

In three out of four cases the two rounding pathways show a minimal difference, with the more direct float-only pathway appearing slightly faster. This issue could likely be resolved as Cannot Reproduce or Not an Issue.
28-07-2017
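
The AbsBench class referenced above is attached to the issue and is not reproduced here; a minimal JMH benchmark matching the method names in the results table might look like the following sketch (the body is an assumption, not the actual attachment):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Hypothetical reconstruction of a benchmark like the attached AbsBench;
// method names follow the results table above, the rest is assumed.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class AbsBench {
    private float value;

    @Setup
    public void setup() {
        value = -0.4321f; // arbitrary nonzero negative input
    }

    // Direct pathway: Math.abs(float)
    @Benchmark
    public float absFloat() {
        return Math.abs(value);
    }

    // Round-trip pathway: widen to double, take abs, narrow back to float
    @Benchmark
    public float absFloatViaAbsDouble() {
        return (float) Math.abs((double) value);
    }
}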

I recommend verifying whether or not this problem still exists. Math.abs(double) is marked @HotSpotIntrinsicCandidate but Math.abs(float) is not. However, the non-intrinsic execution of the abs method is just a compare followed by either a return or a subtract and return, so it shouldn't be that slow in either case.
23-08-2016
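
For reference, the compare-then-negate shape described in that comment corresponds to a library implementation along these lines (a sketch of that shape, not necessarily the exact JDK source of the time):

// Sketch of a non-intrinsic abs(float): one comparison, then either return
// the argument or its negation. Using 0.0F - a (rather than -a) also maps
// -0.0f to +0.0f.
static float abs(float a) {
    return (a <= 0.0F) ? 0.0F - a : a;
}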

EVALUATION: A reasonable request; will investigate.
20-12-2006