A DESCRIPTION OF THE REQUEST :
The routine Math.abs(float) is unnecessarily slow. This is related to my prior bug 5108893, which reported that Math.abs was slow. That bug was fixed, but apparently only for doubles, as Math.abs(float) is still much slower than it should be. At this point it is actually faster to convert a float to a double, call Math.abs(double), and convert the result back to a float than it is to call Math.abs(float) directly.
The fix should be simple: apply the same optimization that was used to fix bug 5108893 to similarly intrinsify Math.abs(float).
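As a rough illustration of why this operation should be cheap: for an IEEE 754 float, the absolute value amounts to clearing the sign bit. The sketch below is only an illustration under that assumption, not the HotSpot intrinsic itself; the class name AbsBitsSketch and the helper absBits are hypothetical names introduced for this example.

```java
public class AbsBitsSketch {
    // Hypothetical helper, not part of java.lang.Math:
    // clears the IEEE 754 sign bit (bit 31) of the float's raw representation.
    static float absBits(float a) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(a) & 0x7fffffff);
    }

    public static void main(String[] args) {
        float[] samples = { -1.5f, 1.5f, -0.0f, 0.0f, Float.NEGATIVE_INFINITY, Float.NaN };
        for (float a : samples) {
            // floatToIntBits canonicalizes NaN, so this comparison also covers the NaN case
            boolean same = Float.floatToIntBits(absBits(a)) == Float.floatToIntBits(Math.abs(a));
            System.out.println(a + " -> " + absBits(a) + " (matches Math.abs: " + same + ")");
        }
    }
}
```

A single bitwise AND like this is the kind of cost I would expect from an intrinsified Math.abs(float).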
JUSTIFICATION :
Math.abs is very commonly used, and there is no reason Math.abs(float) should be significantly slower than Math.abs(double). Speeding it up will help any code that uses Math.abs(float).
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The cost of calling Math.abs(float) should be about the same as performing an add, as it now is for Math.abs(double). Instead it is much more expensive, and calling
float a, b;
b = (float) Math.abs((double)a);
is actually faster than
b = Math.abs(a);
as shown in the timings below.
ACTUAL -
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
avg 1.5 ns total 6.10E-1 s for assign (~ 2.6 cycles)
avg 3.8 ns total 1.50E0 s for add (~ 6.4 cycles)
avg 15.9 ns total 6.36E0 s for Math.abs() (~ 27.0 cycles)
avg 8.8 ns total 3.50E0 s for (float)Math.abs((double)) (~ 14.9 cycles)
avg 1.5 ns total 5.94E-1 s for assign (~ 2.5 cycles)
avg 1.7 ns total 6.71E-1 s for add (~ 2.9 cycles)
avg 16.0 ns total 6.39E0 s for Math.abs() (~ 27.2 cycles)
avg 8.7 ns total 3.50E0 s for (float)Math.abs((double)) (~ 14.9 cycles)
---------- BEGIN SOURCE ----------
import java.text.DecimalFormat;
import java.util.Random;

/**
 * Test that shows that Math.abs on floats is much slower than a floating point add
 * even though it should have about the same cost.
 *
 * @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004
 */
public class AbsFloatTest {
    //target total number of repetitions of the operation
    public static final int opTargetRepetitions = 400000000;
    //size of arrays that are operated on
    public static final int arraySize = 10000;
    //number of times we need to process each array to reach total target count
    public static final int reps = opTargetRepetitions/arraySize;
    //pretty print the output numbers to make them easier to read
    public static final DecimalFormat decForm = new DecimalFormat("###0.0");
    public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
    //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
    public static final double ghzProcSpeed = 1.7;

    public static void runTimingTest(TestOp op, float result[], float src[], boolean print) {
        long time = System.currentTimeMillis();
        for(int i=0;i<reps;i++) {
            op.performOp(result,src);
        }
        time = System.currentTimeMillis() - time;
        //factor that converts elapsed milliseconds into nanoseconds per operation
        double denom = 1000000.0/(reps*src.length);
        if (print) {
            String ps = decForm.format(time*denom);
            while (ps.length()<6) ps = " "+ps;
            ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s";
            while (ps.length()<32) ps += " ";
            ps = ps+" for "+op.toString();
            while (ps.length()<50) ps += " ";
            System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        float src[] = new float[arraySize];
        float result[] = new float[arraySize];
        Random ran = new Random(5232482349538L);
        //set the src array to be random values between -1 and 1 (but excluding zero)
        for(int i=0;i<src.length;i++) {
            do {
                src[i] = 2*ran.nextFloat() - 1.0f;
            } while (src[i] == 0);
        }
        TestOp tests[] = { new AssignOp(), new AddOp(), new AbsOp(), new AbsViaDoubleOp()};
        //warm up hotspot
        for(int i=0;i<tests.length;i++) {
            runTimingTest(tests[i],result,src,false);
        }
        //now run the real tests and print the timings
        for(int i=0;i<tests.length;i++) {
            runTimingTest(tests[i],result,src,true);
        }
        //do it again to show the timings are reasonably consistent
        for(int i=0;i<tests.length;i++) {
            runTimingTest(tests[i],result,src,true);
        }
    }

    public abstract static class TestOp {
        public abstract void performOp(float result[], float src[]);
    }

    public static class AssignOp extends TestOp {
        public String toString() { return "assign"; }
        public void performOp(float result[], float src[]) {
            for(int i=0;i<src.length;i++) {
                result[i] = src[i];
            }
        }
    }

    public static class AddOp extends TestOp {
        public String toString() { return "add"; }
        public void performOp(float result[], float src[]) {
            for(int i=0;i<src.length;i++) {
                result[i] = 0.143f+src[i];
            }
        }
    }

    public static class AbsOp extends TestOp {
        public String toString() { return "Math.abs()"; }
        public void performOp(float result[], float src[]) {
            for(int i=0;i<src.length;i++) {
                result[i] = Math.abs(src[i]);
            }
        }
    }

    public static class AbsViaDoubleOp extends TestOp {
        public String toString() { return "(float)Math.abs((double))"; }
        public void performOp(float result[], float src[]) {
            for(int i=0;i<src.length;i++) {
                result[i] = (float)Math.abs((double)src[i]);
            }
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Casting to double, calling Math.abs(double), and casting the result back to float is currently faster than calling Math.abs(float) directly (about 8.8 ns versus 16 ns per call in the timings above), but it makes for ugly code.
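One way to keep the ugliness contained is to wrap the casts in a small helper, sketched below. The class name AbsWorkaround and the method name absViaDouble are hypothetical, chosen only for this example. Every float is exactly representable as a double, so the round trip should return the same value that Math.abs(float) would.

```java
public class AbsWorkaround {
    // Hypothetical helper name: routes the call through Math.abs(double).
    // Widening float to double is exact, so narrowing the result back loses nothing.
    static float absViaDouble(float a) {
        return (float) Math.abs((double) a);
    }

    public static void main(String[] args) {
        float a = -0.25f;
        float b = absViaDouble(a);
        System.out.println(b); // prints 0.25
    }
}
```

Call sites can then be switched back to Math.abs(float) in one place once it is intrinsified.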