Name: gm110360 Date: 09/29/2004
A DESCRIPTION OF THE REQUEST :
Current JVMs implement Math.abs() using compares and branches even though there are far more efficient code sequences for computing the absolute value of a floating point value that do not require a branches. These non-branching code sequences are both much shorter and cheaper than the compare and branch code currently emitted by hotspot (in both the client and server JVMs).
For example, x87 has a fabs instruction which is slightly cheaper than an add and SSE uses a boolean & instructions (andsd for doubles andss for floats) which has the same cost as an add instruction. However as the timing results below show, currently Math.abs() is much more expensive than an add.
JUSTIFICATION :
Math.abs() is currently much slower and more expensive than it should be. It costs around six times more than an add when it should cost the same or even be slightly cheaper. The cost of Math.abs() can be significant in some codes (including some of ours) that make heavy use of it. Using the more efficient code sequences would signifcantly speed up such programs.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected behaviour is that Math.abs(double) and Math.abs(float) should cost about the same as an add.
ACTUAL -
Actual behaviour is that Math.abs() is much more expensive than an add as show in the timing results for the server and client 1.5.0rc JVMs below.
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)
avg 2.3 ns total 4.68E-1 s for assign (~ 4.0 cycles)
avg 2.5 ns total 5.00E-1 s for add (~ 4.2 cycles)
avg 17.0 ns total 3.41E0 s for Math.abs() (~ 29.0 cycles)
avg 2.4 ns total 4.85E-1 s for assign (~ 4.1 cycles)
avg 2.5 ns total 5.00E-1 s for add (~ 4.2 cycles)
avg 17.0 ns total 3.39E0 s for Math.abs() (~ 28.8 cycles)
java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Client VM (build 1.5.0-rc-b63, mixed mode, sharing)
avg 5.0 ns total 1.00E0 s for assign (~ 8.5 cycles)
avg 5.3 ns total 1.06E0 s for add (~ 9.0 cycles)
avg 30.8 ns total 6.16E0 s for Math.abs() (~ 52.3 cycles)
avg 5.0 ns total 1.00E0 s for assign (~ 8.5 cycles)
avg 5.4 ns total 1.08E0 s for add (~ 9.2 cycles)
avg 30.6 ns total 6.11E0 s for Math.abs() (~ 51.9 cycles)
---------- BEGIN SOURCE ----------
import java.text.DecimalFormat;
import java.util.Random;
/** Test that show that Math.abs is much slower than a floating point add
* even though it should have about the same cost
*
* @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004
*/
public class AbsTest {
//target total number of repetitions of the operation
public static final int opTargetRepetitions = 200000000;
//size of arrays that are operated on
public static final int arraySize = 10000;
//number of times we need to process each array to reach total target count
public static final int reps = opTargetRepetitions/arraySize;
//pretty print the output numbers to make them easier to read
public static final DecimalFormat decForm = new DecimalFormat("###0.0");
public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
//my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
long time = System.currentTimeMillis();
for(int i=0;i<reps;i++) {
op.performOp(result,src);
}
time = System.currentTimeMillis() - time;
double denom = 1000000.0/(reps*src.length);
if (print) {
String ps = decForm.format(time*denom);
while (ps.length()<6) ps = " "+ps;
ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s";
while (ps.length()<32) ps += " ";
ps = ps+" for "+op.toString();
while (ps.length()<50) ps += " ";
System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
}
}
public static void main(String[] args) throws InterruptedException {
double src[] = new double[arraySize];
double result[] = new double[arraySize];
Random ran = new Random(5232482349538L);
//set the src array to be random values between -1 and 1 (but excluding zero)
for(int i=0;i<src.length;i++) {
do {
src[i] = 2*ran.nextDouble() - 1.0;
} while (src[i] == 0);
}
TestOp tests[] = { new AssignOp(), new AddOp(), new AbsOp()};
//warm up hotspot
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,false);
}
//now run the real tests and print the timings
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
//do it again to show the timings are reasonably consistent
for(int i=0;i<tests.length;i++) {
runTimingTest(tests[i],result,src,true);
}
}
public abstract static class TestOp {
public abstract void performOp(double result[], double src[]);
}
public static class AssignOp extends TestOp {
public String toString() { return "assign"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = src[i];
}
}
}
public static class AddOp extends TestOp {
public String toString() { return "add"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = 0.143+src[i];
}
}
}
public static class AbsOp extends TestOp {
public String toString() { return "Math.abs()"; }
public void performOp(double result[], double src[]) {
for(int i=0;i<src.length;i++) {
result[i] = Math.abs(src[i]);
}
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
None. Attempts to force the more efficient code sequence such as:
private static double myAbs(double a) {
return Double.longBitsToDouble(Double.doubleToRawLongBits(a)&0x7fffffffffffffffL);
}
are not any faster due to the overheads of calling these methods on Double.
(Incident Review ID: 315868)
======================================================================
###@###.### 10/14/04 23:50 GMT