JDK-6296690 : Math.round() is extremely slow
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Not an Issue
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2005-07-14
  • Updated: 2021-07-07
  • Resolved: 2021-07-07
Description
FULL PRODUCT VERSION :
Tested on both JDK 1.5.0_04 and 1.6.0-ea-b42.


ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
It seems that Math.round() is extremely slow.

On my Centrino laptop it takes about 200 cycles to complete, whereas a reasonable value would be about 5 cycles, I would say.

It is so slow that doing the rounding manually in Java, using Double.doubleToRawLongBits, is actually 10 times faster!

Normally I would not submit a performance issue as a bug, but this one is really extreme.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just call Math.round, and see how slow it is.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
That calling it will take ~5 machine cycles.
ACTUAL -
It takes ~200 machine cycles.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------

/**
 * Demo program, that shows how slow {@link Math#round(double)} is.
 *
 * @author Doron Rajwan, 2005.
 */
public class TestRound {

    private static final double x0 = 500000.0, y0=3000000.0, inverseBinSizeX=1.0/30.0, inverseBinSizeY=1.0/20.0;
    private static final int samples = 12345678;
    private static final double deltaX = 12345.0 / (5 + samples);
    private static final double deltaY = 23456.0 / (5 + samples);

    private static void performanceTestStandard() {
        long start = System.nanoTime();
        int sum = 0;
        double xx = 777.1, yy = 888.1;
        for (int i = 0; i < samples; ++i) {
            xx += deltaX; yy += deltaY;
            int binI = (int) Math.round((xx - x0) * inverseBinSizeX);
            int binJ = (int) Math.round((yy - y0) * inverseBinSizeY);
            sum += binI + binJ;
        }
        long time = System.nanoTime() - start;
        System.out.println("Using standard round: time per iteration=" + ((double)time / samples) + " ns, sum=" + sum);
        if (sum != 1734030338)
            throw new AssertionError(); // force assertions here...
    }

    private static void performanceTestFast() {
        long start = System.nanoTime();
        int sum = 0;
        double xx = 777.1, yy = 888.1;
        for (int i = 0; i < samples; ++i) {
            xx += deltaX; yy += deltaY;
            int binI = fastRound((xx - x0) * inverseBinSizeX);
            int binJ = fastRound((yy - y0) * inverseBinSizeY);
            sum += binI + binJ;
        }
        long time = System.nanoTime() - start;
        System.out.println("Using FAST round: time per iteration=" + ((double)time / samples) + " ns, sum=" + sum);
        if (sum != 1734030338)
            throw new AssertionError(); // force assertions here...
    }

    private static final double twoToThe52 = (double)(1L << 52); // 2^52

    // Works like round(), but with the following differences:
    // 1. more than x10 faster.
    // 2. rounds halves towards even.
    // 3. assumes the rounded result fits in an int (the cast below keeps only the low 32 bits).
    private static int fastRound(double a) {
        double dd = twoToThe52 + Math.abs(a);
        int ll = (int)Double.doubleToRawLongBits(dd);
        int signMask = (int)(Double.doubleToRawLongBits(a) >> 63); // 0 or -1.
        return (ll ^ signMask) - signMask;
    }

    public static void main(String[] args) {
        performanceTestStandard(); performanceTestFast();
        performanceTestStandard(); performanceTestFast();
        performanceTestStandard(); performanceTestFast();
        performanceTestStandard(); performanceTestFast();
        performanceTestStandard(); performanceTestFast();
    }

    // Sample output on my 1.5GHz Centrino laptop, using "-server -Xbatch" command line:
    // Using standard round: time per iteration=223.8108062594861 ns, sum=1734030338
    // Using FAST round: time per iteration=22.15142999841726 ns, sum=1734030338
}

---------- END SOURCE ----------
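
As a rough cross-check of the ~200-cycle figure, the sample output at the end of the source above can be converted from nanoseconds to clock cycles. The sketch below assumes the CPU ran at its nominal 1.5 GHz for the whole test and charges all loop overhead to the two rounding calls per iteration, so the numbers are upper bounds. The class and method names are made up for illustration.

```java
public class CycleEstimate {

    // Converts a measured time per loop iteration into estimated cycles per call.
    // clockGHz is cycles per nanosecond; callsPerIteration is 2 in TestRound.
    static double cyclesPerCall(double nsPerIteration, double clockGHz, int callsPerIteration) {
        return nsPerIteration * clockGHz / callsPerIteration;
    }

    public static void main(String[] args) {
        double clockGHz = 1.5; // assumption: nominal clock of the reporter's laptop
        System.out.println("Math.round: ~" + cyclesPerCall(223.8108062594861, clockGHz, 2) + " cycles per call");
        System.out.println("fastRound:  ~" + cyclesPerCall(22.15142999841726, clockGHz, 2) + " cycles per call");
    }
}
```

Under these assumptions the estimate comes to roughly 168 cycles per Math.round call and roughly 17 per fastRound call, which is in line with the "about 200 cycles" and "10 times faster" statements in the description.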

CUSTOMER SUBMITTED WORKAROUND :
See my sample code for workaround...
###@###.### 2005-07-14 00:29:52 GMT
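
The workaround is not a drop-in replacement for Math.round(): as the source comment notes, it rounds halves towards even, while Math.round() rounds halves towards positive infinity. The following sketch copies fastRound() from the report into a demo class (the class name RoundTieDemo is an illustration, not part of the report) and prints both results for a few tie and non-tie inputs.

```java
public class RoundTieDemo {

    private static final double TWO_TO_THE_52 = (double) (1L << 52); // 2^52

    // Copy of fastRound() from the report: adding 2^52 forces the FPU to
    // round |a| to an integer, which then sits in the low mantissa bits.
    static int fastRound(double a) {
        double dd = TWO_TO_THE_52 + Math.abs(a);
        int ll = (int) Double.doubleToRawLongBits(dd);
        int signMask = (int) (Double.doubleToRawLongBits(a) >> 63); // 0 or -1
        return (ll ^ signMask) - signMask;
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, 1.5, 2.5, -0.5, -1.5, -2.5, 2.4, -2.6};
        for (double a : inputs) {
            System.out.println(a + ": Math.round=" + Math.round(a) + ", fastRound=" + fastRound(a));
        }
    }
}
```

The two methods agree on non-tie inputs such as 2.4 and -2.6, but differ on some ties: for 0.5 and 2.5 Math.round() returns 1 and 3 while fastRound() returns 0 and 2, and for -1.5 Math.round() returns -1 while fastRound() returns -2.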

Comments
The performance of Math.round(double) in JDK-16 GA compares favorably with that of the C library function lroundl(double) on Linux Ubuntu 20.04 LTS and on macOS 10.15.7 Catalina. The C test was compiled with default optimization. Sample results in nanoseconds per op are:

  Linux - C: 2.68; Java: 3.161
  macOS - C: 4.17; Java: 3.306
07-07-2021

The content of the fastRound() method given above was used to change Math.round(double) to

    public static long round(double a) {
        double dd = TWO_TO_THE_52ND + Math.abs(a);
        long ll = Double.doubleToRawLongBits(dd);
        long signMask = Double.doubleToRawLongBits(a) >> 63; // 0 or -1.
        return (ll ^ signMask) - signMask;
    }

With this change in place, java/lang/Math/RoundTests.java failed massively. It looks as if the proposed change is numerically inaccurate.
01-07-2021
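
One possible contributor to the failures reported in the comment dated 01-07-2021, assuming the method was tested exactly as quoted there: the original fastRound() casts the raw bits to int, which discards the exponent bits of 2^52 + |a|, whereas the long version keeps them. The sketch below reproduces the quoted body under the hypothetical name roundAsQuoted and adds a hypothetical roundMasked variant that keeps only the 52 fraction bits. Neither is the actual JDK implementation.

```java
public class LongRoundDemo {

    private static final double TWO_TO_THE_52ND = (double) (1L << 52); // 2^52

    // Body as quoted in the comment: the raw bits still contain the
    // exponent field of (2^52 + |a|), so the result is not the rounded value.
    static long roundAsQuoted(double a) {
        double dd = TWO_TO_THE_52ND + Math.abs(a);
        long ll = Double.doubleToRawLongBits(dd);
        long signMask = Double.doubleToRawLongBits(a) >> 63; // 0 or -1
        return (ll ^ signMask) - signMask;
    }

    // Hypothetical variant: mask off sign and exponent, keeping the 52
    // fraction bits. Only meaningful for |a| well below 2^52, and it still
    // rounds halves towards even, unlike Math.round().
    static long roundMasked(double a) {
        double dd = TWO_TO_THE_52ND + Math.abs(a);
        long ll = Double.doubleToRawLongBits(dd) & ((1L << 52) - 1);
        long signMask = Double.doubleToRawLongBits(a) >> 63; // 0 or -1
        return (ll ^ signMask) - signMask;
    }

    public static void main(String[] args) {
        System.out.println("roundAsQuoted(1.0) = 0x" + Long.toHexString(roundAsQuoted(1.0)));
        System.out.println("roundMasked(1.0)   = " + roundMasked(1.0));
        System.out.println("roundMasked(2.5)   = " + roundMasked(2.5) + ", Math.round(2.5) = " + Math.round(2.5));
    }
}
```

Even with the mask, the variant disagrees with Math.round() on ties such as 2.5 and on large magnitudes, NaN, and infinities, so it would not be expected to pass a test suite written against the documented Math.round() semantics.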

The attached JMH class RoundBench gives the following results on JDK10-internal:

OEL 7
Benchmark                    Mode  Cnt   Score    Error  Units
RoundBench.round             avgt   10  38.671 ±  0.016  ns/op
RoundBench.roundViaLongBits  avgt   10  31.945 ±  0.015  ns/op

Ubuntu 16.04 VM
Benchmark                    Mode  Cnt   Score    Error  Units
RoundBench.round             avgt   10   5.362 ±  0.065  ns/op
RoundBench.roundViaLongBits  avgt   10   4.184 ±  0.192  ns/op

macOS
Benchmark                    Mode  Cnt   Score    Error  Units
RoundBench.round             avgt   10   5.190 ±  0.049  ns/op
RoundBench.roundViaLongBits  avgt   10   4.111 ±  0.051  ns/op

Windows VM
Benchmark                    Mode  Cnt   Score    Error  Units
RoundBench.round             avgt   10   5.221 ±  0.100  ns/op
RoundBench.roundViaLongBits  avgt   10   4.079 ±  0.048  ns/op

In general round() appears to be about 25% slower than rounding via raw bits. This is much less than the order of magnitude difference originally reported. This issue could likely be resolved as Not An Issue.
28-07-2017
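
The "about 25% slower" summary in the comment dated 28-07-2017 can be checked directly from the reported scores. The sketch below hard-codes the four pairs of ns/op values from that comment and computes the relative slowdown of round() for each platform. The class and method names are illustrative only, and the error margins are ignored.

```java
public class RoundBenchRatios {

    // Relative slowdown of round() versus roundViaLongBits, in percent.
    static double slowdownPercent(double roundNs, double viaLongBitsNs) {
        return (roundNs / viaLongBitsNs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] platforms = {"OEL 7", "Ubuntu 16.04 VM", "macOS", "Windows VM"};
        double[] round = {38.671, 5.362, 5.190, 5.221};
        double[] viaLongBits = {31.945, 4.184, 4.111, 4.079};
        for (int i = 0; i < platforms.length; i++) {
            System.out.printf("%-16s round() slower by %.1f%%%n",
                    platforms[i], slowdownPercent(round[i], viaLongBits[i]));
        }
    }
}
```

The per-platform slowdowns fall between roughly 21% and 28%, consistent with the "about 25%" figure.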

The performance of the method should be evaluated on contemporary JDKs to see if this is still an issue.
14-09-2016

EVALUATION Will investigate.
12-08-2005