Bug ID: JDK-6655180 (array) Reduce JNI overhead for Arrays.equals() with float/double arrays

Type: Bug
Component: core-libs
Sub-Component: java.util
Affected Version: 6

Priority: P5
Status: Resolved
Resolution: Duplicate
OS: windows_xp
CPU: x86

Submitted: 2008-01-25
Updated: 2016-12-30
Resolved: 2016-12-30

A DESCRIPTION OF THE REQUEST :
Comparing large float or double arrays with Arrays.equals(float[], float[]) and Arrays.equals(float[], float[])  is dominated at runtime by CPU time spent in marchalling/unmarshalling the JNI interface to the native implementation of Float and Double methods to get canonical bits.

This RFE is an optimization to remove most calls to JNI implementations of static methods in java.lang.Float and java.lang.Double.

I propose below an alternative implementation that avoids this overhead.


JUSTIFICATION :
However, such use of
* public static native int Float.floatToIntBits(float value);
* public static native int Double.doubleToLongBits(double value);
is definitely not needed for comparing most float or double values in Arrays.equals(...) methods when it can be performed directly using normal Java == operators that generate normal bytecode that can be compiled directly into native CPU instructions by JIT, without such JNI calls and marshalling overhead.

My performance tests show significant improvement or example when comparing matrixes (for example in 2D/3D graphic rendering or 3D apps when matching vertices and points, or computing and matching colors in multiplane color spaces for precise transformation of bitmap images).

Comparing floatting point arrays for equality is not very common, but it happens in such contexts, and then, the methods are used at a very high frequency rate and do need better optimization.

The need of getting and comparing bits is justified ONLY when comparing zeroes (because these methods make a distinction between positive and negative zeroes), or because it wants to make NaNs comparable. In many applications however, NaNs are not used (or seen as wrong anyway), or it does not matter if they are different from each other.

The other use of Arrays.equal() with floatting point arrays is when hashing them for use as keys in HashMaps or Sets: it is used to detect key collisions in the Set or Map when storing them, and Arrays.equal become much more frequent when the Set or Map is near the threshold, and used at high rate when reindexing the Set or Map when it is resized after reaching the threshold: with the 75% load factor and random distribution (not guaranteed by arrays of native types because their hashes are quite simple), there's nearly 50% of collisions, when resizing the Set or Map, about 25% of elements in the new set will have collisions and will need to be compared with Arrays.equal, creating heavy use of the JNI interface, if the Set or Map is already large. And anyway, each time we get any element from the Set or HashMap, we need to compare the key (and in this case, most of the time, they will be equal, so if a key is an floatting point array, all elements in the floatting point array are compared in the loop; if the floatting point array size is large, this means a significant cost spent in comparing them).

The code below optimizes this situation for much better performance after JIT compilation, and negligeable cost when the code is interpreted.


---------- BEGIN SOURCE ----------
// package java.lang;
// public class Arrays {
public class Test {

    public static boolean equals(final float[] a1, final float[] a2) {
        if (a1 != a2) {
            int index;
            if (a1 != null && a2 != null && (index = a1.length) == a2.length) {
                while (--index >= 0) {
                    final float f1, f2;
                    if ((f1 = a1[index]) != (f2 = a2[index])
                            // (either f1 or f2 may be a NaN)
                            && f1 == f1 // ! Float.isNaN(f1)
                            && f2 == f2 // ! Float.isNaN(f2)
                            // (we don't need bits for NaNs)
                            // Otherwise f1 == f2 but may be both zeroes
                            || f1 == 0.0f // f1 is 0.0f or -0.0f
                            && f2 == 0.0f // f2 is 0.0f or -0.0f
                            // (test if distinct zeroes)
                            && Float.floatToIntBits(f1) != Float
                                    .floatToIntBits(f2))
                        return false;
                }
                return true;
            }
            return false;
        }
        return true;
    }

    public static boolean equals(final double[] a1, final double[] a2) {
        if (a1 != a2) {
            java.util.HashMap h;
            int index;
            if (a1 != null && a2 != null && (index = a1.length) == a2.length) {
                while (--index >= 0) {
                    final double d1, d2;
                    if ((d1 = a1[index]) != (d2 = a2[index])
                            // (either d1 or d2 may be a NaN)
                            && d1 == d1 // ! Double.isNaN(d1)
                            && d2 == d2 // ! Double.isNaN(d2)
                            // (we don't need bits for NaNs)
                            // Otherwise d1 == d2 but may be both zeroes
                            || d1 == 0.0d // d1 is 0.0d or -0.0d
                            && d2 == 0.0d // d2 is 0.0d or -0.0d
                            // (test if distinct zeroes)
                            && Double.doubleToLongBits(d1) != Double
                                    .doubleToLongBits(d2))
                        return false;
                }
                return true;
            }
            return false;
        }
        return true;
    }

}

---------- END SOURCE ----------

This issue was verified to have been effectively addressed by the fix for JDK-8136924. A prospective fix for JDK-8172156 would addressed any residual fine points.
30-12-2016
The attached JMH benchmark showed a huge improvement in Arrays.equals() for float and double arrays between Java 8u40 and Java 9ea128. 8u40 Benchmark Mode Cnt Score Error Units ArraysEqualsBenchmark.equalsDouble0 avgt 5 5.008 �� 0.113 ns/op ArraysEqualsBenchmark.equalsDouble9 avgt 5 29.163 �� 0.495 ns/op ArraysEqualsBenchmark.equalsDoubleN2 avgt 5 10338.867 �� 312.107 ns/op ArraysEqualsBenchmark.equalsDoubleN avgt 5 15959.736 �� 287.940 ns/op ArraysEqualsBenchmark.equalsFloat0 avgt 5 5.025 �� 0.113 ns/op ArraysEqualsBenchmark.equalsFloat9 avgt 5 28.060 �� 0.407 ns/op ArraysEqualsBenchmark.equalsFloatN2 avgt 5 10909.571 �� 248.822 ns/op ArraysEqualsBenchmark.equalsFloatN avgt 5 24134.669 �� 870.636 ns/op 9ea128 Benchmark Mode Cnt Score Error Units ArraysEqualsBenchmark.equalsDouble0 avgt 5 8.080 �� 0.390 ns/op ArraysEqualsBenchmark.equalsDouble9 avgt 5 10.647 �� 0.161 ns/op ArraysEqualsBenchmark.equalsDoubleN2 avgt 5 1497.351 �� 24.063 ns/op ArraysEqualsBenchmark.equalsDoubleN avgt 5 2334.910 �� 24.970 ns/op ArraysEqualsBenchmark.equalsFloat0 avgt 5 8.336 �� 0.194 ns/op ArraysEqualsBenchmark.equalsFloat9 avgt 5 9.333 �� 0.188 ns/op ArraysEqualsBenchmark.equalsFloatN2 avgt 5 758.352 �� 22.003 ns/op ArraysEqualsBenchmark.equalsFloatN avgt 5 1507.173 �� 19.918 ns/op
15-12-2016
Attached JMH benchmark.
15-12-2016
The issues raised in this bug were most likely addressed as part of JDK-8136924; however, that should be verified.
26-05-2016

Relates :	JDK-8136924 - Vectorized support for array equals/compare/mismatch using Unsafe
Relates :	JDK-8172156 - (array) Tune thresholds for vectorized mismatch in Arrays.equals()