JDK-6530203 : Arrays.hashCode(long a[]) calculates bad hash code for 0 and -1 values
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 6
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2007-03-01
  • Updated: 2015-08-21
  • Resolved: 2007-03-01
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
Please look at the implementation of Arrays.hashCode(long a[]) method. The basic loop is the following:

        for (long element : a) {
            int elementHash = (int)(element ^ (element >>> 32));
            result = 31 * result + elementHash;
        }

It's obvious that if low and high words in the element are equal, then elementHash will be zero. There is, at least, two very popular integers producing this result: 0 and -1. For both values, the expression "(int)(element^(element>>>32))" returns 0.

This problem really occurred in one of my applications. By default, long[] arrays are filled by zero, and -1 is used sometimes as usual or signal value. If I create new long array[], calculate hash code, then assign -1 to some elements and calculate hash code again, I see the same result. I think it is a bug: the probability of same hash codes for different arrays is too high and can lead to serious problems in algorithms that often assign -1 to elements of long[] arrays.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Please compile and run the attached test.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Different hash codes for the tested arrays.
ACTUAL -
Hash code is not changed while any long[] array modifications where 0 element is replaced with -1 or vice versa.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.Arrays;
public class Test {
    public static void main(String[] args) {
        long[] a = new long[10];
        for (int k = 0; k < a.length; k++)
            System.out.print(a[k] + " ");
        System.out.println("hashCode = " + Arrays.hashCode(a));
        a[0] = -1;
        for (int k = 0; k < a.length; k++)
            System.out.print(a[k] + " ");
        System.out.println("hashCode = " + Arrays.hashCode(a));
        a[5] = -1;
        for (int k = 0; k < a.length; k++)
            System.out.print(a[k] + " ");
        System.out.println("hashCode = " + Arrays.hashCode(a));
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
You could use crc32 native algorithm for hash code calculation for any arrays. Measuring shows that your java.util.zip.CRC32.update(byte[] b) method has almost the same performance as Arrays.hashCode. Now I've created own methods for calculation array hash codes, that convert all non-byte primitive types to byte arrays and then call CRC32.update. I think you could implement more efficient solution if you offer native methods for arrays of all primitive types.

Comments
EVALUATION I agree with the submitter that the current hash calculation is suboptimal. Unfortunately, the current behavior is completely specified and thus can not be changed. We take compatibility seriously. Sorry. See specifications: http://java.sun.com/javase/6/docs/api/java/lang/Long.html#hashCode() http://java.sun.com/javase/6/docs/api/java/util/List.html#hashCode() http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#hashCode(long[])
01-03-2007