FULL PRODUCT VERSION :
openjdk version "1.8.0_51"
OpenJDK Runtime Environment (build 1.8.0_51-b16)
OpenJDK 64-Bit Server VM (build 25.51-b03, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
4.1.3-201.fc22.x86_64
EXTRA RELEVANT SYSTEM CONFIGURATION :
32GB RAM Quad core sandybridge
A DESCRIPTION OF THE PROBLEM :
I have recently noticed that the built in java Arrays.hashcode is broken for small arrays, this results in massive collisions for arrays that only have a few values causing hash maps that store these arrays as keys (using wrappers overriding hashcode to use Arrays.hashcode) to become nothing but a series of Lists or completely unusable.
eg, running arrays.hashcode on a byte[] of length 4 a quarter billion times (where contents of array is different each time) results in 248 million collisions out of 248,720,625 loops.
Here's my test code so you can see for yourself (lots of ram needed)
HashSet<Integer> hs = new HashSet<>(100000000,(float) 1.0);
long collide = 0;
long totalLoops = 0;
byte[] ba = new byte[4];
System.gc();
int hash=1;
long time = System.currentTimeMillis();
for(byte d=0; d<15; d++) {
ba[0] = d;
for(byte i=-128; i<127; i++) {
ba[1] = i;
for(byte k=-128; k<127; k++) {
ba[2] = k;
for(byte j=-128; j<127; j++) {
ba[3] = j;
hash = Arrays.hashcode(ba);
if(hs.contains(hash)) {
collide++;
} else {
hs.add(hash);
}
totalLoops++;
}
}
}
}
System.out.println("total time =" + (System.currentTimeMillis() - time));
System.out.println("collisions=" + collide + " loops=" + totalLoops);
System.exit(0);
This occurs because the 31 multipliers range is not high enough. If you replace 31 with 257 prime you will get 0 collisions with about 4 times the run time. The massive lookup time from running a hashmap thats become a series of lists from using a poor array hash functions will be an order of magnitude more than the 4x speed slow down from using 257 instead of 31. And 10 seconds to hash a quarter billion entries is not so bad considering other good hash functions like murmur3 took 65 seconds on the test above. I tried other primes but I either got millions of collisions or more runtime. I would advise you to replace the 31 in Arrays.hashcode with 257.
REPRODUCIBILITY :
This bug can be reproduced always.