JDK-4979028 : BitSet.hashCode() unnecessarily slow due to extra scanning of zero bits
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 1.4.2
  • Priority: P5
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2004-01-15
  • Updated: 2005-02-19
  • Resolved: 2005-02-19
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6
6 b25Fixed
Related Reports
Relates :  
Description
Name: jl125535			Date: 01/15/2004


A DESCRIPTION OF THE REQUEST :
The algorithm to compute the hashCode of a BitSet scans bits[] entry past the last used entry (which are normally all set to 0).


JUSTIFICATION :
If a BitSet is created then a "high" bit is set and then cleared, computing the hashCode will be unnecessarily slowly scanning the whole size of the BitSet, instead of just hashing the used part from 0 to (unitsInUse-1) (the remaining elements are normally all zeroes and should not be hashed.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The hashing loop should stop at bits[unitsInUse-1] as the hashcode must only rely on elements stored in the set, independantly of its actual size.
Also the statistical distribution of elements in the resulting hash should be stronger, as the computed hash value depends on the position of elements in the bits[] table, with some positions being non primes.
ACTUAL -
The hashing loop currently stop at bits[bits.length - 1] including all units at bits[bits.unitsInUse .. bits.length] which are all zeroes.


Note that BitSets use a dynamic reallocation of the backing store by growing exponentially.

This means that the time to compute the hashCode of a Bitset whose elements are added linearily from the lowest to the highest does not grow linearily with the elements added, but is much slower and "jumps" as soon as some "key" elements are added.

---------- BEGIN SOURCE ----------
The following code provides an optimized version for dynamic BitSets where some bits have been cleared, or been dynamically allocated but still not used. It also provides a bettter distribution of hashCodes, when BitSets are inserted in hash tables, with much less collisions:

    public int hashCode() {
        long h = 1234;
        for (int i = units.unitsInUse - 1; i >= 0; )
            h ^= (units[i--] ^ (h << 3)) + 1;

        return (int)(h >>> 32) ^ (int)h;
    }

It relies on the following documented invariants:
     * INVARIANT: The words in units[] above unitInUse - 1 are zero.
     * INVARIANT: unitsInUse is nonnegative.
     * INVARIANT: units[unitsInUse - 1] is nonzero unless unitsInUse is zero.

The final return has also a small optimization to perform XOR on 32-bit ints instead of 64 bit ints and dropping the unneeded highest 32 bits at end (it does not rely on tricky optimizations by javac at compile-time or at run-time in the Hotspot compiler).

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
No work around necessary. This is a performance issue.
(Incident Review ID: 233280) 
======================================================================

Comments
EVALUATION A good idea. ###@###.### 2005-1-25 02:41:04 GMT Unfortunately, the current code does not always maintain the invariant that bits[unitsInUse-1] != 0; That needs to be fixed first. 6222207: BitSet internal invariants may be violated ###@###.### 2005-2-09 20:06:33 GMT
25-01-2005