JDK-6321873 : (spec) System.identityHashCode doc inadequate, Object.hashCode default implementation docs mislead

Details
Type: Bug
Submit Date: 2005-09-09
Status: Open
Updated Date: 2012-01-08
Project Name: JDK
Resolved Date:
Component: core-libs
OS: linux
Sub-Component: java.lang
CPU: x86
Priority: P3
Resolution: Unresolved
Affected Versions: 6
Targeted Versions: tbd_major

Related Reports
Relates:

Sub Tasks

Description
FULL PRODUCT VERSION :


A DESCRIPTION OF THE PROBLEM :
The documentation for Object.hashCode states:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)"

From Usenet discussions and open source software, it appears that many programmers, perhaps the majority, take this to mean that the default implementation, and hence System.identityHashCode, will produce unique hash codes.

The suggested implementation technique is not even appropriate to modern handleless JVMs, and should go the same way as JVM Spec Chapter 9.

The qualification "As much as is reasonably practical," is insufficient to make clear that, in practice, hash codes are not distinct.
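
The non-uniqueness described above can be probed directly. The following is a minimal sketch, not a definitive test: the class and method names are hypothetical, and whether and when a collision appears depends on the JVM and its settings. It keeps every allocated object reachable, so any collision it reports is between two live objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the JDK or of this report.
class IdentityHashCollisionDemo {

    /**
     * Allocates up to maxObjects objects, keeping all of them reachable,
     * and returns how many had been allocated when two live objects first
     * shared an identity hash code, or -1 if no collision was seen.
     */
    static int objectsUntilCollision(int maxObjects) {
        // Key: identity hash code. Value: the object that produced it.
        // Storing the object in the map keeps it reachable.
        Map<Integer, Object> seen = new HashMap<>();
        for (int i = 1; i <= maxObjects; i++) {
            Object o = new Object();
            Object previous = seen.put(System.identityHashCode(o), o);
            if (previous != null) {
                // A different live object already reported this value.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = objectsUntilCollision(500_000);
        System.out.println(n < 0
                ? "no collision among 500000 live objects"
                : "collision after " + n + " objects");
    }
}
```

A result of -1 on one run does not show that the implementation guarantees uniqueness; it only means this run did not observe a collision.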


REPRODUCIBILITY :
This bug can be reproduced always.
-----
The description omits the part of the hashCode specification that is a
clear warning to programmers to avoid any uniqueness assumptions:

"It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results."

But the problem description seems to have more to do with the fact that
hashCode values have longer lifetimes than the objects that produced
them, making reuse visible even when an implementation provides unique
values for all objects that are reachable at any moment in time, either
via handle addresses or naked object pointers. The relative ease with
which users can detect reuse would vary not just with a relationship to
memory addresses (i.e. use of handles) but with a range of policies involving memory management. 

Also, more and more often hashCode implementations are faced with mapping very large sets of values into a 32-bit Java integer, so uniqueness is more obviously impossible to mandate.
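
The mapping problem is easy to see with classes whose hashCode formulas are documented. The sketch below uses a hypothetical helper class; the colliding values are chosen from the documented formulas of java.lang.Long (XOR of the two 32-bit halves) and java.lang.String (a base-31 polynomial).

```java
// Hypothetical demo class; the inputs are picked to collide under the
// documented hashCode formulas of java.lang.Long and java.lang.String.
class HashRangeDemo {

    /** Returns true if two unequal objects report the same hashCode. */
    static boolean unequalButSameHash(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Long folds 64 bits into 32 by XOR-ing its upper and lower halves,
        // so 0L and -1L both hash to 0.
        System.out.println(unequalButSameHash(Long.valueOf(0L), Long.valueOf(-1L)));

        // String computes s[0]*31 + s[1] for two characters:
        // 'A'*31 + 'a' == 65*31 + 97 == 66*31 + 66 == 'B'*31 + 'B'.
        System.out.println(unequalButSameHash("Aa", "BB"));
    }
}
```

Both calls print true: the objects are unequal, yet their hash codes match, exactly as the specification permits.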

But this is getting away from the fact that the hashCode spec is explicit about not requiring uniqueness, whereas the System.identityHashCode doc is glaringly isolated from that warning. This RFE makes the value of a "see also" for the latter method obvious. It might also be judged appropriate to add a line to the above hashCode javadoc bullet item to eliminate any chance of wishful thinking.

"It would be impossible for any implementation to comply with a mandate for unique integer values for all objects given the relationship between the size of an integer and object storage capacities, both existing and contemplated."

                                    

Comments
EVALUATION

I agree it would be good to add a warning that user code must not assume
that distinct objects have distinct hash codes.
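
For user code that needs to key data by object identity, one way to avoid the uniqueness assumption is to let the collection compare references itself rather than using the identity hash code as a key. A minimal sketch, assuming java.util.IdentityHashMap suits the use case; the class and method names are hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; IdentityHashMap itself is a real JDK class.
class IdentityKeyDemo {

    /** Labels two objects by reference identity, not by equals/hashCode. */
    static Map<Object, String> labelByIdentity(Object first, Object second) {
        // IdentityHashMap compares keys with ==, so two distinct objects
        // stay distinct keys even if their identity hash codes collide.
        Map<Object, String> labels = new IdentityHashMap<>();
        labels.put(first, "first");
        labels.put(second, "second");
        return labels;
    }

    public static void main(String[] args) {
        String a = new String("same");
        String b = new String("same"); // equal to a, but a distinct object
        Map<Object, String> labels = labelByIdentity(a, b);
        System.out.println(labels.size());   // 2: one entry per object
        System.out.println(labels.get(a));   // first
        System.out.println(labels.get(b));   // second
    }
}
```

A Map<Integer, V> keyed by System.identityHashCode would, by contrast, silently overwrite one entry whenever two objects shared a value.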
                                     
2005-09-09
EVALUATION

--------------
The description omits the third hashCode specification bullet item that is a
crystal clear warning to programmers to avoid any uniqueness assumptions:

"It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results."

But the problem description seems to have more to do with the fact that
hashCode values have longer lifetimes than the objects that produced
them, making reuse visible even when an implementation provides unique
values for all objects that are reachable at any moment in time, either
via handle addresses or naked object pointers. The relative ease with
which users can detect reuse would vary not just with a relationship to
memory addresses but with a range of policies involving memory
management. 

Also, more and more often hashCode algorithms are faced with mapping very large sets of values into a 32-bit Java integer, so uniqueness is more obviously impossible to mandate as time goes on.

But this is getting away from the fact that the hashCode spec is explicit about not requiring uniqueness, whereas the System.identityHashCode doc is glaringly isolated from that warning. This RFE makes the value of a "see also" for the latter method obvious, and consideration will be given to adding a sentence to the above bullet item to try to eliminate any remaining chance of misunderstanding.
                                     
2005-09-09
SUGGESTED FIX

(BUGSTER IS THROWING AWAY MY INFORMATION: THIS IS THE THIRD FLIPPING TIME I'VE TYPED THIS IN!!)

Add an @see java.lang.Object#hashCode to System.identityHashCode.
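
As a sketch of what such a cross-reference might look like, the tag is shown here on a hypothetical wrapper method rather than on the real JDK source. Javadoc separates the class name from the member name with `#`. The wording of the @return line is an assumption for illustration, not proposed specification text.

```java
// Hypothetical wrapper, used only to show the Javadoc tag; the real
// method is java.lang.System.identityHashCode.
class IdentityHashDocSketch {

    /**
     * Returns the same value as {@link System#identityHashCode(Object)}.
     *
     * @param x object whose identity hash code is wanted
     * @return the identity hash code of x; not guaranteed to be unique
     * @see java.lang.Object#hashCode()
     */
    static int identityHash(Object x) {
        return System.identityHashCode(x);
    }
}
```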

Consult to determine if a CCC case to extend the Object.hashCode spec would be worthwhile. For instance something like the following could be added as an additional sentence for the third bullet item (ending "the performance of hashtables.")

"A mandate for hash value uniqueness for all objects is made impossible by the current or future size of object storage as it relates to the range of values of an integer."

                                     
2005-09-09
EVALUATION

Sun's Java SE implementations (most, if not all) can easily be shown to create non-unique hash codes for a set of reachable objects. A short, sharp shock in the System.identityHashCode doc in addition to the "see also" will be considered.
                                     
2005-09-30


