JDK-6579200 : (coll) HashSet.contains method violates Set.contains contract
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 6,6u23,6u24
  • Priority: P4
  • Status: Resolved
  • Resolution: Not an Issue
  • OS: linux,windows_xp,windows_7
  • CPU: x86
  • Submitted: 2007-07-11
  • Updated: 2020-05-20
  • Resolved: 2013-05-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8
8Resolved
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.6.0_02-ea"
Java(TM) SE Runtime Environment (build 1.6.0_02-ea-b02)
Java HotSpot(TM) Client VM (build 1.6.0_02-ea-b02, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Linux calmer 2.6.9-42.0.10.EL #1 Fri Feb 16 17:06:10 EST 2007 i686 i686 i386 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
The contains  method of the Set interface promises:

"Returns true if this set contains the specified element. More formally, returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e))."

The HashSet implementation of the contains method violates this contract by returning "false" for objects it contains, under some circumstances.

Using a HashSet one can set up a simple loop where each iteration fetches an element from the set, asks the set if it contains said element, and puts out a msg if the set does not contain the element it just gave you. Eg:

  for (E elem : mySet)
    if (!mySet.contains(elem))
      print(errMsg);

Under certain common situations, you will actually see the error msg, leading to the bizarre situation where the set does not contain elements that it just gave you!

The problem stems from using a HashMap to back the set, using the hashCode of the elements as a key, and then assuming the hashCode of an element doesn't change once the element is placed in the set.

It is very common to use value-equality (as opposed to reference-equality) for objects that are to be used in a set.  This means overriding the equals method to compare values.  Since, according to the javadoc for Object.equals, "equal objects must have equal hash codes", this also means overriding hashCode.  If the objects in the set are mutable, the hashCodes of the objects may change while they live in the set.  If the initial hashCode was changed, the current implementation of HashSet.contains loses its knowledge about the containment of the object.

(I understand that one issue with placing mutable objects in a set means that the set may not be able to guarantee uniqueness of its elements -- that it can only be a gate-keeper of uniqueness upon an element's entry into a set, and that it cannot know if its elements have later been made equal behind its back.  That issue is, i believe, well understood and accepted by developers as a risk of placing mutable objects in a set.  This issue of a set saying "I don't contain the element i just gave you" is fundamentally different and a consequence of the way the set was implemented -- something which those of us who "program to the interface" would like to remain blissfully unaware.)

Right now HashSet.contains returns "map.containsKey(o)".  It might instead return "true" if "map.containsKey(o)" is "true", but not return "false" until after it has done a more exhaustive search on all if its elements.  Sure, this slows things down for objects whose hash codes have changed since entry into the set, but it does something more important: it upholds the contract it has with the consumers of the Set interface.

This issue is similar, but not identical, to Bug IDs 4802577 and 4459681 -- both of which were deemed "not a bug".  Before you classify this as not a bug, imagine this scenario:

You are an object that receives Set<Foo> from some other object.  You have no idea where the Set was originally created and populated, and no idea what implementation of Set was used.  If you have an object, myFoo, that really is in the set, is it right for Set<Foo>'s contain method to return false?  If you're programming to the interface, the answer is "no".

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Put the class found below in $JAVA_HOME/dmh/set (or change its package and put it where you like), then run it.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Termination of program with no messages.
ACTUAL -
A single message stating:

Element named '[foo]' was pulled from the set, but the set claims it does not contain that element!

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
package dmh.set;

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Test of adherence of HashSet to Set interface. */
public class HashSetTest
{
  private static char NEXT_NAME = 'A';
  
  private char    name;
  private boolean b1;
  private boolean b2;
  private char    c1;
  private char    c2;
  private int     i1;
  private int     i2;
  private double  d1;
  private double  d2;
  
  /** Make new instance with random values. */
  public HashSetTest()
  {
    name = NEXT_NAME++;
    
    Random rand = new Random();
    
    b1 = rand.nextDouble() < 0.5;
    b1 = rand.nextDouble() < 0.5;
    
    c1 = (char)rand.nextInt();
    c2 = (char)rand.nextInt();
    
    i1 = rand.nextInt();
    i2 = rand.nextInt();
    
    d1 = rand.nextDouble() * rand.nextInt();
    d2 = rand.nextDouble() * rand.nextInt();
  }
  
  /** Returns <i>true</i> if {@code o} is value-equal to this object. */
  @Override
  public boolean equals(Object o)
  {
    //Quick exit if o is null
    if (o == null)
      return false;
    
    //Quick exit if o is this
    if (o == this)
      return true;
    
    //Quick exit if classes are different
    if (!o.getClass().equals(this.getClass()))
      return false;

    //Safe cast
    HashSetTest other = (HashSetTest)o;
    
    return
      other.b1 == this.b1 && other.b2 == this.b2 &&
      other.c1 == this.c1 && other.c2 == this.c2 &&
      other.i1 == this.i1 && other.i2 == this.i2 &&
      other.d1 == this.d1 && other.d2 == this.d2;
  }
  
  @Override
  public int hashCode()
  {
    int result = 17;
    
    result = 37 * result + Boolean.valueOf(b1).hashCode();
    result = 37 * result + Boolean.valueOf(b2).hashCode();
    
    result = 37 * result + new Character(c1).hashCode();
    result = 37 * result + new Character(c2).hashCode();
    
    result = 37 * result + new Integer(i1).hashCode();
    result = 37 * result + new Integer(i2).hashCode();
    
    result = 37 * result + new Double(d1).hashCode();
    result = 37 * result + new Double(d2).hashCode();
    
    return result;
  }
  
  //==========================================================================//
  
  /**
   * Creates a HashSet of elements, modifies one element, then tests to
   * ensure that the set knows the modified element is still part of the
   * set.  Unfortunately, the HashSet's contains method loses the fact
   * that the element is part of the set.
   */
  public static void main(String[] args) throws Exception
  {
    Set<HashSetTest> testSet = new HashSet<HashSetTest>();
    
    for (int e=0; e < 10; e++)
      testSet.add(new HashSetTest());
    
    //Ensure that set knows what it contains
    testContainmentOfOwnElements(testSet);
    
    //Grab an element of the set, modify it, and retest for containment.
    Random rand = new Random();
    
    HashSetTest element =
      (HashSetTest)testSet.toArray()[rand.nextInt(testSet.size())];
    
    element.b1 = !element.b1;
    element.c1 /= 2;
    element.i1 *= 3;
    element.d1 *= element.d2;
    
    testContainmentOfOwnElements(testSet);
  }
  
  /**
   * Fetches every element from {@code set} and then asks {@code set}
   * if it contains the element.  For each element retrieved from the
   * set that the set denies it contains, a message is displayed
   * on the console.
   */
  private static void testContainmentOfOwnElements(Set<HashSetTest> set)
  {
    for (HashSetTest elem : set)
    {
      if (!set.contains(elem))
        System.out.println(
          "Element named '" + elem.name +
          "' was pulled from the set, but the set claims it does not contain that element!");
    }
  }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
The above description is a vast simplification of our actual situation.  We really have an object Foo which contains a "private Set<Foo> prerequisites" variable, which can lead to a tree of Foo.  We use JAXB to put Foo into an external XML file.  The Foo objects that are used as prerequisites use XML's IDREF type.  When we put Foo to XML and then bring it back from XML into another Foo instance, our equals method fails on testing the sets of prerequisites.  It looks like the Foo prereqs enter the set before they are fully formed (presumably due to resolving the IDREFs).  This means their hash codes change after they enter the set, under the control of JAXB, not our app code.

We'll probably rework Foo's equals method so that we no longer have "other.fooSet.equals(this.fooSet)".  Putting other.fooSet and this.fooSet into NEW sets just before testing for equality might work.

Comments
There's some FUD on this here: http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/ Title was formerly "How JDK's HashSet violates the Set.contains() contract" but has been changed to "Consequences when using Mutable Fields in hashCode()". Also here's the Reddit comment thread: http://www.reddit.com/r/programming/comments/qucci/how_jdks_hashset_violates_the_setcontains/ The blogosphere seems mostly to have corrected this. I certainly don't think we need to change our implementation anywhere. Maybe we need to change the docs to clarify something, but I'm not sure what.
20-05-2020

The contract of HashSet and HashMap does not allow for result of hashCode() to change during tenure of an object in the collection nor may the set of objects to which it is equal change during the tenure.
06-05-2013

EVALUATION The Hashtable class comment already contain this: * To successfully store and retrieve objects from a hashtable, the * objects used as keys must implement the <code>hashCode</code> * method and the <code>equals</code> method. <p> This could be improved, and copied to the HashMap and HashSet class comments. The spec for Object.hashCode could also be improved.
11-07-2007

EVALUATION The Java Collections Framework plays fast and loose with value and identity semantics. More properly, those would be cleanly separated, as for example in ML. Note however that "correct" programming languages like ML are not achieving the world domination they might deserve. Read (and re-read) this great paper: http://home.pipeline.com/~hbaker1/ObjectIdentity.html From http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. It is clear that many objects, including most collections, violate this rule. The spec for collections that use hash codes internally should clarify that the hash codes of elements must remain constant during their tenure as elements. I will re-classify this as a documentation fix.
11-07-2007