JDK-8252556 : deepHashCode may be improved to generate consistent hash code
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 14
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2020-08-29
  • Updated: 2020-09-04
  • Resolved: 2020-09-04
Related Reports
Relates :  
Description
A DESCRIPTION OF THE PROBLEM :
The current implementation of Arrays.deepHashCode generates different hash codes for the exact same object array across different VM executions. This behaviour occurs when the given object array contains an Enum value somewhere in its object tree. By definition, the hashCode() method of Enum classes may return a different value on each execution, so this is not a bug (see the javadoc of Object.hashCode(): "This integer need not remain consistent from one execution of an application to another execution of the same application."). However, cache key generator implementations almost always require a consistent hash code.
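
The behaviour described above can be observed with a small program along the following lines. This is only an illustrative sketch: the names EnumHashDemo, Color and key() are made up for the example and do not come from the report. Running it in two separate JVM processes will typically print different values on the first two lines and the same value on the third.

```java
import java.util.Arrays;

// Hypothetical demo; EnumHashDemo, Color and key() are illustrative names only.
class EnumHashDemo {
    enum Color { RED, GREEN }

    // A cache-key-like array with an enum constant in it.
    static Object[] key() {
        return new Object[] { "user", 42, Color.RED };
    }

    public static void main(String[] args) {
        // Enum.hashCode() is final and delegates to Object.hashCode(),
        // so this is an identity hash that may change between runs.
        System.out.println("Color.RED.hashCode()  = " + Color.RED.hashCode());
        // The enum's hash feeds into the array hash, so this may change too.
        System.out.println("deepHashCode(key)     = " + Arrays.deepHashCode(key()));
        // String.hashCode() is specified by a formula, so this stays the same.
        System.out.println("RED.name().hashCode() = " + Color.RED.name().hashCode());
    }
}
```

Within a single execution all three values are stable; the difference only shows up when the outputs of separate runs are compared.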

Changing the behaviour of Enum.hashCode() was already suggested in JDK-8050217 (https://bugs.openjdk.java.net/browse/JDK-8050217), but unfortunately that suggestion was not adopted. I understand the backward compatibility considerations, but Enum.hashCode() could return a name-based hash code, which would be consistent across executions (although, like any String hash, not strictly collision free).
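
One caveat on the name-based idea: String.hashCode() is deterministic, but it does not guarantee distinct values for distinct names. The sketch below uses a hypothetical enum (Pair, with constant names chosen on purpose) to show two constants whose names hash to the same value.

```java
// Hypothetical example; the enum Pair and its constant names are chosen to collide.
class NameHashCollision {
    enum Pair { Aa, BB }

    // Name-based hash as proposed in the report.
    static int nameHash(Enum<?> e) {
        return e.name().hashCode();
    }

    public static void main(String[] args) {
        // "Aa" -> 65 * 31 + 97 = 2112 and "BB" -> 66 * 31 + 66 = 2112
        System.out.println(nameHash(Pair.Aa) == nameHash(Pair.BB)); // prints true
    }
}
```

Such collisions are harmless for hash-based collections, which fall back to equals(), but they matter if the hash alone is used as a cache key.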

If this is not possible I suggest to change the behaviour of deepHashCode as follows:

    public static int deepHashCode(Object a[]) {
        if (a == null)
            return 0;

        int result = 1;

        for (Object element : a) {
            final int elementHash;
            final Class<?> cl;
            if (element == null)
                elementHash = 0;
            else if (element.getClass().isEnum())
                elementHash = ((Enum<?>)element).name().hashCode();
            else if ((cl = element.getClass().getComponentType()) == null)
                elementHash = element.hashCode();
            else if (element instanceof Object[])
                elementHash = deepHashCode((Object[]) element);
            else
                elementHash = primitiveArrayHashCode(element, cl);

            result = 31 * result + elementHash;
        }

        return result;
    }

The two additional lines below (also shown in the code snippet above) check whether the current array element is an Enum and, if so, use the Enum's name to generate the hash code.

            else if (element.getClass().isEnum())
                elementHash = ((Enum<?>)element).name().hashCode();

If this change is too implicit and might cause compatibility problems, an overloaded deepHashCode method could be added instead, with an additional boolean flag that enables name-based enum hash code generation, such as:

    public static int deepHashCode(Object a[], boolean useEnumName) { .. }
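
Since java.util.Arrays cannot be modified by applications, the same idea can be tried out as a standalone utility. The following is a minimal sketch under stated assumptions, not a proposed JDK implementation: the class name StableHashes is hypothetical, enums are detected with instanceof, and primitive arrays are hashed by delegating to Arrays.deepHashCode because the JDK-internal primitiveArrayHashCode helper is not accessible.

```java
import java.util.Arrays;

// Hypothetical application-side helper; not part of java.util.Arrays.
final class StableHashes {
    private StableHashes() {}

    static int deepHashCode(Object[] a, boolean useEnumName) {
        if (a == null)
            return 0;

        int result = 1;
        for (Object element : a) {
            final int elementHash;
            if (element == null)
                elementHash = 0;
            else if (useEnumName && element instanceof Enum<?>)
                // Name-based hash: stable across JVM executions.
                elementHash = ((Enum<?>) element).name().hashCode();
            else if (element instanceof Object[])
                elementHash = deepHashCode((Object[]) element, useEnumName);
            else if (element.getClass().isArray())
                // Primitive array: a one-element wrapper hashes to 31 + h,
                // so subtracting 31 recovers the primitive array's hash h.
                elementHash = Arrays.deepHashCode(new Object[] { element }) - 31;
            else
                elementHash = element.hashCode();

            result = 31 * result + elementHash;
        }
        return result;
    }
}
```

With useEnumName set to false the helper is intended to return the same value as Arrays.deepHashCode. With the flag set to true, an enum constant hashes like the String holding its name, so { SECONDS } and { "SECONDS" } produce the same result.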

Thank you.



Comments
For the above reasons, I'm closing this as Won't Fix.
04-09-2020

A couple quick thoughts: We probably wouldn't want to change just deepHashCode() to do this. If this were done, it would disagree with Arrays.hashCode(Object[]) if the argument doesn't contain nested arrays. Arrays.hashCode(Object[]) could be changed too, but that method is defined as-if the array elements were members of a List, whose definition and implementations would also have to change. To do this right we'd have to change the hashCode of Enum constants themselves, as requested by JDK-8050217. Unclear whether that's worth revisiting. As an aside, the best way to check whether an object is an Enum is with "instanceof Enum". Calling getClass().isEnum() fails if one of the enum constants is declared with a class body. See the Class.isEnum method spec for details.
02-09-2020
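
The two technical points in the comment above can be checked with a short program. This is an illustrative sketch only: the class EnumCheckDemo and the enum Op are hypothetical names, with PLUS declared with a class body and NONE without one.

```java
import java.util.Arrays;

// Hypothetical demo; EnumCheckDemo and Op are illustrative names only.
class EnumCheckDemo {
    enum Op {
        NONE,
        // Constant with a class body: its runtime class is a subclass of Op.
        PLUS { @Override public String toString() { return "+"; } }
    }

    static boolean viaIsEnum(Object o) { return o.getClass().isEnum(); }

    static boolean viaInstanceof(Object o) { return o instanceof Enum; }

    public static void main(String[] args) {
        Object[] flat = { "a", 1, Op.NONE };
        // For an array without nested arrays, all three hashes agree.
        System.out.println(Arrays.hashCode(flat) == Arrays.deepHashCode(flat));       // true
        System.out.println(Arrays.hashCode(flat) == Arrays.asList(flat).hashCode()); // true

        System.out.println(viaIsEnum(Op.NONE));     // true
        System.out.println(viaIsEnum(Op.PLUS));     // false: class body constant
        System.out.println(viaInstanceof(Op.PLUS)); // true
    }
}
```

If only deepHashCode treated enums specially, the first comparison would stop holding for arrays that contain enum constants.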

Moved to JDK for further evaluations.
31-08-2020