JDK-6611830 : UUID thread-safety and performance improvements
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2007-10-01
  • Updated: 2011-05-18
  • Resolved: 2011-05-18
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7 b136Fixed
Description
Josh Bloch writes:

Google's Daniel Aioanei <###@###.###> points out that the java.util.UUID, while supposedly immutable, is not thread-safe, due errors in caching the variant and hashCode.  Let's first take a look at the code for variant:

 87     private transient int variant = -1;

277     public int variant() {
278         if (variant < 0) {
279             // This field is composed of a varying number of bits
280             if ((leastSigBits >>> 63) == 0) {
281                 variant = 0;
282             } else if ((leastSigBits >>> 62) == 2) {
283                 variant = 2;
284             } else {
285                 variant = (int)(leastSigBits >>> 61);
286             }
287         }
288         return variant;
289     }


At first glance, this code appears to be equivalent to the hashCode caching code for java.lang.String , which is known to be correct.  But there's one key difference: this code uses -1, rather than 0, as a marker for the unintialized value.  If a UUID is passed from one thread to another with no synchronization, it is possible for the latter thread to see the variant field in its truly uninitialized (0) state, which will cause it to falsely assume that the variant is 0.

Looking at the computation, it's way too cheap to justify lazy computation. and it can be made even cheaper.  Here's a branch-free computation:

    private transient final int variant =
              (int) ((leastSigBits >>> (64 - (leastSigBits >>> 62)))
                      & (leastSigBits >> 63));

Unfortunately, serialization has serious deficiencies when it comes to transient final fields: you must set them in the readObject method, and setting a final field is, um, tricky.  You can do it using setAccessble, but that can interact badly with some security managers.  Alternatively, you can use sun.misc.Unsafe, as I believe is done for System.in and System.out.  An alternative approach is not to cache the variant but to recalculate it every time.  This is simple, and perhaps so cheap that it's worth considering:

    public int variant() {
        return (int) ((leastSigBits >>> (64 - (leastSigBits >>> 62)))
                      & (leastSigBits >> 63));
    }

Caching for version(), timestamp(), clockSequence(), and node() are broken in similar manner, and similar comments apply.



On to hashCode:

107     private transient int hashCode = -1;

416     public int hashCode() {
417         if (hashCode == -1) {
418             hashCode = (int)((mostSigBits >> 32) ^
419                              mostSigBits ^
420                              (leastSigBits >> 32) ^
421                              leastSigBits);
422         }
423         return hashCode;
424     }

This code is broken in two ways, the first is as explained above for variant.  The second is that this code reads the hashCode field twice.  There is no guarantee that the second value with be the same as the first.  The first read could return 0 (totally uninitialized), and the second could return -1 "uninitialized," in  which case the method would return -1 incorrectly.  In fact, in the total absence of locking, the second read could return an "earlier" value: the first could return the properly computed value, and the second could (erroneously) return 0 or -1.  Again, the computation is far too cheap to justify lazy initialization, so the obvious fixes are either to make the hashCode field constant and initialize it at UUID creation time or (if deserialization proves too painful) to simply recalculate it each time:

    public int hashCode() {
        return (int)  (  (mostSigBits  >> 32) ^ mostSigBits
                       ^ (leastSigBits >> 32) ^ leastSigBits);
    }

The current equals computation, while correct, is needlessly complex:

    public boolean equals(Object obj) {
    if (!(obj instanceof UUID))
        return false;
        if (((UUID)obj).variant() != this.variant ())
            return false;
        UUID id = (UUID)obj;
        return (mostSigBits == id.mostSigBits &&
                leastSigBits == id.leastSigBits);
    }

The check on variant() is extraneous, and highly unlikely to improve performance.  Better is:

    public boolean equals(Object obj) {
        if (!(obj instanceof UUID))
            return false;
        UUID id = (UUID)obj;
        return mostSigBits  == id.mostSigBits &&
               leastSigBits == id.leastSigBits;
    }

Perhaps better still is:

    public boolean equals(Object obj) {
        if (!(obj instanceof UUID))
            return false;
        UUID id = (UUID)obj;
        return mostSigBits  == id.mostSigBits &
               leastSigBits == id.leastSigBits;
    }

The latter is branch-free, and the branch in the former is unpredictable. Admittely the former is more idiomatic, but it may be time to change the idiom.

To summarize, the caching for version, variant, timestamp, sequence, node, and hashCode is broken.  The two obvious choices to fix it are make these fields final (set upon object creation), and to eliminate them entirely and recalculate the values each time they're requested. The former is likely to have marginally better time performance, and the latter better space performance. The former will require some effort due to deficiencies in the serializatin mechanism. Finally, the equals method is unduly complex, and slower than necessary.

Comments
SUGGESTED FIX webrev posted http://cr.openjdk.java.net/~mduigou/6611830/webrev.0/webrev/
23-02-2011

EVALUATION While it is true that uninitialized fields of a Java object can be observed in another thread when no synchronization is used, according to the Java Memory Model, in practice I have never seen this happen on platforms supported by Sun. Fields appear to be written before the reference holding the return value from the new operator in all threads. This bug pattern is endemic in the JDK sources. I agree that we should consider not caching the variant field.
01-10-2007