United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6497138 Scalability limitation in ConcurrentHashMap on Java 6
JDK-6497138 : Scalability limitation in ConcurrentHashMap on Java 6

Details
Type:
Bug
Submit Date:
2006-11-24
Status:
Closed
Updated Date:
2011-05-18
Project Name:
JDK
Resolved Date:
2011-05-18
Component:
core-libs
OS:
linux
Sub-Component:
java.util.concurrent
CPU:
x86
Priority:
P3
Resolution:
Fixed
Affected Versions:
6
Fixed Versions:

Related Reports
Backport:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b104, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux RHEL 4

A DESCRIPTION OF THE PROBLEM :
Performance of ConcurrentHashMap when running a simple test is poor on java 6 compared to java 5.

The basic problem appears to be with the "supplemental" hashing algorithm inside the ConcurrentHashMap class in java 6. When applied to the hashCode of relatively small integers (which is just the value) this returns, well.... basically the original value.

This is exacerbated by the algorithm used to determine the segment in which to store the integer. Because for small integers so many of the top bits in the "supplemented" hash continue to be zero, all of the keys are mapped into a single segment and you get horrible contention which then explains the performance degradation.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the attached source. I ran it as

java -Xmx6g -Xms6g -verbose:gc CHM

on a 64-bit JVM.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
On 1.5.0_09 this gave:

jdk1.5.0_09/bin/java -Xmx6g -Xms6g -verbose:gc CHM
[GC 1572864K->831061K(6029312K), 2.7360780 secs]
[GC 2403925K->1316653K(6029312K), 1.6569750 secs]
[GC 2889517K->1900751K(6029312K), 1.6231660 secs]
[GC 3473615K->2247047K(6029312K), 2.4905310 secs]
[GC 3819911K->2588263K(6029312K), 2.2804910 secs]
[GC 4161127K->2922287K(5155584K), 2.2536550 secs]
[GC 3621423K->3086143K(5592448K), 1.2750620 secs]
[GC 3785279K->3220407K(5592448K), 1.5142950 secs]
[GC 3919543K->3335087K(5592448K), 1.6740890 secs]
[GC 4034223K->3443063K(5592448K), 1.7367300 secs]
[GC 4142199K->3574359K(5592448K), 1.4082950 secs]
[GC 4273495K->3735167K(5592448K), 1.0626740 secs]
[Full GC 3735167K->2106499K(5592448K), 16.0496230 secs]
Elapsed time = 64038 msecs

ACTUAL -
On jdk 6 it java:

jdk1.6.0/bin/java -Xmx6g -Xms6g -verbose:gc CHM 1
[GC 1572864K->828413K(6029312K), 2.7983680 secs]
[GC 2401277K->1334757K(6029312K), 2.9499190 secs]
[GC 2907621K->1945757K(6029312K), 2.6088650 secs]
[GC 3518621K->2307797K(6029312K), 2.8401080 secs]
[GC 3880661K->2638397K(6029312K), 2.3686200 secs]
[GC 4211261K->2955357K(5155584K), 2.2537560 secs]
[GC 3654493K->3110077K(5592448K), 1.3336820 secs]
[GC 3809213K->3234677K(5592448K), 1.6970660 secs]
[GC 3933813K->3338621K(5592448K), 1.7949220 secs]
[GC 4037757K->3448277K(5592448K), 1.8651310 secs]
[GC 4147413K->3582725K(5592448K), 1.7257750 secs]
[GC 4281861K->3746469K(5592448K), 1.3348600 secs]
[Full GC 3746469K->2204555K(5592448K), 15.2973700 secs]
Elapsed time = 100252 msecs


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.*;
import java.util.concurrent.*;

public class CHM extends Thread {
        static ConcurrentHashMap<Integer, String> chm;
        static Integer[] ints;
        int thrNum;

        CHM(int thrNum) {
                this.thrNum = thrNum;
        }

        public void run() {
                Random rnd = new Random();
                for (int i=0; i<10000000; i++) {
                        Integer key = ints[rnd.nextInt(10000000)];
                        String value = chm.get(key);
                        if (value != null) {
                                chm.replace(key, value, new String("This is thread #" + thrNum));
                        } else {
                                chm.putIfAbsent(key, new String("This is thread #" + thrNum));
                        }
                }
        }

        public static void main(String[] args) throws Exception {
                chm = new ConcurrentHashMap<Integer, String>(10000000, 0.75f, 4);
                CHM[] threads = new CHM[4];

                ints = new Integer[10000000];
                for (int i=0; i<10000000; i++) {
                        ints[i] = new Integer(i);
                }

                long startTime = System.currentTimeMillis();
                for (int i=0; i<4; i++) {
                        threads[i] = new CHM(i);
                        threads[i].start();
                }
                for (int i=0; i<4; i++) {
                        threads[i].join();
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time = " + (endTime-startTime) + " msecs");
        }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Use java 5 or modify the hashCode() in Integer to return more "bitty" than just the int value.

                                    

Comments
EVALUATION

For HashMap, it is best to squeeze adjacent keys into adjacent hash slots.
Perhaps it was a mistake to assume the same for ConcurrentHashMap?
                                     
2006-11-28
EVALUATION

The submitter writes:

----------------------------------------------------
I'm not sure you've grasped the point. The "supplemental hash" achieves exactly nothing for default concurrent hashmaps in java 6. Look at the code. It is a series of right logical shifts and xors. There is no way that this series of operations can in any way modify the top 4 bits of the original hash. Since these are the exact set of bits that are used to determine the segment, they are essentially just a way of burning CPU cycles.

If you replace the supplemental hash algorithm in java 6 with the algorithm in java 5 then the problem is (at least in this case) alleviated.

This is a case where concurrency that worked in java 5 doesn't work in java 6. Hashmap has nothing to do with it.
----------------------------------------------------
                                     
2006-12-01
EVALUATION

Here's a test program that sheds some light:

-----------------------------------------------
import java.util.*;
import java.util.concurrent.*;
import java.lang.reflect.*;

class SlowInteger {
    static void sleep(long millis) {
	try { Thread.currentThread().sleep(millis); }
	catch (Throwable t) { throw new Error(t); }
    }

    final static SlowInteger ZERO = new SlowInteger(0);

    final int i;
    SlowInteger(int i) { this.i = i; }
    public boolean equals(Object x) {
	sleep(100);		// very very sloooooow
	return x instanceof SlowInteger && ((SlowInteger)x).i == this.i;
    }
    public int hashCode() { return i; }
}

public class Bug4 {
    private static int intArg(String[] args, int i, int defaultValue) {
	return args.length > i ? Integer.parseInt(args[i]) : defaultValue;
    }

    static Object getField(Object x, String fieldName) {
	try {
	    for (Field field : x.getClass().getDeclaredFields())
		if (field.getName() == fieldName) {
		    field.setAccessible(true);
		    return field.get(x);
		}
	} catch (Throwable t) {}
	throw new Error(fieldName);
    }

    public static void main(String[] args) throws Exception {
	final Random rnd = new Random();
	final int max = intArg(args, 0, 1<<20);
	final int iterations = intArg(args, 1, 30);
	final int threadCount = 16;
	final ConcurrentHashMap<SlowInteger, Boolean> map
	    = new ConcurrentHashMap<SlowInteger, Boolean>(max);
	final List<Thread> threads = new ArrayList<Thread>(threadCount);

	long t0 = System.nanoTime();
	for (int i=0; i<threadCount; i++)
	    threads.add(new Thread() { public void run() {
		for (int i=0; i<iterations; i++) {
		    SlowInteger key = new SlowInteger(rnd.nextInt(max));
		    map.put(key, Boolean.TRUE);
		    map.put(key, Boolean.FALSE);
		}}});
	for (Thread thread : threads) thread.start();
	for (Thread thread : threads) thread.join();
	System.out.printf("Elapsed time = %.3f seconds%n",
			  (System.nanoTime()-t0)/(1000.0*1000.0*1000.0));

	Object[] segmentsArray = (Object[]) getField(map, "segments");
	List<Object> counts = new ArrayList<Object>();
	for (Object segment : segmentsArray)
	    counts.add(getField(segment, "count"));
	System.out.printf("segment counts: %s%n", counts);
    }
}
-----------------------------------------------
When we run 5.0u9 against 6, we see:

 $ for v in 5.0u9 6; do jver $v jr Bug4 1024000; done
==> javac -source 1.5 -Xlint:all Bug4.java
==> java -esa -ea Bug4 1024000
Elapsed time = 5.709 seconds
segment counts: [31, 24, 35, 27, 39, 27, 28, 30, 30, 28, 22, 33, 28, 33, 25, 40]
==> javac -source 1.6 -Xlint:all Bug4.java
==> java -esa -ea Bug4 1024000
Elapsed time = 52.777 seconds
segment counts: [480, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

which demonstrates weakness on the part of the new hash function.

BUT....

 $ for v in 5.0u9 6; do jver $v jr Bug4 1024; done
==> javac -source 1.5 -Xlint:all Bug4.java
==> java -esa -ea Bug4 1024
Elapsed time = 63.478 seconds
segment counts: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 382]
==> javac -source 1.6 -Xlint:all Bug4.java
==> java -esa -ea Bug4 1024
Elapsed time = 63.630 seconds
segment counts: [381, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for smaller hash values the 5.0u9 implementation has the same problem.
We need to do better than simply revert to the 5.0u9 hash function.
                                     
2006-12-01
EVALUATION

ConcurrentHashMap's secondary hash function
has a more difficult task than HashMap's;
not only do we want different keys to not collide,
and we want sequential keys to be sequentially co-located in memory,
(for optimal single-threaded sequential processing of sequential keys)
but we also want keys to be distributed evenly across different segments,
(for optimal concurrent processing)
and these requirements for maximal performance are in conflict.
For ConcurrentHashMap we should probably optimize for concurrency.

Finding a good hash function here is still an interesting research problem.
                                     
2006-12-01
EVALUATION

Here's a test program that directly determines 
segment clustering:

/* Detect colocation of sequences of the form i*2^k, 1 < i < n
 * to the same segment.
 */

import java.util.*;
import java.util.concurrent.*;
import java.lang.reflect.*;

public class Bug5 {
    public static void main(String[] args) throws Exception {
	List<Integer> badShifts = new ArrayList<Integer>();
	for (int eltShift = 0; eltShift <= 26; eltShift++)
	    if (sameSegment(eltShift))
		badShifts.add(eltShift);
	if (! badShifts.isEmpty())
	    System.out.printf("Bad element shifts=%s%n", badShifts);
    }

    static Object getField(Object x, String fieldName) {
	try {
	    for (Field field : x.getClass().getDeclaredFields())
		if (field.getName() == fieldName) {
		    field.setAccessible(true);
		    return field.get(x);
		}
	} catch (Throwable t) {}
	throw new Error(fieldName);
    }

    static boolean sameSegment(int eltShift) {
	final ConcurrentHashMap<Integer, Boolean> map
	    = new ConcurrentHashMap<Integer, Boolean>();

	for (int i = 0; i < 64; i++)
	    map.put(i<<eltShift, Boolean.TRUE);

	int nonEmptySegments = 0;
	//System.out.println(((Object[]) getField(map, "segments")).length);
	for (Object segment : (Object[]) getField(map, "segments"))
	    if (((Integer)(getField(segment, "count"))) > 0)
		nonEmptySegments++;
	//System.out.println(nonEmptySegments);
	return nonEmptySegments < 2;
    }
}

---------------------

and here's a demonstration that things have gotten much worse in jdk6:

 $ for v in 5.0u9 6; do jver $v jr Bug5; done
==> javac -source 1.5 -Xlint:all Bug5.java
==> java -esa -ea Bug5
Bad element shifts=[0, 1, 2, 3, 4, 5, 6, 7, 8]
==> javac -source 1.6 -Xlint:all Bug5.java
==> java -esa -ea Bug5
Bad element shifts=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
                                     
2006-12-01
EVALUATION

Here's a test program that shows that all keys with hashes < 1<<28
end up in segment zero:

import java.util.*;
import java.util.concurrent.*;
import java.lang.reflect.*;

public class Bug6 {
    public static void main(String[] args) throws Exception {
	List<Integer> badShifts = new ArrayList<Integer>();
	for (int eltShift = 0; eltShift <= 29; eltShift++)
	    if (segment0(eltShift))
		badShifts.add(eltShift);
	System.out.printf("Segment zero shifts=%s%n", badShifts);
    }

    static Object getField(Object x, String fieldName) {
	try {
	    for (Field field : x.getClass().getDeclaredFields())
		if (field.getName() == fieldName) {
		    field.setAccessible(true);
		    return field.get(x);
		}
	} catch (Throwable t) {}
	throw new Error(fieldName);
    }

    static boolean segment0(int eltShift) {
	final ConcurrentHashMap<Integer, Boolean> map
	    = new ConcurrentHashMap<Integer, Boolean>();

	map.put(1<<eltShift, Boolean.TRUE);

	return ((Integer)getField(((Object[]) getField(map, "segments"))[0], "count")) > 0;
    }
}

==> java -esa -ea Bug6
Segment zero shifts=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
                                     
2006-12-01
EVALUATION

As yet another variant,
here's a test whether transposing a bit with its successor causes it to
land in a different segment:

import java.util.*;
import java.util.concurrent.*;
import java.lang.reflect.*;

public class Bug7 {
    public static void main(String[] args) throws Exception {
	List<Integer> badShifts = new ArrayList<Integer>();
	for (int eltShift = 0; eltShift <= 29; eltShift++)
	    if (sameSegment(eltShift))
		badShifts.add(eltShift);
	System.out.printf("Single segment shifts=%s%n", badShifts);
    }

    static Object getField(Object x, String fieldName) {
	try {
	    for (Field field : x.getClass().getDeclaredFields())
		if (field.getName() == fieldName) {
		    field.setAccessible(true);
		    return field.get(x);
		}
	} catch (Throwable t) {}
	throw new Error(fieldName);
    }

    static boolean sameSegment(int eltShift) {
	final ConcurrentHashMap<Integer, Boolean> map
	    = new ConcurrentHashMap<Integer, Boolean>();

	for (int i = 0; i < 2; i++)
	    map.put((1+i)<<eltShift, Boolean.TRUE);

	int nonEmptySegments = 0;
	for (Object x : (Object[]) getField(map, "segments"))
	    if (((Integer)getField(x, "count")) > 0)
		nonEmptySegments++;
	return nonEmptySegments == 1;
    }
}

 $ for v in 1.5 6; do jver $v jr Bug7; done
==> javac -source 1.5 -Xlint:all Bug7.java
==> java -esa -ea Bug7
Single segment shifts=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
==> javac -source 1.6 -Xlint:all Bug7.java
==> java -esa -ea Bug7
Single segment shifts=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
                                     
2006-12-03
EVALUATION

The needs of HashMap and ConcurrentHashMap are different; 
they need different secondary hash functions.
Here's our proposed fix should not only rectify the regression in jdk6,
but also improve on the concurrency for jdk 5:

--- /u/martin/ws/dolphin/src/share/classes/java/util/concurrent/ConcurrentHashMap.java	2006-10-07 11:55:55.276270000 -0700
+++ /u/martin/ws/hash/src/share/classes/java/util/concurrent/ConcurrentHashMap.java	2006-12-03 18:34:01.079666000 -0800
@@ -151,14 +151,17 @@
      * defends against poor quality hash functions.  This is critical
      * because ConcurrentHashMap uses power-of-two length hash tables,
      * that otherwise encounter collisions for hashCodes that do not
-     * differ in lower bits.
+     * differ in lower or upper bits.
      */
     private static int hash(int h) {
-        // This function ensures that hashCodes that differ only by
-        // constant multiples at each bit position have a bounded
-        // number of collisions (approximately 8 at default load factor).
-        h ^= (h >>> 20) ^ (h >>> 12);
-        return h ^ (h >>> 7) ^ (h >>> 4);
+        // Spread bits to regularize both segment and index locations,
+        // using variant of single-word Wang/Jenkins hash.
+        h += (h <<  15) ^ 0xffffcd7d;
+        h ^= (h >>> 10);
+        h += (h <<   3);
+        h ^= (h >>>  6);
+        h += (h <<   2) + (h << 14);
+        return h ^ (h >>> 16);
     }
 
     /**
(mb29450@suttles) ~/ws/hash $
                                     
2006-12-04



Hardware and Software, Engineered to Work Together