JDK-5022385 : (str) String.hashCode() performance improvement
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_2000,windows_xp
  • CPU: x86
  • Submitted: 2004-03-29
  • Updated: 2011-09-09
  • Resolved: 2011-09-09
Related Reports
Duplicate :  
Description
Name: gm110360			Date: 03/29/2004


A DESCRIPTION OF THE REQUEST :
Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases.

The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm.

In a nutshell:
  - For non-empty strings
    - Hash calculation is 6% to 20% faster, depending on string length.
    - Cached hash retrieval is about the same.
  - For empty strings
    - Hash calculation is 28% faster.
    - Cached hash retrieval is 34% faster.

The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases.

These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux.

Performance test results:

  Hash computation (milliseconds for 10M trials):

    Trial #1:
    empty old: 640 (0)			New is 27.7% faster
    empty new: 501 (0)
    1-char old: 962 (97)		New is 5.6% faster
    1-char new: 911 (97)
    10-char old: 3595 (-634317659)	New is 13.2% faster
    10-char new: 3175 (-634317659)
    100-char old: 28040 (-1399078606)	New is 15.6% faster
    100-char new: 24255 (-1399078606)

    Trial #2:
    0 old: 641 (0)			New is 28.2% faster
    0 new: 500 (0)
    1 old: 962 (97)			New is 6.8% faster
    1 new: 901 (97)
    10 old: 3595 (-634317659)		New is 13.6% faster
    10 new: 3165 (-634317659)
    100 old: 28050 (-1399078606)	New is 18.8% faster
    100 new: 23604 (-1399078606)
    1000 old: 272481 (1365024500)	New is 19.9% faster
    1000 new: 227297 (1365024500)

  Hash retrieval (milliseconds for 100M trials):
    (Shows that empty strings retrieve hash faster, others about the same.)

    empty old: 6590 (0)			New is 34.2% faster
    empty new: 4907 (0)
    1-char old: 5838 (97)		< 1% difference
    1-char new: 5819 (97)
    10-char old: 5808 (-634317659)	< 1% difference
    10-char new: 5818 (-634317659)
    100-char old: 5819 (-1399078606)	< 1% difference
    100-char new: 5818 (-1399078606)
    1000-char old: 5818 (1365024500)	< 1% difference
    1000-char new: 5809 (1365024500)


JUSTIFICATION :
Faster is better.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A speedup in some programs.

---------- BEGIN SOURCE ----------
/*

StingHashAlgorithm.java

This program shows a few minor tweaks to the "String.hashCode" method that
improve performance while retaining interface contract. The improvement is
significant enough that you might want to consider replacing the current
algorithm.

In a nutshell:
  - For non-empty strings
    - Hash calculation is 6% to 20% faster, depending on string length.
    - Cached hash retrieval is about the same.
  - For empty strings
    - Hash calculation is 28% faster.
    - Cached hash retrieval is 34% faster.

The main benefactors or this improvement are programs that create hash
tables containing a large amount of non-literal string data, such as
compilers and data-mining programs, which could see a 10% improvement in
their table-building phases.

These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using
J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but
were a bit slower all-around. Also, results were about the same under Linux.

Performance test results:

  Hash computation (milliseconds for 10M trials):

    Trial #1:
    empty old: 640 (0)			New is 27.7% faster
    empty new: 501 (0)
    1-char old: 962 (97)		New is 5.6% faster
    1-char new: 911 (97)
    10-char old: 3595 (-634317659)	New is 13.2% faster
    10-char new: 3175 (-634317659)
    100-char old: 28040 (-1399078606)	New is 15.6% faster
    100-char new: 24255 (-1399078606)

    Trial #2:
    0 old: 641 (0)			New is 28.2% faster
    0 new: 500 (0)
    1 old: 962 (97)			New is 6.8% faster
    1 new: 901 (97)
    10 old: 3595 (-634317659)		New is 13.6% faster
    10 new: 3165 (-634317659)
    100 old: 28050 (-1399078606)	New is 18.8% faster
    100 new: 23604 (-1399078606)
    1000 old: 272481 (1365024500)	New is 19.9% faster
    1000 new: 227297 (1365024500)

  Hash retrieval (milliseconds for 100M trials):
    (Shows that empty strings retrieve hash faster, others about the same.)

    empty old: 6590 (0)			New is 34.2% faster
    empty new: 4907 (0)
    1-char old: 5838 (97)		< 1% difference
    1-char new: 5819 (97)
    10-char old: 5808 (-634317659)	< 1% difference
    10-char new: 5818 (-634317659)
    100-char old: 5819 (-1399078606)	< 1% difference
    100-char new: 5818 (-1399078606)
    1000-char old: 5818 (1365024500)	< 1% difference
    1000-char new: 5809 (1365024500)

*/

public class StringHashAlgorithm {

    //
    // Test parameters: Set to true to test hash computation, false to
    // test hash retrieval (cached). The "production" setting is false.
    //
    private static final boolean ALWAYS_COMPUTE = true; // Normal is false

    private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000;

    private interface QuasiString {
	int hashCode();
    }

    private static final class NewQuasiString implements QuasiString {

	private int hash = 0;
	private int count;
	private int offset = 0;
	private char[] value;
	public NewQuasiString(String s) {
	    value = s.toCharArray();
	    count = value.length;
	}
	//
	// Proposed new hashCode technique. Parameter for the "real", non-test
	// version is ALWAYS_COMPUTE = false.
	//
	public int hashCode() {
	    int h = hash;
	    if (h != 0)
		return h;
	    int len = count;
	    if (len == 0)
		return 0;
	    int off = offset;
	    char val[] = value;

	    len += off;
	    while (off < len)
		h = 31 * h + val[off++];
	    if (ALWAYS_COMPUTE) {
		return h = h;
	    } else {
		return hash = h;
	    }
	}
    }

    private static final class OldQuasiString implements QuasiString {
	private int hash = 0;
	private int count;
	private int offset = 0;
	private char[] value;
	public OldQuasiString(String s) {
	    value = s.toCharArray();
	    count = value.length;
	}
	//
	// Current hashCode technique.
	//
	public int hashCode() {
	    int h = hash;
	    if (h == 0) {
		int off = offset;
		char val[] = value;
		int len = count;

		for (int i = 0; i < len; i++) {
		    h = 31*h + val[off++];
		}
		if (ALWAYS_COMPUTE)
		    h = h;
		else
		    hash = h;
	    }
	    return h;
	}
    }

    public static void main(String[] args) {
	String chars10 = "abcdefghij";
	String chars100 = chars10 + chars10 + chars10 + chars10 +
		chars10 + chars10 + chars10 + chars10 + chars10 + chars10;
	String chars1000 = chars100 + chars100 + chars100 + chars100 +
		chars100 + chars100 + chars100 + chars100 + chars100 + chars100;
	test("0 old", new OldQuasiString(""));
	test("0 new", new NewQuasiString(""));
	test("1 old", new OldQuasiString("a"));
	test("1 new", new NewQuasiString("a"));
	test("10 old", new OldQuasiString(chars10));
	test("10 new", new NewQuasiString(chars10));
	test("100 old", new OldQuasiString(chars100));
	test("100 new", new NewQuasiString(chars100));
	test("1000 old", new OldQuasiString(chars1000));
	test("1000 new", new NewQuasiString(chars1000));
    }

    private static void test(String msg, QuasiString s) {
	long t0 = System.currentTimeMillis();
	for (int i = reps; --i >= 0; ) {
	    s.hashCode();
	}
	long t = System.currentTimeMillis() - t0;
	System.out.println(msg + ": " + t + " (" + s.hashCode() + ")");
    }

}
---------- END SOURCE ----------
(Incident Review ID: 245459) 
======================================================================