Duplicate :
|
Name: gm110360 Date: 03/29/2004 A DESCRIPTION OF THE REQUEST : Attached is a program that demonstrates a suggested small code change that speeds up the String.hashCode method by as much as 20% for some cases. The program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm. In a nutshell: - For non-empty strings - Hash calculation is 6% to 20% faster, depending on string length. - Cached hash retrieval is about the same. - For empty strings - Hash calculation is 28% faster. - Cached hash retrieval is 34% faster. The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases. These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux. Performance test results: Hash computation (milliseconds for 10M trials): Trial #1: empty old: 640 (0) New is 27.7% faster empty new: 501 (0) 1-char old: 962 (97) New is 5.6% faster 1-char new: 911 (97) 10-char old: 3595 (-634317659) New is 13.2% faster 10-char new: 3175 (-634317659) 100-char old: 28040 (-1399078606) New is 15.6% faster 100-char new: 24255 (-1399078606) Trial #2: 0 old: 641 (0) New is 28.2% faster 0 new: 500 (0) 1 old: 962 (97) New is 6.8% faster 1 new: 901 (97) 10 old: 3595 (-634317659) New is 13.6% faster 10 new: 3165 (-634317659) 100 old: 28050 (-1399078606) New is 18.8% faster 100 new: 23604 (-1399078606) 1000 old: 272481 (1365024500) New is 19.9% faster 1000 new: 227297 (1365024500) Hash retrieval (milliseconds for 100M trials): (Shows that empty strings retrieve hash faster, others about the same.) empty old: 6590 (0) New is 34.2% faster empty new: 4907 (0) 1-char old: 5838 (97) < 1% difference 1-char new: 5819 (97) 10-char old: 5808 (-634317659) < 1% difference 10-char new: 5818 (-634317659) 100-char old: 5819 (-1399078606) < 1% difference 100-char new: 5818 (-1399078606) 1000-char old: 5818 (1365024500) < 1% difference 1000-char new: 5809 (1365024500) JUSTIFICATION : Faster is better. EXPECTED VERSUS ACTUAL BEHAVIOR : EXPECTED - A speedup in some programs. ---------- BEGIN SOURCE ---------- /* StingHashAlgorithm.java This program shows a few minor tweaks to the "String.hashCode" method that improve performance while retaining interface contract. The improvement is significant enough that you might want to consider replacing the current algorithm. In a nutshell: - For non-empty strings - Hash calculation is 6% to 20% faster, depending on string length. - Cached hash retrieval is about the same. - For empty strings - Hash calculation is 28% faster. - Cached hash retrieval is 34% faster. The main benefactors or this improvement are programs that create hash tables containing a large amount of non-literal string data, such as compilers and data-mining programs, which could see a 10% improvement in their table-building phases. These test were run on a 450MHz Pentium 2 Dell computer running Win2000 using J2SDK 1.4.2_03. Under 1.5.0 beta 1, results had the same relationships but were a bit slower all-around. Also, results were about the same under Linux. Performance test results: Hash computation (milliseconds for 10M trials): Trial #1: empty old: 640 (0) New is 27.7% faster empty new: 501 (0) 1-char old: 962 (97) New is 5.6% faster 1-char new: 911 (97) 10-char old: 3595 (-634317659) New is 13.2% faster 10-char new: 3175 (-634317659) 100-char old: 28040 (-1399078606) New is 15.6% faster 100-char new: 24255 (-1399078606) Trial #2: 0 old: 641 (0) New is 28.2% faster 0 new: 500 (0) 1 old: 962 (97) New is 6.8% faster 1 new: 901 (97) 10 old: 3595 (-634317659) New is 13.6% faster 10 new: 3165 (-634317659) 100 old: 28050 (-1399078606) New is 18.8% faster 100 new: 23604 (-1399078606) 1000 old: 272481 (1365024500) New is 19.9% faster 1000 new: 227297 (1365024500) Hash retrieval (milliseconds for 100M trials): (Shows that empty strings retrieve hash faster, others about the same.) empty old: 6590 (0) New is 34.2% faster empty new: 4907 (0) 1-char old: 5838 (97) < 1% difference 1-char new: 5819 (97) 10-char old: 5808 (-634317659) < 1% difference 10-char new: 5818 (-634317659) 100-char old: 5819 (-1399078606) < 1% difference 100-char new: 5818 (-1399078606) 1000-char old: 5818 (1365024500) < 1% difference 1000-char new: 5809 (1365024500) */ public class StringHashAlgorithm { // // Test parameters: Set to true to test hash computation, false to // test hash retrieval (cached). The "production" setting is false. // private static final boolean ALWAYS_COMPUTE = true; // Normal is false private static final int reps = ALWAYS_COMPUTE ? 10000000 : 100000000; private interface QuasiString { int hashCode(); } private static final class NewQuasiString implements QuasiString { private int hash = 0; private int count; private int offset = 0; private char[] value; public NewQuasiString(String s) { value = s.toCharArray(); count = value.length; } // // Proposed new hashCode technique. Parameter for the "real", non-test // version is ALWAYS_COMPUTE = false. // public int hashCode() { int h = hash; if (h != 0) return h; int len = count; if (len == 0) return 0; int off = offset; char val[] = value; len += off; while (off < len) h = 31 * h + val[off++]; if (ALWAYS_COMPUTE) { return h = h; } else { return hash = h; } } } private static final class OldQuasiString implements QuasiString { private int hash = 0; private int count; private int offset = 0; private char[] value; public OldQuasiString(String s) { value = s.toCharArray(); count = value.length; } // // Current hashCode technique. // public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } if (ALWAYS_COMPUTE) h = h; else hash = h; } return h; } } public static void main(String[] args) { String chars10 = "abcdefghij"; String chars100 = chars10 + chars10 + chars10 + chars10 + chars10 + chars10 + chars10 + chars10 + chars10 + chars10; String chars1000 = chars100 + chars100 + chars100 + chars100 + chars100 + chars100 + chars100 + chars100 + chars100 + chars100; test("0 old", new OldQuasiString("")); test("0 new", new NewQuasiString("")); test("1 old", new OldQuasiString("a")); test("1 new", new NewQuasiString("a")); test("10 old", new OldQuasiString(chars10)); test("10 new", new NewQuasiString(chars10)); test("100 old", new OldQuasiString(chars100)); test("100 new", new NewQuasiString(chars100)); test("1000 old", new OldQuasiString(chars1000)); test("1000 new", new NewQuasiString(chars1000)); } private static void test(String msg, QuasiString s) { long t0 = System.currentTimeMillis(); for (int i = reps; --i >= 0; ) { s.hashCode(); } long t = System.currentTimeMillis() - t0; System.out.println(msg + ": " + t + " (" + s.hashCode() + ")"); } } ---------- END SOURCE ---------- (Incident Review ID: 245459) ======================================================================