FULL PRODUCT VERSION : java version "1.6.0_11" Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing) ADDITIONAL OS VERSION INFORMATION : Windows XP SR-2 A DESCRIPTION OF THE PROBLEM : This is a MIXED list of BUGs and RFEs regarding performance of sun.nio.cs.FastCharsetProvider ! Bug 1: There is a shortcut to enhance VM startup in line 94: if (cln.equals("US_ASCII")) { cs = new US_ASCII(); ... As it seems that current charset for startup is UTF-8, the shortcut should be updated to UTF-8. Also check this in sun.io.Converters, where ISO-8859-1 is yet defined for defaultEncoding. Bug 2: Method canonicalize() is invoked 2 times in sequence after invoking charsetForName() (lines 119 -> 82). This unnecessarily consumes performance. Bug 3: Method lookup() should be synchronized to insure lock also for invokation from charsets() iteration; synchronization could be saved for method charsetForName(). RFE 4: After unsuccessful lookup in cache, there is a 2nd lookup in classMap to check support for given charset name: // Check cache first Charset cs = cache.get(csn); if (cs != null) return cs; // Do we even support this charset? String cln = classMap.get(csn); if (cln == null) return null; This could be enhanced by initializing the cache with a NULL Charset object. So there is only need for 1 single map lookup: Charset cs = cache.get(csn); if (cs == null) // Don't we even support this charset? return null; if (cs != NULL) return cs; so ... RFE 5: Class name lookup can be done after startup charset check, if latter is done by canonical name comparison: if (csn.equals("us-ascii")) RFE 6: In method toLower() s shouldn't be looped twice after 'toLower' becomes false, and copying of lower chars could be saved: (copying String's internal char[] shouldn't be slower than new char[], as it also needs to fill initial content with 0's) private static String toLower(final String s) { boolean allLower = true; char[] ca = null; for (int i=0; i<s.length(); i++) { int c = s.charAt(i); if (((c - 'A') | ('Z' - c)) >= 0) { if (allLower) { ca = s.toCharArray(); allLower = false; } ca[i] += '\u0020'; } } return allLower ? s : new String(ca); } RFE 7: charsets() iterator must not canonicalize each charset name, so canonicalizing can be moved from lookup() to charsetForName(). See my correction here: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=552&view=markup RFE 8: classMap could be saved. Instead pre-load the cache initially with the class names. This would save 1 HashMap lookup to get the class name of a charset to be instantiated: In constructor do ... cache = classNames; // type: Map<String,Object> RFE 9: Pre-load the VM's startup charset. This would save the check for each lookup: private final static Charset STARTUP_CHARSET = new UTF_8(); cache.put(toLower(STARTUP_CHARSET.name()), STARTUP_CHARSET); All this would result in very simple lookup: private synchronized Charset lookup(String lowCanonical) { // Check cache first Object o = cache.get(lowCanonical); // TODO: maybe also cache by arbitrary alias if (o instanceof String) { // Instantiate new charset Charset cs = newCharset((String)o); // Cache it if (cs != null) cache.put(lowCanonical, cs); return cs; } return (Charset)o; } Charset newCharset(String className) { try { Class c = Class.forName(packagePrefix+"."+className, true, getClass().getClassLoader()); return (Charset)c.newInstance(); } catch (ClassNotFoundException x) {} catch (IllegalAccessException x) {} catch (InstantiationException x) {} return null; } See my correction here: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=555&view=markup REPRODUCIBILITY : This bug can be reproduced always. ---------- BEGIN SOURCE ---------- For complete source code see: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=551&view=markup ---------- END SOURCE ---------- Comment copied from bugzilla report http://bugs.openjdk.java.net/show_bug.cgi?id=100092#c0 Description From UlfZibis 2009-07-19 12:12:28 PDT Created an attachment (id=114) [details] webrev including patch Maybe following would be suitable for the mercurial changeset summary: 6790402 Speed-up FastCharsetProvider Summary: Charset for startup should be UTF-8; Invoke canonicalize() only once after invoking charsetForName(); Method lookup() should be synchronized; Enhance lookup by initializing cache with a NULL Charset object; Class name lookup can be done after startup charset check, if is done by canonical name comparison; Method toUpper() would be potentially faster than toLower(); charsets() iterator must not canonicalize each charset name; classMap could be saved; Pre-load the VM's startup charset; Moving file standard-charsets to more appropriate location; Prepare AbstractCharsetProvider, ExtendedCharsets for better performance and flexibility; Contributed-by: Ulf.Zibis at CoSoCo.de
|