JDK-5002890 : (cs) Charset.isSupported is slow when invoked for different charsets
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.2_03
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: generic,x86
  • Submitted: 2004-02-25
  • Updated: 2017-05-19
  • Resolved: 2004-09-12
  • Fixed in (Other): 1.4.2_07 b01
  • Fixed in (JDK 6): 6
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
Performance degrades severely when Charset.isSupported is invoked for several different charsets.

Java users might not call isSupported directly.
However, it is called whenever an OutputStreamWriter or a Reader is constructed
(that is, whenever the "new" operation is applied to one of them).

These stream classes are widely used, and switching between charsets occurs
frequently in double-byte locale environments.
As a result, the performance degradation is significant.
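
The access pattern described above can be sketched as follows. This is a hypothetical illustration, not the attached test program: the class name AlternatingWriters and the method constructAlternating are made up for this sketch. It assumes, as the description states for the affected 1.4.2 releases, that constructing an OutputStreamWriter by charset name resolves the name each time; later JDK releases may resolve it through a different code path, so the timings are only meaningful on the release under test.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical illustration; not the program attached to this report.
class AlternatingWriters {

    // Constructs `count` writers, alternating between two charset names,
    // and returns the elapsed time in milliseconds.
    static long constructAlternating(String first, String second, int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            String name = (i & 1) == 0 ? first : second;
            try {
                // The charset name is looked up on every construction.
                new OutputStreamWriter(sink, name);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unsupported charset: " + name, e);
            }
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        System.out.println("single charset:       "
                + constructAlternating("UTF-8", "UTF-8", 10000) + " ms");
        System.out.println("alternating charsets: "
                + constructAlternating("ISO-8859-1", "UTF-8", 10000) + " ms");
    }
}
```

On a release with only a one-element charset cache, the second line would be expected to be much slower than the first; on a fixed release the two should be close.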


REPRODUCE :
  (1) Compile the attached program
  (2) Launch "java isSupportedTest"
     ==> You will see the output listed under "BEHAVIOR" below.

BEHAVIOR:
   
  The test program invokes isSupported as follows.
    case1: warm up the compiled code and
           invoke isSupported for "ISO-8859-1"
    case2: warm up the compiled code and
           invoke isSupported for "UTF-8"
    case3: invoke isSupported for "ISO-8859-1" and "UTF-8"

The performance in case3 is far worse than in the others.
Please see below. The unit is milliseconds.

=======
K:\nio-perf\isSupport>java isSupportedTest
case1:60
case2:70
case3:6099

K:\nio-perf\isSupport>java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

=======

CONFIGURATION : 
  OS : windowsXP (SP1, Japanese)
  JRE : 1.4.2_03
  


================================================================================

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.4.2_07 1.5.0_01 mustang
FIXED IN: 1.4.2_07 1.5.0_01 mustang
INTEGRATED IN: 1.4.2_07 1.5.0_01 mustang
02-10-2004

PUBLIC COMMENTS -
02-10-2004

EVALUATION

This problem could easily be addressed by caching the last few looked-up charsets.
-- ###@###.### 2004/4/25

A very interesting story wrt. performance. To my great surprise, I discovered that the penalty for a cache miss, i.e. creating a new Charset object, is *much* lower as of 1.4.2_05. Consider this program:

----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;

class t1 {
    static final List times = new ArrayList();

    static void time(Runnable job) {
        long t1 = System.currentTimeMillis();
        while (System.currentTimeMillis() - t1 < 2*1000) // warm up
            job.run();
        System.gc();
        System.gc();
        try { Thread.sleep(100); } catch (Exception e) {}
        long t2 = System.currentTimeMillis();
        job.run();
        times.add(new Long(System.currentTimeMillis() - t2));
    }

    public static void main(String[] args) {
        final int iterations = 10000;
        time(new Runnable() { public void run() {
            for(int i=0; i<iterations; i++)
                Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" : "UTF-8");}});
        for (Iterator it = times.iterator(); it.hasNext(); )
            System.out.print(it.next() + " ");
        System.out.println();
    }
}
----------------------------------------------------------------------

When run against all the 1.4.2 update releases and 1.5.0, I get (times in milliseconds):

1.4.2_01 -server 4590
1.4.2_01 -client 5868
1.4.2_02 -server 4372
1.4.2_02 -client 6244
1.4.2_03 -server 4641
1.4.2_03 -client 5738
1.4.2_04 -server 4668
1.4.2_04 -client 6117
1.4.2_05 -server 30
1.4.2_05 -client 75
1.4.2_06 -server 30
1.4.2_06 -client 76
1.5.0 -server 23
1.5.0 -client 50

So the news is very very good. Charset creation is dramatically cheaper as of 1.4.2_05, and it's even better in 1.5.0. With this, there is much less need to cache Charsets, especially since the caching involves overhead of its own.

Nevertheless, a usage pattern involving two Charsets is sufficiently common that caching is a good idea. Measurements indicate that the simplest possible increase in caching, i.e. adding another simple 1-element cache, is the best engineering decision.
###@###.### 2004-08-02

---------------------------------------------------------------

With the new suggested fix, the results of running the tests on a win2k machine using the default client jvm are:

CharsetBench test case
1.4.2_06             687 687 67656 703 688 67672
1.4.2_06 (with fix)  672 672 1406 671 671 1390

isSupportedTest test case
1.4.2_06             78 64 8046
1.4.2_06 (with fix)  63 32 282
###@###.### 2004-08-04
04-08-2004

SUGGESTED FIX

==== Provided suggested fix in Charset.java =====
***************
*** 27,34 ****
  import sun.misc.ServiceConfigurationError;
  import sun.nio.cs.StandardCharsets;
  import sun.nio.cs.ThreadLocalCoders;
- 
  /**
   * A named mapping between sequences of sixteen-bit Unicode characters and
   * sequences of bytes.  This class defines methods for creating decoders and
--- 27,34 ----
  import sun.misc.ServiceConfigurationError;
  import sun.nio.cs.StandardCharsets;
  import sun.nio.cs.ThreadLocalCoders;
+ import java.util.Hashtable;
  /**
   * A named mapping between sequences of sixteen-bit Unicode characters and
   * sequences of bytes.  This class defines methods for creating decoders and
***************
*** 279,287 ****
--- 279,290 ----
      // along with the name that was used to find it
      //
      private static volatile Object[] cache = null;
+     private static Hashtable hash = new Hashtable();

      private static Charset cache(String charsetName, Charset cs) {
          cache = new Object[] { charsetName, cs };
+ 
+         hash.put( charsetName, cs );
          return cs;
      }
***************
*** 373,385 ****
          }
      }

      private static Charset lookup(String charsetName) {
          if (charsetName == null)
              throw new IllegalArgumentException("Null charset name");
          Object[] ca = cache;
          if ((ca != null) && ca[0].equals(charsetName))
              return (Charset)ca[1];
!         Charset cs = standardProvider.charsetForName(charsetName);
          if (cs != null)
              return cache(charsetName, cs);
          cs = lookupViaProviders(charsetName);
--- 376,395 ----
          }
      }

+ 
+ 
      private static Charset lookup(String charsetName) {
          if (charsetName == null)
              throw new IllegalArgumentException("Null charset name");
+ 
          Object[] ca = cache;
          if ((ca != null) && ca[0].equals(charsetName))
              return (Charset)ca[1];
! 
!         Charset cs = (Charset)hash.get(charsetName);
!         if ( cs != null ) return cs;
! 
!         cs = standardProvider.charsetForName(charsetName);
          if (cs != null)
              return cache(charsetName, cs);
          cs = lookupViaProviders(charsetName);
###@###.### 2004-02-25

============================================================================

Here is my currently recommended fix for 1.4.2_06, which also works for 1.5:

--- /tmp/geta12651 2004-08-02 18:41:03.992978000 -0700
+++ Charset.java 2004-08-02 13:22:40.782452000 -0700
@@ -271,22 +271,23 @@
             throw new IllegalCharsetNameException(s);
         }
     }
 
     /* The standard set of charsets */
     private static CharsetProvider standardProvider = new StandardCharsets();
 
-    // Cache of the most-recently-returned charset,
-    // along with the name that was used to find it
+    // Cache of the most-recently-returned charsets,
+    // along with the names that were used to find them
     //
-    private static volatile Object[] cache = null;
+    private static volatile Object[] cache1 = null; // "Level 1" cache
+    private static volatile Object[] cache2 = null; // "Level 2" cache
 
-    private static Charset cache(String charsetName, Charset cs) {
-        cache = new Object[] { charsetName, cs };
-        return cs;
+    private static void cache(String charsetName, Charset cs) {
+        cache2 = cache1;
+        cache1 = new Object[] { charsetName, cs };
     }
 
     // Creates an iterator that walks over the available providers, ignoring
     // those whose lookup or instantiation causes a security exception to be
     // thrown.  Should be invoked with full privileges.
     //
     private static Iterator providers() {
@@ -410,26 +411,40 @@
         }
         return (ecp != null) ? ecp.charsetForName(charsetName) : null;
     }
 
     private static Charset lookup(String charsetName) {
         if (charsetName == null)
             throw new IllegalArgumentException("Null charset name");
-        Object[] ca = cache;
-        if ((ca != null) && ca[0].equals(charsetName))
-            return (Charset)ca[1];
-        Charset cs = standardProvider.charsetForName(charsetName);
-        if (cs != null)
-            return cache(charsetName, cs);
-        cs = lookupExtendedCharset(charsetName);
-        if (cs != null)
-            return cache(charsetName, cs);
-        cs = lookupViaProviders(charsetName);
-        if (cs != null)
-            return cache(charsetName, cs);
+
+        Object[] a;
+        if ((a = cache1) != null && charsetName.equals(a[0]))
+            return (Charset)a[1];
+        // We expect most programs to use one Charset repeatedly.
+        // We convey a hint to this effect to the VM by putting the
+        // level 1 cache miss code in a separate method.
+        return lookup2(charsetName);
+    }
+
+    private static Charset lookup2(String charsetName) {
+        Object[] a;
+        if ((a = cache2) != null && charsetName.equals(a[0])) {
+            cache2 = cache1;
+            cache1 = a;
+            return (Charset)a[1];
+        }
+
+        Charset cs;
+        if ((cs = standardProvider.charsetForName(charsetName)) != null ||
+            (cs = lookupExtendedCharset(charsetName)) != null ||
+            (cs = lookupViaProviders(charsetName)) != null) {
+            cache(charsetName, cs);
+            return cs;
+        }
+
         /* Only need to check the name if we didn't find a charset for it */
         checkName(charsetName);
         return null;
     }
 
     /**
      * Tells whether the named charset is supported.  </p>

----------------------------------------------------------------------
Here is a benchmark program to test the above fix:
----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;

class CharsetBench {
    static final List times = new ArrayList();

    static void time(Runnable job) {
        long t1 = System.currentTimeMillis();
        while (System.currentTimeMillis() - t1 < 10*1000) // warm up
            job.run();
        System.gc();
        System.gc();
        try { Thread.sleep(100); } catch (Exception e) {}
        long t2 = System.currentTimeMillis();
        job.run();
        times.add(new Long(System.currentTimeMillis() - t2));
    }

    public static void main(String[] args) {
        final int iterations = 10000000;
        for (int j = 0; j < 2; j++) {
            time(new Runnable() { public void run() {
                for(int i=0; i<iterations; i++)
                    Charset.isSupported("ISO-2022-JP");}});
            time(new Runnable() { public void run() {
                for(int i=0; i<iterations; i++)
                    Charset.isSupported("UTF-8");}});
            time(new Runnable() { public void run() {
                for(int i=0; i<iterations; i++)
                    Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" : "UTF-8");}});
        }
        for (Iterator it = times.iterator(); it.hasNext(); )
            System.out.print(it.next() + " ");
        System.out.println();
    }
}
----------------------------------------------------------------------
With this program, I get the following results:
----------------------------------------------------------------------
javac CharsetBench.java && for w in cc1; do for f in -server -client; do echo $w $f `jws $w java $f CharsetBench`; done; done; for v in 1.5 1.4.2_06 1.4.2; do for f in -server -client; do echo $v $f `jver $v java $f CharsetBench`; done; done
cc1 -server 74 92 413 92 92 409
cc1 -client 355 356 700 356 356 711
1.5 -server 98 116 12807 114 118 12623
1.5 -client 349 341 22302 349 349 22331
1.4.2_06 -server 98 116 17438 102 117 17617
1.4.2_06 -client 371 371 37553 359 371 37618
----------------------------------------------------------------------
(where cc1 contains the results with the proposed fix applied to 1.5).

What these results show is that the proposed fix makes both the repeated-single-Charset and the alternating-Charset access pattern faster, the second by more than an order of magnitude.

Looks good enough to me. Time to stop tweaking.
###@###.### 2004-08-02
02-08-2004
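
The two-level caching idea in the recommended fix can be sketched outside Charset.java as follows. This is a minimal standalone illustration, not JDK code: TwoEntryCache, its Entry class, the backing map standing in for the expensive provider lookup, and the slowLookups counter are all hypothetical names introduced here. As in the fix, the two slots are updated without locking, so concurrent callers may occasionally cause an extra slow lookup; the counter is for demonstration and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a two-entry most-recently-used cache.
class TwoEntryCache {

    private static final class Entry {
        final String key;
        final Object value;
        Entry(String key, Object value) { this.key = key; this.value = value; }
    }

    private volatile Entry level1;               // most recently used
    private volatile Entry level2;               // second most recently used
    private final Map<String, Object> backing;   // stands in for the slow lookup
    private int slowLookups;                     // demo counter, not thread-safe

    TwoEntryCache(Map<String, Object> backing) {
        this.backing = backing;
    }

    Object get(String key) {
        Entry e = level1;
        if (e != null && key.equals(e.key))
            return e.value;                      // level 1 hit
        e = level2;
        if (e != null && key.equals(e.key)) {
            level2 = level1;                     // level 2 hit: swap the slots
            level1 = e;
            return e.value;
        }
        slowLookups++;                           // miss in both levels
        Object value = backing.get(key);
        if (value != null) {
            level2 = level1;                     // demote the previous entry
            level1 = new Entry(key, value);
        }
        return value;
    }

    int slowLookups() {
        return slowLookups;
    }

    public static void main(String[] args) {
        Map<String, Object> backing = new HashMap<String, Object>();
        backing.put("ISO-2022-JP", "charset A");
        backing.put("UTF-8", "charset B");
        TwoEntryCache cache = new TwoEntryCache(backing);
        for (int i = 0; i < 10000; i++)
            cache.get((i & 1) == 0 ? "ISO-2022-JP" : "UTF-8");
        System.out.println("slow lookups: " + cache.slowLookups());
    }
}
```

With two alternating keys, only the first access to each key reaches the slow path; every later access is served from one of the two slots. A third key in the rotation would defeat this cache, which matches the evaluation's point that the fix targets the common two-charset pattern rather than general caching.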

WORK AROUND

If the user is directly calling Charset.isSupported, the answers can be cached, for example in a hashtable.
###@###.### 2004-07-29
29-07-2004
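
The workaround above might look like the following sketch. The class name SupportedCharsetCache is hypothetical. The sketch assumes that the set of supported charsets does not change while the application runs (a provider installed later would not be noticed) and that the number of distinct names queried is small, since the table is never pruned. On 1.4.2 itself the generic type parameters would have to be dropped.

```java
import java.nio.charset.Charset;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical application-side cache of Charset.isSupported answers.
class SupportedCharsetCache {

    // Hashtable is synchronized, so concurrent callers are safe; at worst
    // two threads both compute and store the same answer.
    private static final Map<String, Boolean> ANSWERS =
            new Hashtable<String, Boolean>();

    static boolean isSupported(String charsetName) {
        Boolean cached = ANSWERS.get(charsetName);
        if (cached == null) {
            // Exceptions for illegal names propagate and are not cached.
            cached = Boolean.valueOf(Charset.isSupported(charsetName));
            ANSWERS.put(charsetName, cached);
        }
        return cached.booleanValue();
    }

    public static void main(String[] args) {
        System.out.println(isSupported("ISO-8859-1"));
        System.out.println(isSupported("UTF-8"));
    }
}
```

Callers would replace direct calls to Charset.isSupported with SupportedCharsetCache.isSupported. This only helps code that calls isSupported itself; lookups made internally by the stream classes are unaffected.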