Bug ID: JDK-5002890 (cs) Charset.isSupported is slow when invoked for different charsets

Details
Type: Bug
Submit Date: 2004-02-25
Status: Resolved
Updated Date: 2004-12-13
Project Name: JDK
Resolved Date: 2004-09-12
Component: core-libs
OS: windows_xp
Sub-Component: java.nio.charsets
CPU: x86,generic
Priority: P3
Resolution: Fixed
Affected Versions: 1.4.2_03
Fixed Versions: 1.4.2_07 (b01)

Description
Charset.isSupported becomes very slow when it is invoked for several
different charsets.

Java users might not call isSupported directly.
However, it is called whenever an OutputStreamWriter or Reader is constructed
(that is, whenever the "new" operation is applied to them).

These stream classes are widely used, and switching between charsets occurs
frequently in double-byte locale environments.
So the performance degradation is severe.
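
As an illustration of the usage pattern described above, here is a minimal sketch that constructs writers while switching between two charset names. The class and method names are hypothetical and are not part of the attached test program; the sketch assumes a modern JDK rather than 1.4.2, and whether each construction reaches Charset.isSupported depends on the JDK version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical illustration: each iteration constructs a new writer and
// switches the charset name, the pattern the report says is slow.
class AlternatingWriters {
    // Writes "abc" through 'rounds' freshly constructed writers and
    // returns the total number of bytes produced.
    static int writeAlternating(int rounds) {
        String[] names = { "ISO-8859-1", "UTF-8" };
        int total = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                // The charset is looked up by name on every construction.
                Writer w = new OutputStreamWriter(out, names[i & 1]);
                w.write("abc");
                w.close();
                total += out.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(writeAlternating(4));
    }
}
```

Both charsets encode "abc" as three bytes, so the return value only confirms that every writer was created and used; the cost of interest is the per-construction lookup by name.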


REPRODUCE :
  (1) Compile the attached program
  (2) Launch "java isSupportedTest"
     ==> You will see the list in following  "BEHAVIOR" .

BEHAVIOR:
   
  The test program invokes isSupported as follows.
    case1: warm up the compiled code, then
           invoke isSupported for "ISO-8859-1"
    case2: warm up the compiled code, then
           invoke isSupported for "UTF-8"
    case3: invoke isSupported for "ISO-8859-1" and "UTF-8"

The performance in case3 is far worse than in the others.
Please see below. The unit is milliseconds.

=======
K:\nio-perf\isSupport>java isSupportedTest
case1:60
case2:70
case3:6099

K:\nio-perf\isSupport>java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

=======

CONFIGURATION : 
  OS : windowsXP (SP1, Japanese)
  JRE : 1.4.2_03
  


================================================================================

                                    

Comments
WORK AROUND

If the user is directly calling Charset.isSupported, the answers can
be cached, for example in a hashtable.

###@###.### 2004-07-29
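
A minimal sketch of this workaround follows. The class name SupportedCharsetCache is hypothetical, and the sketch uses a ConcurrentHashMap with generics (so it assumes a JDK much newer than 1.4.2) in place of the Hashtable mentioned above. Cached answers could become stale if the set of installed charset providers changes at run time.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user-side helper: remembers the answers returned by
// Charset.isSupported so repeated queries avoid the charset lookup.
final class SupportedCharsetCache {
    private static final Map<String, Boolean> ANSWERS =
        new ConcurrentHashMap<String, Boolean>();

    private SupportedCharsetCache() {}

    // Like Charset.isSupported, this propagates
    // IllegalCharsetNameException for illegal names. A null name fails
    // with NullPointerException from the map lookup.
    static boolean isSupported(String name) {
        Boolean known = ANSWERS.get(name);
        if (known == null) {
            known = Boolean.valueOf(Charset.isSupported(name));
            // A concurrent duplicate put stores the same answer.
            ANSWERS.put(name, known);
        }
        return known.booleanValue();
    }
}
```

Callers would replace direct calls to Charset.isSupported(name) with SupportedCharsetCache.isSupported(name). This only helps code that calls isSupported itself; it does not affect the lookups made inside the stream classes.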
                                     
2004-07-29
SUGGESTED FIX

==== Provided suggested fix in Charset.java =====
***************
*** 27,34 ****
  import sun.misc.ServiceConfigurationError;
  import sun.nio.cs.StandardCharsets;
  import sun.nio.cs.ThreadLocalCoders;

-
  /**
   * A named mapping between sequences of sixteen-bit Unicode characters and
   * sequences of bytes.  This class defines methods for creating decoders and
--- 27,34 ----
  import sun.misc.ServiceConfigurationError;
  import sun.nio.cs.StandardCharsets;
  import sun.nio.cs.ThreadLocalCoders;
+ import java.util.Hashtable;

  /**
   * A named mapping between sequences of sixteen-bit Unicode characters and
   * sequences of bytes.  This class defines methods for creating decoders and
***************
*** 279,287 ****
--- 279,290 ----
      // along with the name that was used to find it
      //
      private static volatile Object[] cache = null;
+     private static Hashtable hash = new Hashtable();

      private static Charset cache(String charsetName, Charset cs) {
        cache = new Object[] { charsetName, cs };
+
+       hash.put( charsetName, cs );
        return cs;
      }

***************
*** 373,385 ****
        }
      }

      private static Charset lookup(String charsetName) {
        if (charsetName == null)
            throw new IllegalArgumentException("Null charset name");
        Object[] ca = cache;
        if ((ca != null) && ca[0].equals(charsetName))
            return (Charset)ca[1];
!       Charset cs = standardProvider.charsetForName(charsetName);
        if (cs != null)
            return cache(charsetName, cs);
        cs = lookupViaProviders(charsetName);
--- 376,395 ----
        }
      }

+
+
      private static Charset lookup(String charsetName) {
        if (charsetName == null)
            throw new IllegalArgumentException("Null charset name");
+
        Object[] ca = cache;
        if ((ca != null) && ca[0].equals(charsetName))
            return (Charset)ca[1];
!
!       Charset cs = (Charset)hash.get(charsetName);
!       if ( cs != null ) return cs;
!
!       cs = standardProvider.charsetForName(charsetName);
        if (cs != null)
            return cache(charsetName, cs);
        cs = lookupViaProviders(charsetName);


###@###.### 2004-02-25
============================================================================

Here is my currently recommended fix for 1.4.2_06, which also works for 1.5:

--- /tmp/geta12651	2004-08-02 18:41:03.992978000 -0700
+++ Charset.java	2004-08-02 13:22:40.782452000 -0700
@@ -271,22 +271,23 @@
 	    throw new IllegalCharsetNameException(s);
 	}
     }
 
     /* The standard set of charsets */
     private static CharsetProvider standardProvider = new StandardCharsets();
 
-    // Cache of the most-recently-returned charset,
-    // along with the name that was used to find it
+    // Cache of the most-recently-returned charsets,
+    // along with the names that were used to find them
     //
-    private static volatile Object[] cache = null;
+    private static volatile Object[] cache1 = null; // "Level 1" cache
+    private static volatile Object[] cache2 = null; // "Level 2" cache
 
-    private static Charset cache(String charsetName, Charset cs) {
-	cache = new Object[] { charsetName, cs };
-	return cs;
+    private static void cache(String charsetName, Charset cs) {
+	cache2 = cache1;
+	cache1 = new Object[] { charsetName, cs };
     }
 
     // Creates an iterator that walks over the available providers, ignoring
     // those whose lookup or instantiation causes a security exception to be
     // thrown.  Should be invoked with full privileges.
     //
     private static Iterator providers() {
@@ -410,26 +411,40 @@
       }
       return (ecp != null) ? ecp.charsetForName(charsetName) : null;
     }
 
     private static Charset lookup(String charsetName) {
 	if (charsetName == null)
 	    throw new IllegalArgumentException("Null charset name");
-	Object[] ca = cache;
-	if ((ca != null) && ca[0].equals(charsetName))
-	    return (Charset)ca[1];
-	Charset cs = standardProvider.charsetForName(charsetName);
-	if (cs != null)
-	    return cache(charsetName, cs);
-	cs = lookupExtendedCharset(charsetName);
-	if (cs != null)
-	    return cache(charsetName, cs);
-	cs = lookupViaProviders(charsetName);
-	if (cs != null)
-	    return cache(charsetName, cs);
+
+	Object[] a;
+	if ((a = cache1) != null && charsetName.equals(a[0]))
+	    return (Charset)a[1];
+	// We expect most programs to use one Charset repeatedly.
+	// We convey a hint to this effect to the VM by putting the
+	// level 1 cache miss code in a separate method.
+	return lookup2(charsetName);
+    }
+
+    private static Charset lookup2(String charsetName) {
+	Object[] a;
+	if ((a = cache2) != null && charsetName.equals(a[0])) {
+	    cache2 = cache1;
+	    cache1 = a;
+	    return (Charset)a[1];
+	}
+
+	Charset cs;
+	if ((cs = standardProvider.charsetForName(charsetName)) != null ||
+	    (cs = lookupExtendedCharset(charsetName))           != null ||
+	    (cs = lookupViaProviders(charsetName))              != null) {
+	    cache(charsetName, cs);
+	    return cs;
+	}
+
 	/* Only need to check the name if we didn't find a charset for it */
 	checkName(charsetName);
 	return null;
     }
 
     /**
      * Tells whether the named charset is supported. </p>
----------------------------------------------------------------------

Here is a benchmark program to test the above fix:
----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;

class CharsetBench {
    static final List times = new ArrayList();
    
    static void time(Runnable job) {
	long t1 = System.currentTimeMillis();
	while (System.currentTimeMillis() - t1 < 10*1000) // warm up
	    job.run();
	System.gc();
	System.gc();
	try { Thread.sleep(100); } catch (Exception e) {}
	long t2 = System.currentTimeMillis();
	job.run();
	times.add(new Long(System.currentTimeMillis() - t2));
    }
    
    public static void main(String[] args) {
	final int iterations = 10000000;

	for (int j = 0; j < 2; j++) {
	    time(new Runnable() { public void run() {
		for(int i=0; i<iterations; i++)
		    Charset.isSupported("ISO-2022-JP");}});

	    time(new Runnable() { public void run() {
		for(int i=0; i<iterations; i++)
		    Charset.isSupported("UTF-8");}});

	    time(new Runnable() { public void run() {
		for(int i=0; i<iterations; i++)
		    Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" :
					"UTF-8");}});
	}

	for (Iterator it = times.iterator(); it.hasNext(); )
	    System.out.print(it.next() + " ");
	System.out.println();
    }
}
----------------------------------------------------------------------

With this program, I get the following results:
----------------------------------------------------------------------
javac CharsetBench.java && for w in cc1; do for f in -server -client; do echo $w $f `jws $w java $f CharsetBench`; done; done; for v in 1.5 1.4.2_06 1.4.2; do for f in -server -client; do echo $v $f `jver $v java $f CharsetBench`; done; done

cc1 -server 74 92 413 92 92 409
cc1 -client 355 356 700 356 356 711
1.5 -server 98 116 12807 114 118 12623
1.5 -client 349 341 22302 349 349 22331
1.4.2_06 -server 98 116 17438 102 117 17617
1.4.2_06 -client 371 371 37553 359 371 37618
----------------------------------------------------------------------

(where cc1 contains the results with the proposed fix applied to 1.5).

What these results show is that the proposed fix makes both
the repeated-single-Charset and the alternating-Charset access
pattern faster, the second by more than an order of magnitude.
Looks good enough to me.  Time to stop tweaking.

###@###.### 2004-08-02
                                     
2004-08-02
EVALUATION

This problem could easily be addressed by caching the last few looked-up
charsets.

-- ###@###.### 2004/4/25

A very interesting story wrt. performance.

To my great surprise, I discovered that the penalty for a cache miss,
i.e. creating a new Charset object, is *much* lower starting with 1.4.2_05.
Consider this program:

----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;

class t1 {
    static final List times = new ArrayList();
    
    static void time(Runnable job) {
	long t1 = System.currentTimeMillis();
	while (System.currentTimeMillis() - t1 < 2*1000) // warm up
	    job.run();
	System.gc();
	System.gc();
	try { Thread.sleep(100); } catch (Exception e) {}
	long t2 = System.currentTimeMillis();
	job.run();
	times.add(new Long(System.currentTimeMillis() - t2));
    }
    
    public static void main(String[] args) {
	final int iterations = 10000;

	time(new Runnable() { public void run() {
	    for(int i=0; i<iterations; i++)
		Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" :
				    "UTF-8");}});

	for (Iterator it = times.iterator(); it.hasNext(); )
	    System.out.print(it.next() + " ");
	System.out.println();
    }
}
----------------------------------------------------------------------

When run against all the 1.4.2 update releases and 1.5.0, I get:

1.4.2_01 -server 4590
1.4.2_01 -client 5868
1.4.2_02 -server 4372
1.4.2_02 -client 6244
1.4.2_03 -server 4641
1.4.2_03 -client 5738
1.4.2_04 -server 4668
1.4.2_04 -client 6117
1.4.2_05 -server 30
1.4.2_05 -client 75
1.4.2_06 -server 30
1.4.2_06 -client 76
1.5.0 -server 23
1.5.0 -client 50

So the news is very very good.  Charset creation is dramatically cheaper
as of 1.4.2_05, and it's even better in 1.5.0.  With this, there is much
less need to cache Charsets, especially since the caching involves
overhead of its own.   Nevertheless, a usage pattern involving two
Charsets is sufficiently common that caching is a good idea.

Measurements indicate that the simplest possible
increase in caching, i.e. adding another simple 1-element cache, is
the best engineering decision.

###@###.### 2004-08-02
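
The idea of "adding another simple 1-element cache" can be sketched in isolation as a generic two-slot, most-recently-used cache. This is an illustration only, not the code that went into Charset.java; the TwoSlotCache class, its Entry type, and the loader function are hypothetical names, and the sketch assumes Java 8 or later.

```java
import java.util.function.Function;

// Hypothetical two-slot cache: slot 1 holds the most recently used
// entry, slot 2 the one used before it.
final class TwoSlotCache<K, V> {
    // Immutable key/value pair, so a reader never sees a key paired
    // with another key's value.
    private static final class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private volatile Entry<K, V> first;   // "level 1" slot
    private volatile Entry<K, V> second;  // "level 2" slot
    private final Function<K, V> loader;  // the expensive lookup

    TwoSlotCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        Entry<K, V> e = first;
        if (e != null && key.equals(e.key))
            return e.value;               // hit in slot 1
        e = second;
        if (e != null && key.equals(e.key)) {
            second = first;               // hit in slot 2: promote it
            first = e;
            return e.value;
        }
        V value = loader.apply(key);      // miss: do the real lookup
        if (value != null) {
            second = first;               // demote slot 1, evict old slot 2
            first = new Entry<K, V>(key, value);
        }
        return value;
    }
}
```

With two keys used alternately, every call after the first two is a slot-2 hit followed by a swap, so the loader runs only twice. With three or more keys in rotation the cache still misses on every call, which is the trade-off accepted by keeping the cache this small. The slot updates are unsynchronized, so concurrent callers may occasionally drop or duplicate an entry, but each lookup reads a single immutable entry and so returns the value loaded for its own key.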
---------------------------------------------------------------

With the new suggested fix, the results of running the tests on a win2k machine using the default client jvm are:

CharsetBench test case (times in ms)
1.4.2_06              687   687   67656   703   688   67672
1.4.2_06 (with fix)   672   672    1406   671   671    1390

isSupportedTest test case (times in ms)
1.4.2_06               78    64    8046
1.4.2_06 (with fix)    63    32     282

###@###.### 2004-08-04
                                     
2004-08-04
PUBLIC COMMENTS

-
                                     
2004-10-02
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_07
1.5.0_01
mustang

FIXED IN:
1.4.2_07
1.5.0_01
mustang

INTEGRATED IN:
1.4.2_07
1.5.0_01
mustang


                                     
2004-10-02


