JDK-4806753 : Using alternating charsets with String(byte[]) and String.getBytes is very slow
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.1
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2003-01-23
  • Updated: 2012-10-09
  • Resolved: 2004-09-12
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other: 5.0u1 (build 01), Fixed
JDK 6: 6, Fixed
Related Reports
Relates :  
Description

Name: nt126004			Date: 01/22/2003


FULL PRODUCT VERSION :
New slow version:
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

Old fast version:
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

FULL OPERATING SYSTEM VERSION :
Microsoft Windows XP [Version 5.1.2600]

EXTRA RELEVANT SYSTEM CONFIGURATION :
Japanese System

A DESCRIPTION OF THE PROBLEM :
We just upgraded our JVM from version 1.4.0-b92 to
1.4.1_01-b01 and encountered a severe performance degradation in our JSP
application.
Database queries that had been taking around 200ms now take
in the range of 20-40 seconds, making this JVM unusable for us.

After doing quite a bit of debugging and digging into the
source code, I believe the problem is being caused by the way
java.lang.StringCoding is caching StringDecoder instances.

We are using the mysql-connector-j-2.0.14-bin.jar driver to
connect to a MySQL database.  The records that are suffering the severe
performance degradation contain both boolean and float values.

Digging into the code, I would say that the problem is caused by
the alternating calls to StringCoding.decode, where the charsetName
alternates between the value set on our DB connection (Shift_JIS) and
the system default (MS932). The way that class is written, it does a
very expensive lookup and then caches the StringDecoder for each thread
until the required decoder changes. When the charset alternates, this
results in an expensive lookup for every single call.
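The thrashing described above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the JDK's StringCoding source: `SingleSlotCache`, `countLookups` and the `lookups` counter are illustrative names, and the "very expensive lookup" is modeled by a factory function whose calls are counted. It assumes Java 8 or later for the lambda.

```java
import java.util.function.Function;

// Hypothetical model of a one-entry, per-caller cache: only the most recently
// used coder is remembered, as the report describes for 1.4.1 StringCoding.
public class SingleSlotCacheDemo {

    static final class SingleSlotCache<V> {
        private final Function<String, V> factory; // stands in for the expensive lookup
        private String cachedName;
        private V cachedValue;
        int lookups; // number of times the expensive factory ran

        SingleSlotCache(Function<String, V> factory) {
            this.factory = factory;
        }

        V get(String name) {
            if (cachedValue == null || !name.equals(cachedName)) {
                lookups++;                         // miss: pay for the lookup
                cachedValue = factory.apply(name);
                cachedName = name;                 // evicts the previous coder
            }
            return cachedValue;
        }
    }

    // Calls get() for every name in the pattern, `rounds` times over.
    static int countLookups(String[] pattern, int rounds) {
        SingleSlotCache<String> cache = new SingleSlotCache<>(n -> "coder for " + n);
        for (int i = 0; i < rounds; i++) {
            for (String name : pattern) {
                cache.get(name);
            }
        }
        return cache.lookups;
    }

    public static void main(String[] args) {
        // Same charset on every call: a single lookup in total.
        System.out.println(countLookups(new String[] {"MS932"}, 2000));
        // Alternating charsets, as in SlowTest: every call misses.
        System.out.println(countLookups(new String[] {"MS932", "Shift_JIS"}, 2000));
    }
}
```

With one slot, any access pattern that alternates between two names turns every call into a miss, which matches the per-call cost the report observes.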

I have not looked into how this was done in 1.4.0.
Looking around, I think this might be related to bug #4691554.

We are able to work around this on Japanese machines by
setting the DB connection charset to MS932 so that it matches the system
default. But this will not work on systems where the default is different.

REGRESSION.  Last worked in version 1.4

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
To run the full example, which connects to the database:
1. java -classpath build/classes;lib/mysql-connector-j-2.0.14-bin.jar SlowMySQLTest
2. On 1.4.0, you will see the following:
---
Creating test table
Creating test data...
Done
Test encoding: Shift_JIS
Time = 611ms
Test encoding: MS932
Time = 170ms
Test encoding: Shift_JIS
Time = 100ms
Test encoding: MS932
Time = 131ms
---
On 1.4.1 you will see the following:
---
Creating test table
Creating test data...
Done
Test encoding: Shift_JIS
Time = 29001ms
Test encoding: MS932
Time = 250ms
Test encoding: Shift_JIS
Time = 23033ms
Test encoding: MS932
Time = 180ms
---

You may also want to skip straight to the simplified example:
1. java -classpath build/classes SlowTest
2. On 1.4.0, you will see the following:
---
Test encoding: Shift_JIS
Time = 550ms
Test encoding: MS932
Time = 81ms
Test encoding: Shift_JIS
Time = 120ms
Test encoding: MS932
Time = 90ms
---
On 1.4.1 you will see the following:
---
Test encoding: Shift_JIS
Time = 28781ms
Test encoding: MS932
Time = 110ms
Test encoding: Shift_JIS
Time = 23374ms
Test encoding: MS932
Time = 90ms
---


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
SlowMySQLTest.java
-------------------------------------------------
import java.sql.Connection;
import java.sql.Driver;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class SlowMySQLTest {
	public double getDouble( String str )
	{
		byte[] buf = str.getBytes();
		
		return Double.parseDouble( new String( buf ) );
	}

	/**
	 * Create the test table and data.
	 */
	public void setup()
	{
		try
		{
			System.out.println( "Creating test table" );
			Properties props = new Properties();
			Driver d = new com.mysql.jdbc.Driver();
			Connection c = d.connect(
				"jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=Shift_JIS",
				props );
			Statement stmt = c.createStatement();
			// IF EXISTS: a plain DROP TABLE throws on the first run and aborts setup()
			stmt.executeUpdate( "DROP TABLE IF EXISTS slow_test" );
			
			stmt.executeUpdate(
				"CREATE TABLE slow_test ( "
				+ "id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, "
				+ "flt FLOAT NOT NULL, flag CHAR(1) NOT NULL )" );
			
			System.out.println( "Creating test data..." );
			for ( int i = 0; i < 2000; i++ )
			{
				stmt.executeUpdate( "INSERT INTO slow_test VALUES ( NULL, " + i + ", 'y')" );
			}
			System.out.println( "Done" );
			
			stmt.close();
			c.close();
		}
		catch ( SQLException e )
		{
			e.printStackTrace();
		}
	}
	
	public void test( String encoding )
	{
		System.out.println( "Test encoding: " + encoding );
		long start = System.currentTimeMillis();
		
		try
		{
			Properties props = new Properties();
			Driver d = new com.mysql.jdbc.Driver();
			Connection c = d.connect(
				"jdbc:mysql://localhost/test?useUnicode=true&characterEncoding="
				+ encoding, props );
			Statement stmt = c.createStatement();
			ResultSet rs = stmt.executeQuery( "SELECT id, flt, flag FROM slow_test" );
			while ( rs.next() )
			{
				int id = rs.getInt( 1 );
				float flt = rs.getFloat( 2 );
				boolean flag = rs.getBoolean( 3 );
			}
			
			stmt.close();
			c.close();
		}
		catch ( SQLException e )
		{
			e.printStackTrace();
		}
		
		System.out.println( "Time = " + ( System.currentTimeMillis() - start ) + "ms" );
	}
	
	public static void main( String[] args ) {
		SlowMySQLTest slowTest = new SlowMySQLTest();
		slowTest.setup();
		slowTest.test( "Shift_JIS" );
		slowTest.test( "MS932" );
		slowTest.test( "Shift_JIS" );
		slowTest.test( "MS932" );
	}
}
-------------------------------------------------

SlowTest.java
-------------------------------------------------
public class SlowTest {
	public double getDouble( String str )
	{
		byte[] buf = str.getBytes();
		
		return Double.parseDouble( new String( buf ) );
	}

	
	/*
	 * The contents of this method are the result of narrowing down
	 * the root cause of the slowdown in SlowMySQLTest.test().
	 */
	public void test( String encoding )
	{
		System.out.println( "Test encoding: " + encoding );
		long start = System.currentTimeMillis();
		
		for ( int i = 0; i < 2000; i++ ) {
			// rs.getFloat( 2 );
			double d = Double.parseDouble( new String( "1".getBytes() ) );
			
			// rs.getBoolean( 3 );
			try
			{
				new String( "true".getBytes(), encoding );
			}
			catch (java.io.UnsupportedEncodingException e)
			{
				e.printStackTrace();
			}
		}
		
		System.out.println( "Time = " + ( System.currentTimeMillis() - start ) + "ms" );
	}
	
	public static void main( String[] args ) {
		SlowTest slowTest = new SlowTest();
		slowTest.test( "Shift_JIS" );
		slowTest.test( "MS932" );
		slowTest.test( "Shift_JIS" );
		slowTest.test( "MS932" );
	}
}
-------------------------------------------------

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
If we call sun.io.Converters.getDefaultEncodingName() on our
machine, the current encoding is returned as "MS932". The
configuration for the database connection normally uses
"Shift_JIS". By replacing this value so that it matches the
default, we are able to work around the problem.

The problem is that this ONLY works on Japanese machines.
If the default encoding returned is not MS932, this
workaround no longer works.

Unfortunately, we will have to stay with the older version
of the JVM for our customers until this has been resolved.
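sun.io.Converters is an internal class that later JDKs no longer ship. A minimal sketch of getting the same kind of historical encoding name through public API: `OutputStreamWriter.getEncoding()` on a writer created without an explicit charset reports the historical name of the platform default (for example "MS932" on a Japanese Windows system). The helper names `defaultEncodingName` and `buildUrl` are hypothetical, and the URL template is copied from SlowMySQLTest. Whether the driver accepts every historical name, and whether the database contents are really compatible with the default encoding, are assumptions to verify; this only keeps the two charsets from alternating.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class DefaultEncodingName {

    // Historical name of the platform default encoding, e.g. "MS932",
    // "Cp1252" or "UTF8" depending on the system. getEncoding() returns
    // null once a writer is closed, so the name is read while it is open;
    // nothing is ever written, so there is nothing to flush.
    static String defaultEncodingName() {
        OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream());
        return w.getEncoding();
    }

    // Hypothetical helper: same URL shape as SlowMySQLTest, with the
    // connection charset chosen to match the platform default.
    static String buildUrl(String encoding) {
        return "jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=" + encoding;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(defaultEncodingName()));
    }
}
```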

Release Regression From : 1.4
The above release value was the last known release where this 
bug was known to work. Since then there has been a regression.

(Review ID: 180029) 
======================================================================

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.5.0_01 mustang
FIXED IN: 1.5.0_01 mustang
INTEGRATED IN: 1.5.0_01 mustang
26-09-2004

PUBLIC COMMENTS -
26-09-2004

EVALUATION
The effects of this bug are ameliorated somewhat by the fix for 4752992, which is present in 1.4.0_04, 1.4.1_03, and 1.4.2. We are still far from the performance of 1.4.0, however, and fixing this requires the introduction of thread-local caches in the internal java.lang.StringCoding class. With such caches, these operations become the fastest they have ever been in any release.
-- ###@###.### 2003/1/25

I believe this will be resolved by the upcoming fix to 5002890: (cs) Charset.isSupported is slow when invoked for different charsets.
###@###.### 2004-09-01
01-11-0191

SUGGESTED FIX
*** /tmp/geta19661	Sat Jan 25 12:21:29 2003
--- StringCoding.java	Sat Jan 25 12:10:29 2003
***************
*** 1,5 ****
  /*
!  * @(#)StringCoding.java	1.9 02/04/09
   *
   * Copyright 2002 Sun Microsystems, Inc. All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
--- 1,5 ----
  /*
!  * %W% %E%
   *
   * Copyright 2002 Sun Microsystems, Inc. All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
***************
*** 38,59 ****
      private StringCoding() { }

-     /* The cached coders for each thread
-      */
-     private static ThreadLocal decoder = new ThreadLocal();
-     private static ThreadLocal encoder = new ThreadLocal();
- 
      private static boolean warnUnsupportedCharset = true;

!     private static Object deref(ThreadLocal tl) {
!         SoftReference sr = (SoftReference)tl.get();
!         if (sr == null) return null;
!         return sr.get();
      }

!     private static void set(ThreadLocal tl, Object ob) {
!         tl.set(new SoftReference(ob));
      }

      // Trim the given byte array to the given length
--- 38,98 ----
      private StringCoding() { }

      private static boolean warnUnsupportedCharset = true;

!     private static abstract class StringCoder {
!         private final String requestedCharsetName;
!         protected StringCoder(String requestedCharsetName) {
!             this.requestedCharsetName = requestedCharsetName;
!         }
!         final String requestedCharsetName() {
!             return requestedCharsetName;
!         }
!         abstract String charsetName();
!     }
! 
!     // Cached coders for each thread
!     //
!     private static final int CACHE_SIZE = 3;
!     private static ThreadLocal decoders = new ThreadLocal();
!     private static ThreadLocal encoders = new ThreadLocal();
! 
!     private static void moveToFront(Object[] oa, int i) {
!         Object ob = oa[i];
!         for (int j = i; j > 0; j--)
!             oa[j] = oa[j - 1];
!         oa[0] = ob;
!     }
! 
!     private static StringCoder cacheGet(ThreadLocal coders, String csn) {
!         SoftReference[] srs = (SoftReference[])coders.get();
!         if (srs == null) return null;
!         for (int i = 0; i < srs.length; i++) {
!             SoftReference sr = srs[i];
!             if (sr == null)
!                 continue;
!             StringCoder sc = (StringCoder)sr.get();
!             if (sc == null)
!                 continue;
!             if (   sc.requestedCharsetName().equals(csn)
!                 || sc.charsetName().equals(csn)) {
!                 if (i > 0)
!                     moveToFront(srs, i);
!                 return sc;
!             }
!         }
!         return null;
      }

!     private static void cachePut(ThreadLocal coders, StringCoder sc) {
!         SoftReference[] srs = (SoftReference[])coders.get();
!         if (srs == null) {
!             srs = new SoftReference[3];
!             coders.set(srs);
!         }
!         srs[srs.length - 1] = new SoftReference(sc);
!         moveToFront(srs, srs.length - 1);
      }

      // Trim the given byte array to the given length
***************
*** 105,119 ****

      // Encapsulates either a ByteToCharConverter or a CharsetDecoder
      //
!     private static abstract class StringDecoder {
!         private final String requestedCharsetName;
          protected StringDecoder(String requestedCharsetName) {
!             this.requestedCharsetName = requestedCharsetName;
          }
-         final String requestedCharsetName() {
-             return requestedCharsetName;
-         }
-         abstract String charsetName();
          abstract char[] decode(byte[] ba, int off, int len);
      }

--- 144,155 ----

      // Encapsulates either a ByteToCharConverter or a CharsetDecoder
      //
!     private static abstract class StringDecoder
!         extends StringCoder
!     {
          protected StringDecoder(String requestedCharsetName) {
!             super(requestedCharsetName);
          }
          abstract char[] decode(byte[] ba, int off, int len);
      }

***************
*** 202,212 ****
      static char[] decode(String charsetName, byte[] ba, int off, int len)
          throws UnsupportedEncodingException
      {
-         StringDecoder sd = (StringDecoder)deref(decoder);
          String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
!         if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
!                               || csn.equals(sd.charsetName()))) {
!             sd = null;
              try {
                  Charset cs = lookupCharset(csn);
                  if (cs != null)
--- 238,246 ----
      static char[] decode(String charsetName, byte[] ba, int off, int len)
          throws UnsupportedEncodingException
      {
          String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
!         StringDecoder sd = (StringDecoder)cacheGet(decoders, csn);
!         if (sd == null) {
              try {
                  Charset cs = lookupCharset(csn);
                  if (cs != null)
***************
*** 219,225 ****
              if (sd == null)
                  sd = new ConverterSD(ByteToCharConverter.getConverter(csn),
                                       csn);
!             set(decoder, sd);
          }
          return sd.decode(ba, off, len);
      }
--- 253,259 ----
              if (sd == null)
                  sd = new ConverterSD(ByteToCharConverter.getConverter(csn),
                                       csn);
!             cachePut(decoders, sd);
          }
          return sd.decode(ba, off, len);
      }
***************
*** 253,267 ****

      // Encapsulates either a CharToByteConverter or a CharsetEncoder
      //
!     private static abstract class StringEncoder {
!         private final String requestedCharsetName;
          protected StringEncoder(String requestedCharsetName) {
!             this.requestedCharsetName = requestedCharsetName;
          }
-         final String requestedCharsetName() {
-             return requestedCharsetName;
-         }
-         abstract String charsetName();
          abstract byte[] encode(char[] cs, int off, int len);
      }

--- 287,298 ----

      // Encapsulates either a CharToByteConverter or a CharsetEncoder
      //
!     private static abstract class StringEncoder
!         extends StringCoder
!     {
          protected StringEncoder(String requestedCharsetName) {
!             super(requestedCharsetName);
          }
          abstract byte[] encode(char[] cs, int off, int len);
      }

***************
*** 354,364 ****
      static byte[] encode(String charsetName, char[] ca, int off, int len)
          throws UnsupportedEncodingException
      {
-         StringEncoder se = (StringEncoder)deref(encoder);
          String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
!         if ((se == null) || !(csn.equals(se.requestedCharsetName())
!                               || csn.equals(se.charsetName()))) {
!             se = null;
              try {
                  Charset cs = lookupCharset(csn);
                  if (cs != null)
--- 385,393 ----
      static byte[] encode(String charsetName, char[] ca, int off, int len)
          throws UnsupportedEncodingException
      {
          String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
!         StringEncoder se = (StringEncoder)cacheGet(encoders, csn);
!         if (se == null) {
              try {
                  Charset cs = lookupCharset(csn);
                  if (cs != null)
***************
*** 369,375 ****
              if (se == null)
                  se = new ConverterSE(CharToByteConverter.getConverter(csn),
                                       csn);
!             set(encoder, se);
          }
          return se.encode(ca, off, len);
      }
--- 398,404 ----
              if (se == null)
                  se = new ConverterSE(CharToByteConverter.getConverter(csn),
                                       csn);
!             cachePut(encoders, se);
          }
          return se.encode(ca, off, len);
      }
-- ###@###.### 2003/1/25
01-11-0191
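The suggested fix replaces the single cached coder per thread with a small most-recently-used array per thread: cacheGet scans the entries and moves a hit to the front, and cachePut overwrites the last (least recently used) slot and moves it to the front. A minimal modern restatement of that scheme, assuming Java 8 or later; `MruCoderCache`, `Entry`, `get` and the `lookups` counter are hypothetical names, the coder is modeled by a factory function rather than a real charset lookup, and the SoftReference wrapping from the patch is kept so cached coders can still be reclaimed under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.function.Function;

// Hypothetical restatement of the per-thread MRU scheme in the suggested
// fix; not the JDK's actual StringCoding code.
public class MruCoderCache<V> {
    private static final int CACHE_SIZE = 3;

    // One cached coder, remembered together with the name it was requested by.
    private static final class Entry<T> {
        final String name;
        final T value;

        Entry(String name, T value) {
            this.name = name;
            this.value = value;
        }
    }

    private final Function<String, V> factory; // stands in for the expensive lookup
    private final ThreadLocal<SoftReference<?>[]> slots =
            ThreadLocal.withInitial(() -> new SoftReference<?>[CACHE_SIZE]);
    int lookups; // factory calls; not thread-safe, demo only

    public MruCoderCache(Function<String, V> factory) {
        this.factory = factory;
    }

    // Shift oa[0..i-1] right by one and put oa[i] at the front.
    private static void moveToFront(Object[] oa, int i) {
        Object ob = oa[i];
        for (int j = i; j > 0; j--) {
            oa[j] = oa[j - 1];
        }
        oa[0] = ob;
    }

    @SuppressWarnings("unchecked")
    public V get(String name) {
        SoftReference<?>[] srs = slots.get(); // this thread's private array
        for (int i = 0; i < srs.length; i++) {
            if (srs[i] == null) continue;              // slot never filled
            Entry<V> e = (Entry<V>) srs[i].get();
            if (e == null) continue;                   // cleared by the GC
            if (e.name.equals(name)) {
                if (i > 0) moveToFront(srs, i);        // keep MRU order
                return e.value;
            }
        }
        // Miss: pay for the lookup, overwrite the least recently used slot.
        lookups++;
        V value = factory.apply(name);
        srs[srs.length - 1] = new SoftReference<Entry<V>>(new Entry<V>(name, value));
        moveToFront(srs, srs.length - 1);
        return value;
    }

    public static void main(String[] args) {
        MruCoderCache<String> cache = new MruCoderCache<>(n -> "coder for " + n);
        for (int i = 0; i < 2000; i++) {
            cache.get("MS932");     // like new String(bytes) with the default charset
            cache.get("Shift_JIS"); // like new String(bytes, "Shift_JIS")
        }
        // Two names fit in three slots: only the first call for each misses.
        System.out.println("lookups = " + cache.lookups);
    }
}
```

Two design points follow from the scheme rather than from any measurement: a cyclic pattern over more than three charsets would still miss on every call, so the patch bets that real code alternates among only a few charsets; and the sketch sizes the array with CACHE_SIZE, whereas the patch as posted declares CACHE_SIZE but allocates with the literal 3.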