United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
JDK-6219959 : StringBuffer.toString() innefficient in 1.5.0

Details
Type:
Bug
Submit Date:
2005-01-21
Status:
Closed
Updated Date:
2013-05-10
Project Name:
JDK
Resolved Date:
2006-06-15
Component:
core-libs
OS:
generic,windows_xp,windows_2000
Sub-Component:
java.lang
CPU:
x86,generic
Priority:
P2
Resolution:
Won't Fix
Affected Versions:
9.0pe,5.0
Fixed Versions:

Related Reports
Duplicate:
Duplicate:
Relates:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
java version "1.4.2_06"

ADDITIONAL OS VERSION INFORMATION :
Windows 2000 SP4

EXTRA RELEVANT SYSTEM CONFIGURATION :
java -version


A DESCRIPTION OF THE PROBLEM :
StringBuffer.toString() optimizations have been removed in 1.5.  This makes applications that repeadedly call sb.toString() on an large unchanged StringBuffer extreemly slow.

An example, we used to be able to parse and init a program in 600ms in 1.4.2 in 1.5 it takes over 4 minutes.

This is beacuse the char[] sharing in StringBuffer and String has been removed in  1.5 so the char[] is always copied, and not copied when modified, so for large strings (1 MB) with 10 toString()'s between modification of the buffer you have to copy 10 times the amount of data around!  It would also appear to be a waste of memory as well as CPU.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the program below with both 1.4.2 and 1.5

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
the performance is on a par if not better than 1.4


ACTUAL -
the performance is *exceptionally* slower.

on a 2.4GHz P4 single CPU box 1.4 can call toString 10000000 times if the buffer has not been changed in the time that 1.5 can make less than 100 calls.

C:\sun>c:\java\jdk1.5.0_01\bin\java StringBufferTest 10
Starting test
stoped test
Duration of toString loop: 328 milliseconds

C:\sun>c:\java\jdk1.5.0_01\bin\java StringBufferTest 100
Starting test
stoped test
Duration of toString loop: 2969 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 100
Starting test
stoped test
Duration of toString loop: 0 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 10000
Starting test
stoped test
Duration of toString loop: 15 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 100000
Starting test
stoped test
Duration of toString loop: 16 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 10000000
Starting test
stoped test
Duration of toString loop: 1297 milliseconds



REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
/**
 * Class to show dumb behaviour of 1.5 compared to 1.4.
 * usage java StringBufferTest <number of StringBuffer.toString() loops>
 */
public final class StringBufferTest {
	
	
	
	public static void main(String[] args) {
		int loops = Integer.parseInt(args[0]);
		char[] eightChars = "01234567".toCharArray();
		StringBuffer strB = new StringBuffer();
		// simulate a large StringBuffer (eg an 0.5MB XML document) (UTF-8);
		for (int i=0; i < (1024*1024/8); i++) {
			strB.append(eightChars);
		}
		
			
		long start = 0;
		long stop = 0;
		String str = null;
		System.out.println("Starting test");
		start = System.currentTimeMillis();
		for (int i=0; i < loops; i++) {
			str = strB.toString();
		}
		stop = System.currentTimeMillis();
		System.out.println("stoped test");
		System.out.println("Duration of toString loop: " + (stop - start) + " milliseconds");
	}
}

---------- END SOURCE ----------
###@###.### 2005-1-21 06:22:32 GMT

                                    

Comments
EVALUATION

This was caused by the re-organization of the StringBuffer code during introduction of StringBuilder.  

###@###.### 2005-1-21 19:00:18 GMT
                                     
2005-01-21
EVALUATION

A fix for the stated problem has been prototyped and tested. A webrev
of the associated changes is attached.

Unfortunately simply reintroducing the sharing between String and the
StringBuffer classes which was present in the 1.4.x release train is
not so simple. The following need to be taken into consideration:

  1. Between 1.4.x and 1.5.0 the unsynchronized StringBuilder class
     was introduced and javac changed to default to using
     StringBuilder rather than StringBuffer for the language's "+"
     operator. The old sharing code relied on the synchronization in
     the StringBuffer class for correctness.

  2. There were problems in earlier JDK releases with Strings created
     from StringBuffers with large backing stores causing large
     amounts of inaccessible memory to be consumed. This was one of
     the original motivations for completely removing the
     String/StringBuffer sharing, done by ensuring that the String's
     backing storage was sized precisely correctly.

The prototype was specifically designed to address issue (2) above.
Construction from StringBuffers and StringBuilders, as well as
"trimming" of Strings using the copy constructor, all take into
consideration the length of the resulting String as well as the size
of the underlying char[] being used to construct the String. If the
underlying char[] is too big, a copy is made.

In general, in the presence of an unsynchronized StringBuilder and the
String/StringBuilder sharing, it is not possible to guarantee 100%
adherence to the specification of the String class. The basic problem
occurs when one thread calls toString() on the StringBuilder at the
same time another thread calls a method which mutates the
StringBuilders' contents. In the absence of synchronization it is
theoretically possible that the String result of the toString() method
may become visible to the first thread before the second thread
notices that the backing char[] of the StringBuilder is now shared
with another thread, so for a brief period of time the first thread
may see the contents of the String mutate, which is explicitly
disallowed by the specification of that class. In practice, it is very
unlikely that this mutation could really be seen by the first thread,
because it is guaranteed to occur only for a brief and bounded period
of time, and basically impossible that it could be controlled in any
repeatable manner, but the possibility exists nonetheless. In the
prototype, the boolean flag indicating whether the contents of the
StringBuilder / StringBuffer are shared with a String was made
volatile to attempt to provide some timeliness guarantees about when
the presence of sharing would become visible to other threads. One
might argue that the results of String construction using
StringBuilder are undefined anyway in the face of multithreaded access
because StringBuilder is unsynchronized, but that is not really the
point; Strings must be immutable after construction, and if a risk of
any kind is going to be taken in this area, clear and across-the-board
performance improvements need to be shown.

The prototype did address the pathological regression demonstrated by
the microbenchmark in the evaluation, and also solved an
OutOfMemoryError raised by a customer in an escalation who had
transitioned from JDK 1.4.2 to 1.5.

We attempted to find other benchmarks to use to validate the expected
performance improvement. Unfortunately, we were not able to find any.
Sun's standard suite of Java benchmarks used for performance
measurement did not yield any speedup, and in fact showed a regression
on the key SPECjbb2005 benchmark as is discussed below. The
Sun-internal XMLTest suite used to validate XML parsing and handling
performance was run, but no improvement was seen. This contradicts the
claims by the submitter that XML-related codes are adversely impacted
by the lack of String/StringBuffer sharing in 1.5.

It was discovered that the reintroduction of the sharing code caused a
reproducible regression on the order of 4% in SPECjbb2005 scores.
Extensive testing did not yield a root cause of the slowdown. It was
hypothesized that perhaps additional storage in Strings in the heap
was slowing down the garbage collector. Measurements were taken, but
it was not possible that additional time spent in the garbage
collector could have resulted in this magnitude of drop in the
benchmark score; in fact, GC times decreased somewhat due to the
allocation of less storage during the run. It was hypothesized that
the introduction of the volatile boolean test in several places may
have impacted performance, but this was removed to no effect. It was
hypothesized that some refactoring that came along with the sharing
code may have perturbed the inlining performed by the Java Hotspot
compilers. This refactoring was left in place but the sharing removed;
the scores returned to their original values. An experiment was done
whereby the multiplicative factor indicating how large the backing
store of the String could grow was set to zero; the default value had
been 4, indicating that a String could keep at most four times its
length worth of backing store alive. Despite the presence of all of
the other code changes and refactorings, the scores returned to their
original values with the zero multiplicative factor. This indicates
that somehow the performance is impacted simply by the presence of
extra unused data, even though GC times didn't increase significantly.
Examination of profiler output did not point to an obvious reason for
the slowdown.

The improvement of behavior of the customer application mentioned
above was analyzed. Due to the characteristics of the reintroduction
of the sharing code, there was only one possibility for why it helped
significantly, which was that the application was doing redundant
work, either substring or toString() operations. Sun suggested to the
customer to instead maintain their own String intern table at the
application level, thereby avoiding holding onto multiple copies of
the same String data. This both solved the customer's OutOfMemoryError
problems as well as increased the performance of the application.

One might consider reintroducing sharing only between String and
StringBuffer and not between String and StringBuilder to avoid the
correctness problems described above, but because StringBuffer and
StringBuilder share most of their implementations, this would result
in a large amount of code duplication and make maintenance much more
difficult. Additionally, given that javac no longer emits StringBuffer
code for the "+" operator, use of StringBuffer is expected to decrease
over time so this would provide a diminishing improvement anyway.

Given the correctness risks of fixing this bug, the fact that so far
only microbenchmarks and pathological application codes seem to show
an improvement with the fix, and that there is a hard-to-track-down
performance regression on a key benchmark with the fix, this bug is
being closed as "will not fix". The following further points should be
made:

  1. There have been very few reports of this problem since the time
     5.0 was released. Clearly it has not impacted most customers'
     applications.

  2. Internal attempts to characterize StringBuffer performance have
     resulted in the conclusion that 5.0 is actually faster than 1.4.2
     at String manipulation in almost all cases.

  3. Given the current structure of the core libraries' code, the
     correct place for this optimization to take place, if it does at
     all, is probably in the JVM using escape analysis.

Please reopen this bug if a reasonable, real-world application can be
produced which is significantly and adverseley impacted by the lack of
String / StringBuilder / StringBuffer data sharing.
                                     
2006-06-15



Hardware and Software, Engineered to Work Together