JDK-6219959 : StringBuffer.toString() innefficient in 1.5.0
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 9.0pe,5.0
  • Priority: P2
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic,windows_2000,windows_xp
  • CPU: generic,x86
  • Submitted: 2005-01-21
  • Updated: 2013-05-10
  • Resolved: 2006-06-15
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
java version "1.4.2_06"

ADDITIONAL OS VERSION INFORMATION :
Windows 2000 SP4

EXTRA RELEVANT SYSTEM CONFIGURATION :
java -version


A DESCRIPTION OF THE PROBLEM :
StringBuffer.toString() optimizations have been removed in 1.5.  This makes applications that repeadedly call sb.toString() on an large unchanged StringBuffer extreemly slow.

An example, we used to be able to parse and init a program in 600ms in 1.4.2 in 1.5 it takes over 4 minutes.

This is beacuse the char[] sharing in StringBuffer and String has been removed in  1.5 so the char[] is always copied, and not copied when modified, so for large strings (1 MB) with 10 toString()'s between modification of the buffer you have to copy 10 times the amount of data around!  It would also appear to be a waste of memory as well as CPU.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the program below with both 1.4.2 and 1.5

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
the performance is on a par if not better than 1.4


ACTUAL -
the performance is *exceptionally* slower.

on a 2.4GHz P4 single CPU box 1.4 can call toString 10000000 times if the buffer has not been changed in the time that 1.5 can make less than 100 calls.

C:\sun>c:\java\jdk1.5.0_01\bin\java StringBufferTest 10
Starting test
stoped test
Duration of toString loop: 328 milliseconds

C:\sun>c:\java\jdk1.5.0_01\bin\java StringBufferTest 100
Starting test
stoped test
Duration of toString loop: 2969 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 100
Starting test
stoped test
Duration of toString loop: 0 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 10000
Starting test
stoped test
Duration of toString loop: 15 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 100000
Starting test
stoped test
Duration of toString loop: 16 milliseconds

C:\sun>c:\java\j2sdk1.4.2_06\bin\java.exe StringBufferTest 10000000
Starting test
stoped test
Duration of toString loop: 1297 milliseconds



REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
/**
 * Class to show dumb behaviour of 1.5 compared to 1.4.
 * usage java StringBufferTest <number of StringBuffer.toString() loops>
 */
public final class StringBufferTest {
	
	
	
	public static void main(String[] args) {
		int loops = Integer.parseInt(args[0]);
		char[] eightChars = "01234567".toCharArray();
		StringBuffer strB = new StringBuffer();
		// simulate a large StringBuffer (eg an 0.5MB XML document) (UTF-8);
		for (int i=0; i < (1024*1024/8); i++) {
			strB.append(eightChars);
		}
		
			
		long start = 0;
		long stop = 0;
		String str = null;
		System.out.println("Starting test");
		start = System.currentTimeMillis();
		for (int i=0; i < loops; i++) {
			str = strB.toString();
		}
		stop = System.currentTimeMillis();
		System.out.println("stoped test");
		System.out.println("Duration of toString loop: " + (stop - start) + " milliseconds");
	}
}

---------- END SOURCE ----------
###@###.### 2005-1-21 06:22:32 GMT

Comments
EVALUATION A fix for the stated problem has been prototyped and tested. A webrev of the associated changes is attached. Unfortunately simply reintroducing the sharing between String and the StringBuffer classes which was present in the 1.4.x release train is not so simple. The following need to be taken into consideration: 1. Between 1.4.x and 1.5.0 the unsynchronized StringBuilder class was introduced and javac changed to default to using StringBuilder rather than StringBuffer for the language's "+" operator. The old sharing code relied on the synchronization in the StringBuffer class for correctness. 2. There were problems in earlier JDK releases with Strings created from StringBuffers with large backing stores causing large amounts of inaccessible memory to be consumed. This was one of the original motivations for completely removing the String/StringBuffer sharing, done by ensuring that the String's backing storage was sized precisely correctly. The prototype was specifically designed to address issue (2) above. Construction from StringBuffers and StringBuilders, as well as "trimming" of Strings using the copy constructor, all take into consideration the length of the resulting String as well as the size of the underlying char[] being used to construct the String. If the underlying char[] is too big, a copy is made. In general, in the presence of an unsynchronized StringBuilder and the String/StringBuilder sharing, it is not possible to guarantee 100% adherence to the specification of the String class. The basic problem occurs when one thread calls toString() on the StringBuilder at the same time another thread calls a method which mutates the StringBuilders' contents. In the absence of synchronization it is theoretically possible that the String result of the toString() method may become visible to the first thread before the second thread notices that the backing char[] of the StringBuilder is now shared with another thread, so for a brief period of time the first thread may see the contents of the String mutate, which is explicitly disallowed by the specification of that class. In practice, it is very unlikely that this mutation could really be seen by the first thread, because it is guaranteed to occur only for a brief and bounded period of time, and basically impossible that it could be controlled in any repeatable manner, but the possibility exists nonetheless. In the prototype, the boolean flag indicating whether the contents of the StringBuilder / StringBuffer are shared with a String was made volatile to attempt to provide some timeliness guarantees about when the presence of sharing would become visible to other threads. One might argue that the results of String construction using StringBuilder are undefined anyway in the face of multithreaded access because StringBuilder is unsynchronized, but that is not really the point; Strings must be immutable after construction, and if a risk of any kind is going to be taken in this area, clear and across-the-board performance improvements need to be shown. The prototype did address the pathological regression demonstrated by the microbenchmark in the evaluation, and also solved an OutOfMemoryError raised by a customer in an escalation who had transitioned from JDK 1.4.2 to 1.5. We attempted to find other benchmarks to use to validate the expected performance improvement. Unfortunately, we were not able to find any. Sun's standard suite of Java benchmarks used for performance measurement did not yield any speedup, and in fact showed a regression on the key SPECjbb2005 benchmark as is discussed below. The Sun-internal XMLTest suite used to validate XML parsing and handling performance was run, but no improvement was seen. This contradicts the claims by the submitter that XML-related codes are adversely impacted by the lack of String/StringBuffer sharing in 1.5. It was discovered that the reintroduction of the sharing code caused a reproducible regression on the order of 4% in SPECjbb2005 scores. Extensive testing did not yield a root cause of the slowdown. It was hypothesized that perhaps additional storage in Strings in the heap was slowing down the garbage collector. Measurements were taken, but it was not possible that additional time spent in the garbage collector could have resulted in this magnitude of drop in the benchmark score; in fact, GC times decreased somewhat due to the allocation of less storage during the run. It was hypothesized that the introduction of the volatile boolean test in several places may have impacted performance, but this was removed to no effect. It was hypothesized that some refactoring that came along with the sharing code may have perturbed the inlining performed by the Java Hotspot compilers. This refactoring was left in place but the sharing removed; the scores returned to their original values. An experiment was done whereby the multiplicative factor indicating how large the backing store of the String could grow was set to zero; the default value had been 4, indicating that a String could keep at most four times its length worth of backing store alive. Despite the presence of all of the other code changes and refactorings, the scores returned to their original values with the zero multiplicative factor. This indicates that somehow the performance is impacted simply by the presence of extra unused data, even though GC times didn't increase significantly. Examination of profiler output did not point to an obvious reason for the slowdown. The improvement of behavior of the customer application mentioned above was analyzed. Due to the characteristics of the reintroduction of the sharing code, there was only one possibility for why it helped significantly, which was that the application was doing redundant work, either substring or toString() operations. Sun suggested to the customer to instead maintain their own String intern table at the application level, thereby avoiding holding onto multiple copies of the same String data. This both solved the customer's OutOfMemoryError problems as well as increased the performance of the application. One might consider reintroducing sharing only between String and StringBuffer and not between String and StringBuilder to avoid the correctness problems described above, but because StringBuffer and StringBuilder share most of their implementations, this would result in a large amount of code duplication and make maintenance much more difficult. Additionally, given that javac no longer emits StringBuffer code for the "+" operator, use of StringBuffer is expected to decrease over time so this would provide a diminishing improvement anyway. Given the correctness risks of fixing this bug, the fact that so far only microbenchmarks and pathological application codes seem to show an improvement with the fix, and that there is a hard-to-track-down performance regression on a key benchmark with the fix, this bug is being closed as "will not fix". The following further points should be made: 1. There have been very few reports of this problem since the time 5.0 was released. Clearly it has not impacted most customers' applications. 2. Internal attempts to characterize StringBuffer performance have resulted in the conclusion that 5.0 is actually faster than 1.4.2 at String manipulation in almost all cases. 3. Given the current structure of the core libraries' code, the correct place for this optimization to take place, if it does at all, is probably in the JVM using escape analysis. Please reopen this bug if a reasonable, real-world application can be produced which is significantly and adverseley impacted by the lack of String / StringBuilder / StringBuffer data sharing.
15-06-2006

EVALUATION This was caused by the re-organization of the StringBuffer code during introduction of StringBuilder. ###@###.### 2005-1-21 19:00:18 GMT
21-01-2005