JDK-8013395 : StringBuffer.toString performance regression impacting embedded benchmarks
  • Type: Bug
  • Component: embedded
  • Sub-Component: libraries
  • Affected Version: 8
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-04-26
  • Updated: 2013-05-24
  • Resolved: 2013-05-15
JDK 8
8 b91 : Fixed
Description
A change to the core libraries' StringBuffer implementation made after 1.4.2 caused a significant regression in the StringBuffer.toString method.  In 1.4.2 the underlying character array was shared between the String and the StringBuffer that produced it.  This implementation detail reduced the overhead of creating the String, since the data did not need to be copied.  A CR was filed in 2005 (JDK-6219959) to address this issue, but the developer working on the fix could not find any benchmark that was impacted by the regression.  The CR contains a suggested fix that was never implemented.

Embedded benchmarks used in evaluating CDC and SE Embedded are showing a significant performance difference due to this change in StringBuffer implementation.

Comments
Alan points out that the existing implementation already uses 2x memory at the time of toString and caching would be no different. (This is 2x compared to the original char[] sharing between String and StringBuffer.) The caching implementation could retain the extra memory longer in some cases, but the most common use cases for StringBuffers either drop the references to the SB and the toString result after use, or else continue to modify the SB before calling toString again, in which case the previously cached value can be GC'd if it is unused elsewhere. So leakage is not a concern. We've also established that the HotSpot OptimizeStringConcat optimizations in C2 should not have any effect on this caching implementation. That optimization can take a sequence of operations of the form new StringBuffer(...).append(...).append(...).toString() and simply create a String with the required contents; no StringBuffer gets created at all. The final form of the solution caches a char[] to be used to create a new String, rather than caching the actual String.
14-05-2013
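
The cache lifecycle described in the comment above can be sketched roughly as follows. This is a hypothetical illustration, not the JDK source: the class name CachingBufferSketch, the field toStringCache and the snapshotCopies counter are assumptions made for the example. User code can only build a String through the public String(char[]) constructor, which copies the array again, so the sketch shows when the cached snapshot is built and invalidated, not the copy that an implementation inside java.lang could avoid.

```java
import java.util.Arrays;

/** Hypothetical sketch of a buffer that caches a char[] snapshot for toString(). */
final class CachingBufferSketch {
    private char[] value = new char[16];
    private int count;

    // Snapshot taken by the last toString(); cleared on every modification.
    private char[] toStringCache;

    // Counts how often the snapshot had to be rebuilt (for demonstration only).
    private int snapshotCopies;

    synchronized CachingBufferSketch append(String s) {
        toStringCache = null; // any modification invalidates the cached snapshot
        int newCount = count + s.length();
        if (newCount > value.length) {
            value = Arrays.copyOf(value, Math.max(newCount, value.length * 2));
        }
        s.getChars(0, s.length(), value, count);
        count = newCount;
        return this;
    }

    @Override
    public synchronized String toString() {
        if (toStringCache == null) {
            // Only the first toString() after a modification copies the buffer.
            toStringCache = Arrays.copyOfRange(value, 0, count);
            snapshotCopies++;
        }
        // Assumption: a JDK-internal version could hand the cached array to the
        // new String without copying; the public constructor used here copies.
        return new String(toStringCache);
    }

    synchronized int snapshotCopies() {
        return snapshotCopies;
    }
}
```

Calling toString() twice without an intervening append builds the snapshot once and still returns two distinct String instances; an append drops the snapshot, so the old array becomes collectable once nothing else refers to it.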

Another issue with sharing is excessive memory use and garbage retention. I don't think garbage retention per se is an issue for the proposed caching. But we will be using twice as much memory due to the two copies of the char[].
12-05-2013

We can avoid the change to the spec by using the String(String s) copy constructor to create a new String that shares the char[] of the cached copy. This avoids the overhead of copying the array, which is the main source of the performance regression.
10-05-2013
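
As a minimal sketch of the idea in the comment above: the copy constructor always yields a distinct String instance with equal contents, which is what the "new String instance" wording of the spec asks for. The class and method names are hypothetical. Whether the new instance shares the backing array with the original is an implementation detail of the particular JDK and cannot be observed through the public API, so the example checks only identity and equality.

```java
/** Hypothetical helper showing the String(String) copy constructor. */
final class CopyConstructorSketch {
    // Returns a new String instance with the same contents as the cached one.
    static String freshInstance(String cached) {
        return new String(cached);
    }
}
```

For any cached String c, CopyConstructorSketch.freshInstance(c) is a different object from c, while equals() reports the two as equal.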

I wonder why the spec says this. Strings are not mutable. Why should it be necessary to create a new String instance?
08-05-2013

The change should be applied to StringBuffer only, not AbstractStringBuilder, which would affect StringBuilder. This requires a change to the specification for StringBuffer.toString as it is supposed to return a new String instance each time.
08-05-2013