JDK-4834695 : RFE: StringBuffer and String.substring() optimizations
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 1.4.1
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2003-03-19
  • Updated: 2003-03-20
  • Resolved: 2003-03-20
Related Reports
Duplicate :  
Relates :  
Description

Name: rmT116609			Date: 03/19/2003


A DESCRIPTION OF THE REQUEST :
The VM is stressed too much by unnecessary copies of strings and substrings:

1) StringBuffer is explicitly optimized to support String concatenation, but it relies on setShared() too eagerly, including in cases where it should not:
If the StringBuffer is not fully used (for example, after setLength() has been called to shorten the string stored in it), then the conversion to a String should not use setShared(). Instead, the StringBuffer should keep its allocated storage, and a new, non-shared character array should be allocated for the new String. This maximizes the reuse of StringBuffers and also reduces the storage held by Strings created from partially filled StringBuffers.

This would improve the many places where a StringBuffer should be kept at its estimated initial storage size. Because converting a StringBuffer to a String currently puts the buffer into the shared state, the StringBuffer has to be reallocated on its next modification, although it could simply stay allocated. With the proposed change, StringBuffer.setLength() would no longer have to create a new char[] array just because a partially filled StringBuffer was previously converted to a String: the StringBuffer's storage should stay allocated until the buffer is no longer referenced. Operations on StringBuffers would be faster and the VM would be under less load.
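
The reuse pattern that this request aims to make cheap looks roughly like the sketch below. The class and method names (BufferReuseSketch, joinRows) are hypothetical and exist only for illustration; the only real API used is StringBuffer's constructor, setLength(), append(), and toString(). How many char[] allocations actually occur depends on the JDK version.

```java
// Hypothetical illustration of reusing one StringBuffer across iterations.
final class BufferReuseSketch {

    // Concatenates the parts of each row into one String per row.
    static String[] joinRows(String[][] rows) {
        // One buffer, sized once from an estimate, reused for every row.
        StringBuffer sb = new StringBuffer(64);
        String[] out = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            sb.setLength(0);            // discard the previous row's contents
            for (String part : rows[i]) {
                sb.append(part);
            }
            // The buffer is usually only partially filled at this point,
            // which is the case the request wants to avoid sharing for.
            out[i] = sb.toString();
        }
        return out;
    }
}
```

Under the behavior described in this report, each toString() call on the partially filled buffer would mark it shared, so the following setLength(0) or append() would have to allocate fresh storage.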

Simply checking whether the length matches the capacity before sharing would not affect the performance of the concatenations for which sharing was introduced.
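
The proposed check can be sketched with a small stand-alone model. This is not the JDK's StringBuffer: the class SketchBuffer and its members (snapshot, truncate, isShared, capacity) are hypothetical, and a plain char[] stands in for the String's storage because the real sharing happens in non-public code.

```java
import java.util.Arrays;

// Hypothetical model of the proposed policy; not the JDK implementation.
final class SketchBuffer {
    private char[] value;    // backing storage
    private int count;       // number of chars in use
    private boolean shared;  // true once a snapshot aliases 'value'

    SketchBuffer(int capacity) {
        value = new char[capacity];
    }

    SketchBuffer append(String s) {
        int needed = count + s.length();
        if (shared || needed > value.length) {
            // Copy-on-write or growth: the buffer gets a private array again.
            value = Arrays.copyOf(value, Math.max(needed, value.length));
            shared = false;
        }
        s.getChars(0, s.length(), value, count);
        count = needed;
        return this;
    }

    // Shortens the contents; never touches the array itself.
    void truncate(int newLength) {
        if (newLength < 0 || newLength > count) {
            throw new IndexOutOfBoundsException();
        }
        count = newLength;
    }

    // Stand-in for toString(): returns the chars backing the new "String".
    char[] snapshot() {
        if (count == value.length) {
            // Buffer is exactly full: share the array, as for concatenation.
            shared = true;
            return value;
        }
        // Partially filled: hand out a trimmed copy and keep our storage.
        return Arrays.copyOf(value, count);
    }

    boolean isShared() { return shared; }

    int capacity() { return value.length; }
}
```

In this model a partially filled buffer keeps its full capacity after snapshot() and is never marked shared, while an exactly full buffer still takes the sharing path.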

2) Taking a substring of a string should not need to allocate a copy of the string when the substring indexes cover the whole string (start=0, end=length()); instead it should return a shared string.
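
The second point can be illustrated at the application level with a small wrapper. The names SubstringSketch and substringOrSelf are hypothetical, and the sketch makes no claim about what String.substring() does internally in any particular JDK release.

```java
// Hypothetical helper showing the full-range shortcut requested above.
final class SubstringSketch {

    // Returns 's' itself when the range covers the whole string,
    // otherwise delegates to String.substring().
    static String substringOrSelf(String s, int begin, int end) {
        if (begin == 0 && end == s.length()) {
            return s;  // Strings are immutable, so returning the same object is safe
        }
        return s.substring(begin, end);
    }
}
```

Because String is immutable, returning the original object for the full range cannot be observed by callers except through reference identity.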


JUSTIFICATION :
Dead char[] objects are the most prevalent garbage in the VM. Most of them are created as temporary copies, and the source string from which they were extracted is freed as well. This stresses the GC, unnecessarily grows the total VM footprint on the host system, and causes more disk I/O for swapping.
(Review ID: 182844) 
======================================================================