JDK-4513622 : (str) keeping a substring of a field prevents GC for object
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 1.4.0,5.0,6
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic,windows_nt,windows_xp
  • CPU: generic,x86
  • Submitted: 2001-10-11
  • Updated: 2012-06-06
  • Resolved: 2012-06-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7u6Resolved
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Description
Name: bsT130419			Date: 10/11/2001


Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta2-b77)
Java HotSpot(TM) Client VM (build 1.4.0-beta2-b77, mixed mode)

The following code produces an OutOfMemory error because
objects don't get garbage collected if the caller stores a substring
of a field in the object. Appears to occur in any JVM for WinNT.

public class TestGC {
    private String largeString = new String(new byte[100000]);
    private String smallString = "foo";
    
    String getString() {
        // if caller stores this substring, this object will not be gc'ed
        return this.largeString.substring(0,2);
//        return new String(this.largeString.substring(0,2)); // no error here!
//        return smallString; // no error here!
    }
    
    public static void main(String[] args) {
        java.util.ArrayList list = new java.util.ArrayList();
        for (int i = 0; i < 1000000; i++) {
            TestGC gc = new TestGC();
            list.add(gc.getString());
        }
    }
}
(Review ID: 133550) 
======================================================================
Suggested fix provided by java.net community member leouser:

A DESCRIPTION OF THE FIX :
BUG 4513622
JDK VERSION:  jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
 
Attached is a conceivable solution to the problem that substring is not quite adequate for some people's needs and in fact can provide memory leak problems for it.  My solution is not the most radical of the possible ones but I think it is more usuable than just having javadoc comments talking about substrings implementation.

Additional Commentary( embeded in test as well ):
/**
 * BUG ID 4513622 --> keeping a substring of a field prevents GC for object.
 *
 * Rationale:
 * String.substring returns a new String but shares the underlying
 * char array.  This causes GC problems for clients that do not have
 * an understanding of this.  They will learn of this problem by:
 * 1. Reading the source
 * 2. Reading about it someplace in an article or posting
 * 3. Profiling a memory leak in their application --> this may be hard to find
 * In many instances they may never be aware that there is a problem.
 *
 * I became aware of it via #2.  But it appears that many others have
 * experienced it because of #3.
 *
 * This leads us to a conundrum:
 * 1. There are users who would prefer that substring returns a fresh string.
 * 2. There are users who would prefer that substring remains as it is.
 *
 * How can both needs be met?
 * SOLUTION 1: provide java doc that clarifies for the user how substring
 * creates a substring.  Also provide in the javadoc a quick idiom for getting
 * a fresh string:
 * String s = x.substring( 1 );
 * s = new String( s );
 *
 * SOLUTION 2: provide a different implementation that meets the efficiency
 * requirements of substring but also allows for GC of the parent string/char array
 *
 * SOLUTION 3: provide javadoc clarifying how substring works and providing
 * mirror methods that return a new substring.
 *
 * Of these I have chosen #3 to begin work.  My decision is currently based on:
 * a. Solution 2 may be too disruptive to existing String clients.
 * b. Solution 1 may not go far enough in providing utility to clients.  Developers
 * will want a quick and easy way to create substrings.  A 2 line idiom is too
 * burdensome for such a common need.
 * Of these 2, if Solution 3 is rejected, I would prefer #1.
 *
 * To implement solution 3 I have added to mirror methods:
 * newString( int beginIndex, int endIndex )
 * newString( int beginIndex )
 * and have clarified substring in the javadoc, pointing also to newString.
 *
 * I have choosen newString as the name because it is so distinct from substring.
 * Ideally substring shouldn't be called substring but something like:
 * subview
 * which would be more indicative of what it does.
 * newString has implementations that are essentially the same as substring( and
 * also the same javadoc ) but differ in how they produce their products.
 *
 * IMPACT:
 * At this juncture do to the final nature of String there should be no impact
 * upon existing clients, you cannot subclass String.  Anyone who does is using
 * some extralinguistic mechanism to acheive the subclass.
 *
 * NEGATIVES:
 * 1. 2 new methods add burden to the comprehension of the API
 * 2. javadoc now mentions the implementation of substring forever cementing
 * its implementation.
 *
 * I think though the negative consequences of being ignorant of what substring
 * does is more damaging than either NEGATIVES 1 or 2.  The developer will spend much longer fixing these problems in his code than the 2 - 5 minutes
 * it will take to learn these methods forever.  I recently read an article where
 * an application removed some severe memory leaks because of the substring
 * implementation.  If newString or even clear javadoc about its nature were
 * present the work involved in resolving these problems may never have had
 * to occur.
 *
 * Testing note:
 * This test tests against expected values and also if the equals method
 * returns true for a string derived from substring.  Why? newString is a
 * mirror of substring, except it returns a new String.  They should always
 * return true.
 *
 * String enhanced with java 6 version:
 * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
 *
 * test ran succesfully on a SUSE 7.3 Linux distribution
 *
 * Brian Harry
 * ###@###.###
 * jan 2, 2006
 */

UNIFIED DIFF:
--- /home/nstuff/java6/jdk1.6.0/java/lang/String.java	Thu Dec 15 02:16:40 2005
+++ /home/javarefs/strings/substring/java/lang/String.java	Mon Jan  2 15:30:22 2006
@@ -1733,6 +1733,12 @@
      * "emptiness".substring(9) returns "" (an empty string)
      * </pre></blockquote>
      *
+     * Important note: the substring returned by this method is
+     * a view of the internals of the instance.  This will prevent
+     * garbage collection of the instance if the returned string is
+     * held onto.  To get a substring that is not bound to the internals
+     * of the instance use newString( int beginIndex ) instead.
+     *
      * @param      beginIndex   the beginning index, inclusive.
      * @return     the specified substring.
      * @exception  IndexOutOfBoundsException  if
@@ -1754,6 +1760,13 @@
      * "hamburger".substring(4, 8) returns "urge"
      * "smiles".substring(1, 5) returns "mile"
      * </pre></blockquote>
+     *
+     * Important note: the substring returned by this method is
+     * a view of the internals of the instance.  This will prevent
+     * garbage collection of the instance if the returned string is
+     * held onto.  To get a substring that is not bound to the internals
+     * of the instance use newString( int beginIndex, int endIndex )
+     * instead.
      *
      * @param      beginIndex   the beginning index, inclusive.
      * @param      endIndex     the ending index, exclusive.
@@ -1809,6 +1822,76 @@
      */
     public CharSequence subSequence(int beginIndex, int endIndex) {
         return this.substring(beginIndex, endIndex);
+    }
+
+    /**
+     * Returns a new string that is a substring of this string. The
+     * substring begins at the specified <code>beginIndex</code> and
+     * extends to the character at index <code>endIndex - 1</code>.
+     * Thus the length of the substring is <code>endIndex-beginIndex</code>.
+     * <p>
+     * Examples:
+     * <blockquote><pre>
+     * "hamburger".newString(4, 8) returns "urge"
+     * "smiles".newString(1, 5) returns "mile"
+     * </pre></blockquote>
+     *
+     * IMPORTANT NOTE: newString needs to be distinguished from
+     * substring in that it returns an instance that is not connected
+     * to the instance from which is was derived.  It will not block
+     * garbage collection of the base instance.
+     *
+     * @param      beginIndex   the beginning index, inclusive.
+     * @param      endIndex     the ending index, exclusive.
+     * @return     the specified substring.
+     * @exception  IndexOutOfBoundsException  if the
+     *             <code>beginIndex</code> is negative, or
+     *             <code>endIndex</code> is larger than the length of
+     *             this <code>String</code> object, or
+     *             <code>beginIndex</code> is larger than
+     *             <code>endIndex</code>.
+     */
+    public String newString( int beginIndex, int endIndex ){
+
+	if (beginIndex < 0) {
+	    throw new StringIndexOutOfBoundsException(beginIndex);
+	}
+	if (endIndex > count) {
+	    throw new StringIndexOutOfBoundsException(endIndex);
+	}
+	if (beginIndex > endIndex) {
+	    throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
+	}
+	return new String( value, beginIndex, endIndex - beginIndex );
+
+    }
+
+    /**
+     * Returns a new string that is a substring of this string. The
+     * substring begins with the character at the specified index and
+     * extends to the end of this string. <p>
+     * Examples:
+     * <blockquote><pre>
+     * "unhappy".newString(2) returns "happy"
+     * "Harbison".newString(3) returns "bison"
+     * "emptiness".newString(9) returns "" (an empty string)
+     * </pre></blockquote>
+     *
+     * IMPORTANT NOTE: newString needs to be distinguished from
+     * substring in that it returns an instance that is not connected
+     * to the instance from which is was derived.  It will not block
+     * garbage collection of the base instance.
+     *
+     * @param      beginIndex   the beginning index, inclusive.
+     * @return     the specified substring.
+     * @exception  IndexOutOfBoundsException  if
+     *             <code>beginIndex</code> is negative or larger than the
+     *             length of this <code>String</code> object.
+     */
+    public String newString( int beginIndex ){
+
+	return newString( beginIndex, count );
+
     }
 
     /**


JUnit TESTCASE :
import junit.framework.TestCase;
import junit.textui.TestRunner;
import static java.lang.System.out;
import java.util.*;


/**
 * BUG ID 4513622 --> keeping a substring of a field prevents GC for object.
 *
 * Rationale:
 * String.substring returns a new String but shares the underlying
 * char array.  This causes GC problems for clients that do not have
 * an understanding of this.  They will learn of this problem by:
 * 1. Reading the source
 * 2. Reading about it someplace in an article or posting
 * 3. Profiling a memory leak in their application --> this may be hard to find
 * In many instances they may never be aware that there is a problem.
 *
 * I became aware of it via #2.  But it appears that many others have
 * experienced it because of #3.
 *
 * This leads us to a conundrum:
 * 1. There are users who would prefer that substring returns a fresh string.
 * 2. There are users who would prefer that substring remains as it is.
 *
 * How can both needs be met?
 * SOLUTION 1: provide java doc that clarifies for the user how substring
 * creates a substring.  Also provide in the javadoc a quick idiom for getting
 * a fresh string:
 * String s = x.substring( 1 );
 * s = new String( s );
 *
 * SOLUTION 2: provide a different implementation that meets the efficiency
 * requirements of substring but also allows for GC of the parent string/char array
 *
 * SOLUTION 3: provide javadoc clarifying how substring works and providing
 * mirror methods that return a new substring.
 *
 * Of these I have chosen #3 to begin work.  My decision is currently based on:
 * a. Solution 2 may be too disruptive to existing String clients.
 * b. Solution 1 may not go far enough in providing utility to clients.  Developers
 * will want a quick and easy way to create substrings.  A 2 line idiom is too
 * burdensome for such a common need.
 * Of these 2, if Solution 3 is rejected, I would prefer #1.
 *
 * To implement solution 3 I have added to mirror methods:
 * newString( int beginIndex, int endIndex )
 * newString( int beginIndex )
 * and have clarified substring in the javadoc, pointing also to newString.
 *
 * I have choosen newString as the name because it is so distinct from substring.
 * Ideally substring shouldn't be called substring but something like:
 * subview
 * which would be more indicative of what it does.
 * newString has implementations that are essentially the same as substring( and
 * also the same javadoc ) but differ in how they produce their products.
 *
 * IMPACT:
 * At this juncture do to the final nature of String there should be no impact
 * upon existing clients, you cannot subclass String.  Anyone who does is using
 * some extralinguistic mechanism to acheive the subclass.
 *
 * NEGATIVES:
 * 1. 2 new methods add burden to the comprehension of the API
 * 2. javadoc now mentions the implementation of substring forever cementing
 * its implementation.
 *
 * I think though the negative consequences of being ignorant of what substring
 * does is more damaging than either NEGATIVES 1 or 2.  The developer will spend much longer fixing these problems in his code than the 2 - 5 minutes
 * it will take to learn these methods forever.  I recently read an article where
 * an application removed some severe memory leaks because of the substring
 * implementation.  If newString or even clear javadoc about its nature were
 * present the work involved in resolving these problems may never have had
 * to occur.
 *
 * Testing note:
 * This test tests against expected values and also if the equals method
 * returns true for a string derived from substring.  Why? newString is a
 * mirror of substring, except it returns a new String.  They should always
 * return true.
 *
 * String enhanced with java 6 version:
 * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
 *
 * test ran succesfully on a SUSE 7.3 Linux distribution
 *
 * Brian Harry
 * ###@###.###
 * jan 2, 2006
 */
public class StringNewString extends TestCase{


    public StringNewString( String test ){

	super( test );
    
    }

    public void testNewString(){
	
	out.println( "" );
	String s = "Java is cool";
	String test1 = "is cool";
	String test2 = "Java is";
	String test3 = "va is cool";
	String test4 = "is";
	
	out.println( "TEST 1" );
	String nstring1 = s.newString( 5 );
	out.println( nstring1 );
	assertEquals( nstring1, test1 );
	assertEquals( s.substring( 5 ), nstring1 );
	out.println( "TEST 2" );
	String nstring2 = s.newString( 0, 7 );
	out.println( nstring2 );
	assertEquals( nstring2, test2 );
	assertEquals( s.substring( 0, 7 ), nstring2 );
	out.println( "TEST 3" );
	String nstring3 = s.newString( 2 );
	out.println( nstring3 );
	assertEquals( nstring3, test3 );
	assertEquals( s.substring( 2 ), nstring3 );
	out.println( "TEST 4" );
	String nstring4 = s.newString( 5, 7 );
	out.println( nstring4 );
	assertEquals( nstring4, test4 );
	assertEquals( s.substring( 5,7), nstring4 );

	//these should fail
	try{
	    
	    out.println( "TEST 5 should fail" );
	    String nstring5 = s.newString( -1, -1 );
	    assertEquals( nstring5, s );

	}
	catch( Exception x ){

	    out.println( "Failure 1 as expected" );

	}

	try{
	    out.println( "TEST 6 should fail, report 1 error" );
	    String nstring6 = s.newString( 7, 8 );
	    assertEquals( nstring6, test4 );

	}
	catch( Exception x ){

	    out.println( "Failure 2 as expected" );

	}

    }

    public static void main( String[] args ){

	out.println( "Testing with testNewString" );
	TestRunner.run( new StringNewString( "testNewString" ) );

    }


}


FIX FOR BUG NUMBER:
4513622
Given the history of this bug, the solution with the least impact on existing code, test, performance, etc. is to simply change the Javadoc to indicate that substring does not return a new String and to show the workaround as suggested.
I retract my suggstions.  It turns out that as of 6924259 this issue is closable as fixed  for both 7u6 and 8. The char buffers are no longer shared unless the strings are identical.

Comments
EVALUATION CR#6924259 addresses this issue as substrings no longer share character arrays with their source.
06-06-2012

EVALUATION Contribution-Forum:https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?forumID=1463&messageID=10695
03-01-2006

EVALUATION When you call String.substring as in the example, a new character array for storage is not allocated. It uses the character array of the original String. Thus, the character array backing the the original String can not be GC'd until the substring's references can also be GC'd. This is an intentional optimization to prevent excessive allocations when using substring in common scenarios. Unfortunately, the problematic code hits a case where the overhead of the original array is noticeable. It is difficult to optimize for both edges cases. Any optimization for space/size trade-offs are generally complex and can often be platform-specific. It's too late to fix this for Mustang and it's entirely possible that the bug will not be fixed in future releases either. Such a change would require early and extensive testing in the release cycle because it is likely to introduce preformance regressions in other programs. For the case described in the example, it is possible for the user to work-around the problem by explicitly calling "new".
19-09-2005

WORK AROUND Name: bsT130419 Date: 10/11/2001 instead of foo = bar.substring(0,x) use foo = new String(bar.substring(0,x)) but: substring is supposed to return a new String anyway! ======================================================================
21-08-2004