JDK-6294060 : Use of substring() causes memory leak
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2005-07-05
  • Updated: 2011-02-16
  • Resolved: 2005-07-26
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09)
Java HotSpot(TM) Client VM (build 1.5.0_02-b09, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
Bug 4637640, though marked as fixed for JDK 1.5, is still present in the JDK 1.5 releases.

By the way, bug 4637640 was never a duplicate of bug 4546734! They are only partially related; maybe 4546734 was fixed, but 4637640 is still there in the JDK 1.5 release.

Since this is not a functional bug, it is not possible to provide a unit 
test. It is a memory-leaking behaviour which can only be observed using 
a profiler (I used the new NetBeans Profiler) or similar memory 
monitoring tools.
In my opinion everything necessary is said in bug report 4637640: the 
expected behaviour of substring is that if I use 
String.substring(), the memory of the original String (or rather its 
underlying char[]) is released once there are no references left to the 
original String object. The actual behaviour is that substring() uses a 
performance-optimized package-private constructor which does not copy 
the char array for the substring but reuses the original char 
array, so the memory will not be released as long as a reference to the 
extracted substring exists.
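
The sharing described above can be modelled outside of java.lang.String. The following is only a sketch: SharedSlice and its methods are hypothetical names invented for illustration, not JDK code. It assumes a string-like object that holds a backing array plus an offset and a count, which is the layout this report describes for the JDK 1.5 String class.

```java
import java.util.Arrays;

// Hypothetical model of a string that may share its backing array.
// This is NOT the JDK implementation, only an illustration of the idea.
final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of the first visible char in value
    private final int count;     // number of visible chars

    SharedSlice(char[] value) {
        this(0, value.length, value);
    }

    // Shares the given array instead of copying it.
    private SharedSlice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Behaves like the substring() described in this report: no copy is made,
    // so the new slice keeps the whole original array reachable.
    SharedSlice slice(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", end=" + end);
        }
        return new SharedSlice(offset + begin, end - begin, value);
    }

    // Copies only the visible chars, so the large array can be collected
    // once nothing else refers to it.
    SharedSlice compact() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count));
    }

    int length() {
        return count;
    }

    // Size of the array this slice keeps alive.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[256 * 1024];
        Arrays.fill(big, 'a');
        SharedSlice shared = new SharedSlice(big).slice(10, 18);
        SharedSlice copied = shared.compact();
        System.out.println("shared: " + shared.length() + " visible, "
                + shared.retainedChars() + " retained");
        System.out.println("copied: " + copied.length() + " visible, "
                + copied.retainedChars() + " retained");
    }
}
```

In this model the 8-char slice retains the full 256K array until compact() is called, which is the effect the report attributes to substring().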

The behaviour can even be "cloaked" by using java.util.regex classes: 
Matcher.group() etc., which seem to use substring() internally for String 
extraction. This makes the memory leak nearly impossible to detect.
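
For the regex case, a defensive copy can be taken at the point where the match is extracted. This is a minimal sketch: GroupCopy and firstMatch are hypothetical names, and it rests on the assumption stated above that Matcher.group() returns a String sharing the input's char array on the affected JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class GroupCopy {

    // Returns the first match as a String of its own, or null if there is none.
    // new String(...) is the copy suggested in this report; on a JDK whose
    // substring() already copies, it is merely redundant.
    static String firstMatch(Pattern pattern, CharSequence input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? new String(m.group()) : null;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        System.out.println(firstMatch(digits, "abc 12345 def"));
    }
}
```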

This behaviour is totally independent of OS, Java Version, VM type etc. 
It's an error in java.lang.String source code which is the same for all 
JDK distributions.

If the performance optimization is to be kept in future, 
documentation must be added to the String class so that users of 
substring() are at least warned.

I have created a simpler example class which demonstrates this 
behaviour very clearly.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See bug ID 4637640

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
See bug ID 4637640

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
/*
 * MemoryLeak.java
 *
 * Created on 24. April 2005, 19:03
 *
 * To change this template, choose Tools | Options and locate the template under
 * the Source Creation and Management node. Right-click the template and choose
 * Open. You can then make changes to the template in the Source Editor.
 */

package de.gisdesign.scratchbook;

import java.util.ArrayList;
import java.util.List;

/**
 *
 * @author pasekdbh
 */
public class MemoryLeak {

    private static final int NUMBER_OF_BUFFERS = 15;
    
    private static final int BUFFER_SIZE = 256 * 1024;

    private static final String DUMMY_TEXT = "Hello world! This is a memory leak!";
    
    private String longString;
    
    private int size;
    
    /** Creates a new instance of MemoryLeak */
    public MemoryLeak() {
        StringBuffer sb = new StringBuffer(BUFFER_SIZE);
        int count = (BUFFER_SIZE / DUMMY_TEXT.length());
        for (int i = 0; i < count; i++)  {
            sb.append(DUMMY_TEXT);
            this.size += DUMMY_TEXT.length();
        }
        this.longString = sb.toString();
    }
    
    public String getSubString()  {
        double rand1 = Math.random();
        int begin = (int)Math.round((this.size - 10) * rand1);
        int end = begin + 8;
        return this.longString.substring(begin, end);
    }
    
    public static void main(String[] args)  {
        List<String> subStrings = new ArrayList<String>(NUMBER_OF_BUFFERS);
        for (int i = 0; i < NUMBER_OF_BUFFERS; i++)  {
            MemoryLeak leak = new MemoryLeak();

            //This call causes the memory leak
            String subString = leak.getSubString();
            
            //This call avoids the memory leak
            //String subString = new String(leak.getSubString());
            
            System.out.println("Extracted substring: " + subString);
            subStrings.add(subString);
        }
        //No release of buffer objects!!
        System.out.println("Keeping the substrings means keeping the whole buffer!");
        for (int i = 0; i < NUMBER_OF_BUFFERS; i++)  {
            System.out.println("List of subStrings: " + subStrings);
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ex)  {
                System.exit(1);
            }
            System.gc();
        }
        //Releasing substring triggers release of buffer objects!!
        System.out.println("Only releasing the substrings releases the whole buffer!");
        for (int i = 0; i < NUMBER_OF_BUFFERS; i++)  {
            System.out.println("List of subStrings: " + subStrings);
            subStrings.remove(0);
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ex)  {
                System.exit(1);
            }
            System.gc();
        }
        System.exit(0);
    }
    
}

import java.util.Arrays;

/**
 *
 * @author Denis Pasek
 */
public class MemoryLeak2 {

    private static final int BUFFER_SIZE = 10 * 1024 * 1024;

    private static final char DUMMY_CHAR = 'a';   
   
    public static void main(String[] args)  {
       
        //Create dummy char array
        char[] bigCharArray = new char[BUFFER_SIZE];
        Arrays.fill(bigCharArray, DUMMY_CHAR);

        //Create String from char array, release dummy char array
        String longString = new String(bigCharArray);
        bigCharArray = null;
       
        //Extract first char of long String and release long String
        String shortString = longString.substring(0,1);
        longString = null;

        //Perform GC and wait for further automatic GCs
        System.gc();
        try {
            Thread.sleep(30000);
            System.gc();
        } catch (InterruptedException ignored) {           
        }       

        //Memory of longString is not released!!
        System.out.println("Memory of long string is not released!");
       
        //Release the short String reference and perform GC
        shortString = null;
        System.gc();

        //Memory of longString will be released!!       
        System.out.println("Memory of long string will be released!");
        try {
            Thread.sleep(10000);
        } catch (InterruptedException ignored) {           
        }       
        System.out.println("Memory of long string should be released!");
       
        System.exit(0);
    }
   
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Always use new String(string.substring(x, y)) when extracting substrings from huge strings or buffers.
###@###.### 2005-07-05 08:53:24 GMT
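
The workaround can be wrapped in a small helper so that call sites do not forget the copy. This is a sketch only: DetachedSubstrings and detachedSubstring are hypothetical names, and the buffer size and text are taken from the MemoryLeak example above. On the JDK 1.5 implementation the String(String) constructor copies just the visible characters when the backing array is larger than the string, which is why the workaround releases the big buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class DetachedSubstrings {

    // Extracts [begin, end) and copies it, so the result does not keep
    // the (possibly huge) source string's char array reachable.
    static String detachedSubstring(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        final int bufferSize = 256 * 1024;
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < 15; i++) {
            StringBuilder sb = new StringBuilder(bufferSize);
            while (sb.length() < bufferSize) {
                sb.append("Hello world! This is a memory leak!");
            }
            // Only the 8 extracted chars are kept; the large buffer
            // becomes unreachable after each iteration.
            kept.add(detachedSubstring(sb.toString(), 0, 8));
        }
        System.out.println(kept.size() + " substrings kept: " + kept);
    }
}
```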

Comments
EVALUATION
substring() calls the following constructor:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[])

which, unfortunately, still uses the original string content (value[]), which is very large in this test case.
###@###.### 2005-07-07 08:55:00 GMT
07-07-2005