JDK-6415062 : 30 MB memory trashed to get 30 kb string url encoded
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.4.2,6
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2006-04-19
  • Updated: 2011-02-16
  • Resolved: 2006-05-13
Fixed in:
  Other: 1.4.2_15
  JDK 6: 6 b85
Description
FULL PRODUCT VERSION :
java version "1.4.2_07"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_07-b05)
Java HotSpot(TM) Client VM (build 1.4.2_07-b05, mixed mode)

The problem also occurs in earlier and later versions.

ADDITIONAL OS VERSION INFORMATION :
on all OS platforms

EXTRA RELEVANT SYSTEM CONFIGURATION :
standalone java example

A DESCRIPTION OF THE PROBLEM :
A standalone example calls java.net.URLEncoder.encode(String ..) from a main method, passing a string that is 30 kb in size. Encoding that string trashes about 30 MB of memory.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just run the code provided under the "Source code for an executable test case:" point of this questionnaire.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
We expect the memory trashed during encoding to be no more than 3 times the size of the encoded string.
URL encoding is widely used in the webdynpro and portal layers of SAP products, so we expect a fix for this problem with very high priority.
ACTUAL -
The actual memory consumption for the tested string lengths is:

Length (chars)   Length (kb)   Encoding memory (kb)   Decoding memory (kb)
10               0.009         18                     1.16
50               0.048         52                     2.96
100              0.09          103                    4.44
200              0.19          189                    7.94
300              0.29          275                    12
400              0.39          360                    16
500              0.48          455                    20

Decoding is acceptable, but encoding is a real problem.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
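import java.net.URLDecoder;
import java.net.URLEncoder;

public class URLTest {

    // Usage: java URLTest (-encode|-decode) <iterations> <teststring>
    public static void main(String[] args) {

        // Encoding is the default; only "-decode" switches to decoding.
        boolean encode = true;
        if (args[0].equals("-encode")) {
            encode = true;
        } else if (args[0].equals("-decode")) {
            encode = false;
        }

        int iterations = Integer.parseInt(args[1]);
        String teststring = args[2];

        System.out.println(teststring);
        System.out.println("Test string length is " + teststring.length() + " characters.");
        if (encode) {
            System.out.println("Will be encoded " + iterations + " times.");
        } else {
            System.out.println("Will be decoded " + iterations + " times.");
        }

        // Pause so that a profiler can be attached before the first call.
        pause();
        Runtime.getRuntime().gc();
        for (int i = 0; i < iterations; i++) {
            // The single-argument (platform default encoding) overloads are
            // deprecated but are the calls this report measures.
            if (encode) {
                URLEncoder.encode(teststring);
            } else {
                URLDecoder.decode(teststring);
            }
            // Pause after each call so that a memory snapshot can be taken.
            pause();
        }
        Runtime.getRuntime().gc();
    }

    // Sleeps for 30 seconds; an interrupt simply ends the pause early.
    private static void pause() {
        try {
            Thread.sleep(30000);
        } catch (InterruptedException ignored) {
        }
    }
}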


During the Thread.sleep calls we attach and detach a profiler and take a memory snapshot; when we run without a profiler we check the GC log file. Both approaches confirm the high memory allocations.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
There is no workaround.

Comments
EVALUATION The reason OutputStreamWriter is so inefficient is that it creates a sun.nio.cs.StreamEncoder object, and each StreamEncoder allocates an 8K byte buffer. The fix is to use a reusable CharArrayWriter to collect each sequence of chars, and String.getBytes to do the conversion. With these changes the amount of memory used for Strings that need a lot of encoding is greatly reduced.
02-05-2006
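
The approach described in this evaluation can be sketched roughly as follows. This is not the actual JDK change, only a minimal illustration under some assumptions: the class name ReusableWriterEncodeSketch, the helper isSafe, and the set of unescaped characters are chosen for the example, and it uses java.nio.charset.StandardCharsets, which did not exist in the releases this report covers.

```java
import java.io.CharArrayWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the JDK source.
class ReusableWriterEncodeSketch {

    private static final String HEX = "0123456789ABCDEF";

    // Assumed set of characters that are copied through unchanged.
    private static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '*';
    }

    static String encode(String s, Charset charset) {
        StringBuilder out = new StringBuilder(s.length());
        // One writer for the whole call, reused for every run of characters.
        CharArrayWriter run = new CharArrayWriter();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isSafe(c)) {
                out.append(c);
                i++;
            } else if (c == ' ') {
                out.append('+');
                i++;
            } else {
                // Collect the run of consecutive characters that need escaping.
                while (i < s.length() && !isSafe(s.charAt(i)) && s.charAt(i) != ' ') {
                    run.write(s.charAt(i));
                    i++;
                }
                // Convert the run in one step, without an OutputStreamWriter.
                byte[] bytes = new String(run.toCharArray()).getBytes(charset);
                for (byte b : bytes) {
                    out.append('%')
                       .append(HEX.charAt((b >> 4) & 0xF))
                       .append(HEX.charAt(b & 0xF));
                }
                run.reset(); // keep the buffer, discard its contents
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b/c?d=e", StandardCharsets.UTF_8));
    }
}
```

Because the CharArrayWriter is reset rather than recreated, a string with many separate runs of escaped characters no longer pays for a fresh 8K encoder buffer per run.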

EVALUATION URLEncoder is inefficient because it creates a new OutputStreamWriter object for each set of consecutive characters that need encoding. It does so because some encodings, like UTF-16, write a pair of bytes, %FE%FF, at the beginning of each encoded sequence. A new instance of OutputStreamWriter does this, and this is what URLDecoder expects; see 4407610.
19-04-2006
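
The UTF-16 behavior mentioned in this evaluation can be observed with a small program such as the one below. It is only an illustrative sketch: the class name Utf16BomSketch and its helper methods are made up for the example, and the exact output may depend on the JDK in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical sketch for observing the UTF-16 byte-order mark.
class Utf16BomSketch {

    static String encodeUtf16(String s) {
        try {
            return URLEncoder.encode(s, "UTF-16");
        } catch (UnsupportedEncodingException e) {
            // UTF-16 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    static String decodeUtf16(String s) {
        try {
            return URLDecoder.decode(s, "UTF-16");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "a/b/c";
        String encoded = encodeUtf16(original);
        // Each escaped "/" is expected to start with the %FE%FF marker.
        System.out.println(encoded);
        // Decoding should give back the original string.
        System.out.println(decodeUtf16(encoded).equals(original));
    }
}
```

Any change to how URLEncoder converts a run of characters has to keep this marker at the start of each run, since URLDecoder decodes each run of %xx bytes separately.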