Bug ID: JDK-6415062 30 MB memory trashed to get 30 kb string url encoded

Details
Type: Bug
Submit Date: 2006-04-19
Status: Resolved
Updated Date: 2011-02-16
Project Name: JDK
Resolved Date: 2006-05-13
Component: core-libs
OS: windows_xp
Sub-Component: java.net
CPU: x86
Priority: P4
Resolution: Fixed
Affected Versions: 1.4.2, 6
Fixed Versions:


Description
FULL PRODUCT VERSION :
java version "1.4.2_07"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_07-b05)
Java HotSpot(TM) Client VM (build 1.4.2_07-b05, mixed mode)

The problem also occurs in higher and lower versions.

ADDITIONAL OS VERSION INFORMATION :
on all OS platforms

EXTRA RELEVANT SYSTEM CONFIGURATION :
standalone java example

A DESCRIPTION OF THE PROBLEM :
A standalone example calls java.net.URLEncoder.encode(String ..) from a main method, passing a string that is 30 kb in size. Encoding this string trashes about 30 MB of memory.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the code provided under "Source code for an executable test case:" in this questionnaire.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
We expect the memory trashed during encoding not to exceed 3 times the size of the encoded string.
URL encoding is widely used in the Web Dynpro and portal layers of SAP products, so we expect this problem to be fixed with very high priority.
ACTUAL -
The actual memory consumption for different tested string lengths is shown below.

Length (chars)   Length (kb)   Encoding memory (kb)   Decoding memory (kb)
10               0.009         18                     1.16
50               0.048         52                     2.96
100              0.09          103                    4.44
200              0.19          189                    7.94
300              0.29          275                    12
400              0.39          360                    16
500              0.48          455                    20

We consider the decoding figures acceptable, but encoding is a real problem.
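To make the size of the gap concrete, the following small sketch divides the reported processing memory by the string size for each row of the table. The class and method names are hypothetical and exist only for this illustration; the numbers are copied from the table above. For the 500-character row, for example, 455 / 0.48 comes to roughly 950, far above the factor of 3 stated as the expectation.

```java
class AmplificationSketch {

    // Ratio of processing memory (kb) to the size of the processed string (kb).
    static double ratio(double memoryKb, double stringKb) {
        return memoryKb / stringKb;
    }

    public static void main(String[] args) {
        // Rows copied from the report: {string size kb, encoding kb, decoding kb}.
        double[][] rows = {
            {0.009, 18, 1.16},
            {0.048, 52, 2.96},
            {0.09, 103, 4.44},
            {0.19, 189, 7.94},
            {0.29, 275, 12},
            {0.39, 360, 16},
            {0.48, 455, 20},
        };
        for (double[] row : rows) {
            System.out.printf("%.3f kb: encoding x%.0f, decoding x%.0f%n",
                    row[0], ratio(row[1], row[0]), ratio(row[2], row[0]));
        }
    }
}
```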

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
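import java.net.URLDecoder;
import java.net.URLEncoder;

public class URLTest {

  // Usage: java URLTest (-encode|-decode) <iterations> <teststring>
  public static void main(String[] args) {
      if (args.length < 3) {
          System.err.println("Usage: java URLTest (-encode|-decode) <iterations> <teststring>");
          return;
      }

      // Encoding is the default; only "-decode" switches to decoding.
      boolean encode = !args[0].equals("-decode");
      int iterations = Integer.parseInt(args[1]);
      String teststring = args[2];

      System.out.println(teststring);
      System.out.println("Test string length is " + teststring.length() + " characters.");
      if (encode) {
          System.out.println("Will be encoded " + iterations + " times.");
      } else {
          System.out.println("Will be decoded " + iterations + " times.");
      }

      // Pause so that a profiler can be attached before the first call.
      pause();
      Runtime.getRuntime().gc();
      for (int i = 0; i < iterations; i++) {
          // The deprecated one-argument overloads are the ones under test;
          // they use the platform default encoding.
          if (encode) {
              URLEncoder.encode(teststring);
          } else {
              URLDecoder.decode(teststring);
          }
          // Pause after each call so that a memory snapshot can be taken.
          pause();
      }
      Runtime.getRuntime().gc();
  }

  // Sleeps for 30 seconds; an interrupt simply ends the pause early.
  private static void pause() {
      try {
          Thread.sleep(30000);
      } catch (InterruptedException ignored) {
      }
  }
}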


During the Thread.sleep calls we attach and detach the profiler and take a memory snapshot. When we run without a profiler, we check the GC log file instead. Both approaches confirm the high memory allocations.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
There is no workaround.

                                    

Comments
EVALUATION

URLEncoder is inefficient because it creates a new OutputStreamWriter object for each run of consecutive characters that need encoding. It does so because some encodings, such as UTF-16, write a pair of bytes, %FE%FF, at the beginning of each encoded sequence. A new instance of OutputStreamWriter emits these bytes, and this is what URLDecoder expects; see 4407610.
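The leading byte pair can be observed without URLEncoder at all. The sketch below converts a single character with Java's "UTF-16" charset and prints the bytes in percent-encoded form; the class name and the toPercentHex helper are hypothetical and only serve the illustration. Each independent conversion starts with the bytes FE FF, which is why every separately encoded run carries them.

```java
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Formats each byte as %XX, the way a URL-encoded string shows it.
    static String toPercentHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            sb.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One conversion of U+00E9: two marker bytes, then the character itself.
        byte[] encoded = "\u00e9".getBytes(StandardCharsets.UTF_16);
        System.out.println(toPercentHex(encoded)); // prints %FE%FF%00%E9
    }
}
```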
                                     
2006-04-19
EVALUATION

The reason OutputStreamWriter is so inefficient is that it creates a sun.nio.cs.StreamEncoder object, and each StreamEncoder allocates an 8K byte buffer. The fix is to use a reusable CharArrayWriter to collect each sequence of chars, and String.getBytes to do the conversion.

With the above changes the amount of memory used for Strings that need a lot of encoding is greatly reduced.
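The approach described in the evaluation can be sketched roughly as follows. This is a simplified illustration, not the actual JDK source: the class name, the SAFE set, and the helper methods are assumptions made for the example. It keeps one CharArrayWriter for the whole call, collects each run of characters that need encoding into it, converts the run with String.getBytes, and resets the writer for the next run.

```java
import java.io.CharArrayWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class RunEncoderSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Characters copied to the output unchanged (assumed set for this sketch).
    private static final BitSet SAFE = new BitSet(128);
    static {
        for (int c = 'a'; c <= 'z'; c++) SAFE.set(c);
        for (int c = 'A'; c <= 'Z'; c++) SAFE.set(c);
        for (int c = '0'; c <= '9'; c++) SAFE.set(c);
        SAFE.set('-');
        SAFE.set('_');
        SAFE.set('.');
        SAFE.set('*');
    }

    private static boolean needsEncoding(char c) {
        return c != ' ' && !SAFE.get(c);
    }

    static String encode(String s, Charset charset) {
        StringBuilder out = new StringBuilder(s.length());
        // One writer for the whole call, reused for every run.
        CharArrayWriter run = new CharArrayWriter();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ') {
                out.append('+');
                i++;
            } else if (SAFE.get(c)) {
                out.append(c);
                i++;
            } else {
                // Collect the whole run of characters that need encoding.
                while (i < s.length() && needsEncoding(s.charAt(i))) {
                    run.write(s.charAt(i));
                    i++;
                }
                // Convert the run in one step; no per-run stream encoder.
                for (byte b : run.toString().getBytes(charset)) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
                run.reset();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b/\u00e9", StandardCharsets.UTF_8));
        System.out.println(encode("\u00e9 \u00e9", StandardCharsets.UTF_16));
    }
}
```

Because each run is still converted separately, UTF-16 output in this sketch keeps the leading %FE%FF on every run, matching the behaviour described in the first evaluation.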
                                     
2006-05-02


