Bug ID: JDK-6351680 java.net.URLEncoder consumes a lot of CPU and memory.

Type: Bug
Component: core-libs
Sub-Component: java.net
Affected Version: 5.0

Priority: P4
Status: Closed
Resolution: Not an Issue
OS: windows_xp
CPU: x86

Submitted: 2005-11-17
Updated: 2010-04-02
Resolved: 2006-04-03

FULL PRODUCT VERSION :
java version "1.5.0_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
java.net.URLEncoder consumes a lot of CPU and memory. The main problem for this is that java.net.URLEncoder is not lazy. It always creates instances regardless it will use them or not. This has a great impact on URL fragments with no unsafe characters.

In the beginning of the method block I can see the following statements:
ByteArrayOutputStream buf = new ByteArrayOutputStream(maxBytesPerChar);
OutputStreamWriter writer = new OutputStreamWriter(buf, enc);
Why create them so early in the method? The actual need for them is if there are any characters to encode. Therefore it is better to create these two instances where they are needed.

Another problem is the following statement:
 if (wroteUnencodedChar) { // Fix for 4407610
writer = new OutputStreamWriter(buf, enc);
Why can't we just reuse the first created instance?


Execution time in millis for 100 000 sequential encodings:

Encoding                 "Robert  ���� ���� ����"	"PLEASE_SUN_OPTIMIZE_THIS_CODE"
SUN Orginal                  13547	3547
SUN Optimized	578 	158
Apache commons	360 	453

Memory usage for 10 000 (Not 100 000) sequential encodings:

Implementation	"Robert ���� ���� ����"	"PLEASE_SUN_OPTIMIZE_THIS_CODE"
SUN Orginal	348 322Kb	87 442Kb
SUN Optimized	90329Kb	1040Kb
Apache commons	3844Kb	4564Kb


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just encode any sequence of characters in a benchmark:
long tStart = System.currentTimeMillis();
for (int idx = 0; idx < 100000; idx++) {
java.util.URLEncoder.encode("Robert Hoglund")
}
System.out.println("Time in millis : " + (System.currentTimeMillis() - tStart));



REPRODUCIBILITY :
This bug can be reproduced always.

EVALUATION As was pointed out by ###@###.###, avoiding creating the OutputStreamWriter will actually deviate from the API if the character encoding supplied is invalid: we must throw UnsupportedEncodingException in that case. That makes the possible performance gain even more marginal. It can only look good in the microbenchmarks where the routine is called many thousands of times. A second issue is that some of the time can be saved by tuning the heap: we see in the testcase the heap sometimes expands to nearly 200mb, then shrinks after a full GC to under 100mb. A larger new generation avoids such expansion. This bug as stated appears to not be where the customer's real problem lies, so we plan to close it. *** (#1 of 1): [ UNSAVED ] ###@###.###

03-04-2006

SUGGESTED FIX Removed the "eager" creation of those objects, added a condition for checking whether buf needed to be created - once only, and only if we have any encoding work to do. Plus, if we are encoding, there already exists the variable wroteUnencodedChar to show if we are starting a new group of encoded characters: I made it also create a new OutputStreamWriter if we are encoding, and we are on the first character. So that seems to avoid any unnecessary object creation, and in some tests it's convincing... Webrev: http://cheesypoof.uk.sun.com/kevinwa/net/zonda.ireland/zonda/kevin/1.5.0_6351680/j2se/webrev/

17-03-2006