FULL PRODUCT VERSION :
java version "1.5.0_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
java.net.URLEncoder consumes a lot of CPU and memory. The main cause is that java.net.URLEncoder is not lazy: it always creates its helper instances, regardless of whether it will use them. This has a large impact on URL fragments that contain no unsafe characters.
In the beginning of the method block I can see the following statements:
ByteArrayOutputStream buf = new ByteArrayOutputStream(maxBytesPerChar);
OutputStreamWriter writer = new OutputStreamWriter(buf, enc);
Why create them so early in the method? They are needed only if there are characters to encode, so it would be better to create these two instances at the point where they are first needed.
Another problem is the following statement:
if (wroteUnencodedChar) { // Fix for 4407610
writer = new OutputStreamWriter(buf, enc);
Why can't we just reuse the first created instance?
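The lazy approach asked for above can be sketched roughly as follows. This is only an illustration of the idea, not the actual Sun code and not the "optimized" variant measured below: the class name LazyUrlEncoder is hypothetical, it takes a Charset instead of an encoding name, and it uses a StringBuilder plus String.getBytes in place of the ByteArrayOutputStream/OutputStreamWriter pair. The set of unreserved characters is the one documented for java.net.URLEncoder.

```java
import java.nio.charset.Charset;
import java.util.BitSet;

// Hypothetical sketch: allocate nothing unless something actually needs encoding.
final class LazyUrlEncoder {
    // Characters URLEncoder leaves unchanged: a-z, A-Z, 0-9, '-', '_', '.', '*'.
    private static final BitSet SAFE = new BitSet(128);
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static {
        for (int c = 'a'; c <= 'z'; c++) SAFE.set(c);
        for (int c = 'A'; c <= 'Z'; c++) SAFE.set(c);
        for (int c = '0'; c <= '9'; c++) SAFE.set(c);
        SAFE.set('-');
        SAFE.set('_');
        SAFE.set('.');
        SAFE.set('*');
    }

    private static boolean needsEscape(char c) {
        return !SAFE.get(c) && c != ' ';
    }

    static String encode(String s, Charset charset) {
        int len = s.length();

        // Fast path: scan for the first character that is not safe.
        int first = 0;
        while (first < len && SAFE.get(s.charAt(first))) {
            first++;
        }
        if (first == len) {
            return s; // nothing to encode, so no buffers are created at all
        }

        // Slow path: only now allocate the output buffer.
        StringBuilder out = new StringBuilder(len + 16);
        out.append(s, 0, first);
        int i = first;
        while (i < len) {
            char c = s.charAt(i);
            if (SAFE.get(c)) {
                out.append(c);
                i++;
            } else if (c == ' ') {
                out.append('+');
                i++;
            } else {
                // Encode a whole run of unsafe characters at once so that
                // surrogate pairs are converted to bytes together.
                int start = i;
                do {
                    i++;
                } while (i < len && needsEscape(s.charAt(i)));
                byte[] bytes = s.substring(start, i).getBytes(charset);
                for (byte b : bytes) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
            }
        }
        return out.toString();
    }
}
```

With this shape, a string such as "PLEASE_SUN_OPTIMIZE_THIS_CODE" is returned as-is without any allocation, while "Robert Hoglund" only pays for one StringBuilder. One behavioural difference to keep in mind: because the charset is resolved by the caller, an unsupported encoding name is never reported by this method.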
Execution time in milliseconds for 100 000 sequential encodings:
Implementation    "Robert ���� ���� ����"    "PLEASE_SUN_OPTIMIZE_THIS_CODE"
Sun original      13547                      3547
Sun optimized     578                        158
Apache Commons    360                        453
Memory usage in KB for 10 000 (not 100 000) sequential encodings:
Implementation    "Robert ���� ���� ����"    "PLEASE_SUN_OPTIMIZE_THIS_CODE"
Sun original      348322                     87442
Sun optimized     90329                      1040
Apache Commons    3844                       4564
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just encode any sequence of characters in a benchmark:
long tStart = System.currentTimeMillis();
for (int idx = 0; idx < 100000; idx++) {
    java.net.URLEncoder.encode("Robert Hoglund");
}
System.out.println("Time in millis : " + (System.currentTimeMillis() - tStart));
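For convenience, the loop above can be wrapped in a small self-contained class along the following lines. The class name UrlEncoderBenchmark and the helper timeEncodings are made up for this sketch, and it calls the two-argument encode with "UTF-8" rather than the deprecated one-argument form. It is a rough timing harness only, so the numbers it prints will vary with the JVM and machine.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical harness around the reproduction loop; timings are indicative only.
final class UrlEncoderBenchmark {
    // Accumulates result lengths so the JIT cannot discard the encode calls.
    static volatile int sink;

    // Returns the elapsed wall-clock time in milliseconds for the given number of encodings.
    static long timeEncodings(String input, int iterations) {
        long start = System.nanoTime();
        try {
            for (int idx = 0; idx < iterations; idx++) {
                sink += URLEncoder.encode(input, "UTF-8").length();
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        String[] inputs = { "Robert Hoglund", "PLEASE_SUN_OPTIMIZE_THIS_CODE" };
        for (String input : inputs) {
            timeEncodings(input, 10000); // warm-up run, result ignored
            System.out.println(input + " : time in millis : " + timeEncodings(input, 100000));
        }
    }
}
```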
REPRODUCIBILITY :
This bug can be reproduced always.