JDK-6368633 : java.net.URLEncoder.encode performance enhancements
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 6
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2006-01-04
  • Updated: 2011-02-22
  • Resolved: 2006-11-02
Related Reports
Duplicate :  
Description
A DESCRIPTION OF THE REQUEST :
The encode method already contains a check that recreates the writer when needed. This check can be used to delay creating the writer until it is first needed, by initializing "wroteUnencodedChar" to true. Better still, remove this variable entirely and signal the same condition by setting the "writer" variable to null.
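
The lazy-creation idea can be shown in isolation with a minimal sketch. It assumes a simplified notion of "safe" characters (ASCII letters and digits only) and a fixed UTF-8 charset. The class LazyWriterDemo, its isSafe/encodeBytes methods and the writersCreated counter are hypothetical names for this demo and are not part of java.net.URLEncoder. A null writer plays the role of the "wroteUnencodedChar" flag.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LazyWriterDemo {
    // Hypothetical counter, only here to show how many writers get created.
    static int writersCreated = 0;

    // Simplified "safe" test for this sketch: ASCII letters and digits only.
    static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Returns the UTF-8 bytes of every unsafe character in s. The writer is
    // created lazily: null means "no writer yet", replacing a boolean flag.
    static byte[] encodeBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        OutputStreamWriter writer = null;
        try {
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (isSafe(c)) {
                    writer = null; // the next unsafe run starts with a fresh writer
                    continue;
                }
                if (writer == null) {
                    writer = new OutputStreamWriter(buf, StandardCharsets.UTF_8);
                    writersCreated++;
                }
                writer.write(c);
                writer.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        encodeBytes("abc123");
        System.out.println(writersCreated); // 0: no character needed encoding
        writersCreated = 0;
        encodeBytes("a b/c");
        System.out.println(writersCreated); // 2: one writer per run of unsafe chars
    }
}
```

With this scheme an input made only of safe characters never constructs a writer, and each run of consecutive unsafe characters shares a single writer.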

Second, Character.forDigit() is used and its result is then adjusted after a call to isLetter(). "0123456789ABCDEF".charAt(...) does the same job more simply, and would also be faster.
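
The two ways of producing a hex digit can be compared directly. This sketch assumes the existing code follows the pattern the report describes: Character.forDigit(n, 16), which yields lowercase 'a' to 'f', followed by upper-casing when isLetter() is true. HexDigitDemo and its methods are hypothetical names. The check confirms both approaches agree for every nibble, and percentEncode shows the table lookup building a %XX escape.

```java
class HexDigitDemo {
    private static final String HEX_DIGITS = "0123456789ABCDEF";

    // Pattern described in the report: forDigit gives lowercase letters,
    // so letters are shifted to uppercase afterwards.
    static char viaForDigit(int nibble) {
        char ch = Character.forDigit(nibble, 16);
        if (Character.isLetter(ch)) {
            ch -= ('a' - 'A');
        }
        return ch;
    }

    // Suggested replacement: a direct table lookup.
    static char viaLookup(int nibble) {
        return HEX_DIGITS.charAt(nibble);
    }

    // Masking with 0xF keeps the result in 0..15 even for negative bytes.
    static String percentEncode(byte b) {
        return "%" + viaLookup((b >> 4) & 0xF) + viaLookup(b & 0xF);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 16; n++) {
            if (viaForDigit(n) != viaLookup(n)) {
                throw new AssertionError("mismatch at nibble " + n);
            }
        }
        System.out.println(percentEncode((byte) 0xE2)); // %E2
        System.out.println(percentEncode((byte) 0x2F)); // %2F
    }
}
```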

Also, the surrogate checks should at least use the constants from Character (such as MIN_HIGH_SURROGATE).
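
A small check that the named constants express the same ranges as literal values. It assumes the current checks compare against literals such as 0xD800 and 0xDBFF; SurrogateDemo and its methods are hypothetical names. Character.isHighSurrogate() and isLowSurrogate(), which the workaround below already uses, serve as the reference.

```java
class SurrogateDemo {
    // Literal-value range checks, as assumed for the current code.
    static boolean isHighMagic(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    static boolean isLowMagic(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // The same checks written with the named constants from java.lang.Character.
    static boolean isHighNamed(char c) {
        return c >= Character.MIN_HIGH_SURROGATE && c <= Character.MAX_HIGH_SURROGATE;
    }

    static boolean isLowNamed(char c) {
        return c >= Character.MIN_LOW_SURROGATE && c <= Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        // Exhaustively compare all three forms over every char value.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (isHighMagic(c) != isHighNamed(c) || isHighNamed(c) != Character.isHighSurrogate(c)
                    || isLowMagic(c) != isLowNamed(c) || isLowNamed(c) != Character.isLowSurrogate(c)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        String emoji = "\uD83D\uDE00"; // U+1F600 stored as a surrogate pair
        System.out.println(isHighNamed(emoji.charAt(0)) && isLowNamed(emoji.charAt(1))); // true
    }
}
```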

JUSTIFICATION :
In its current form the method looks like a hack rather than a clean implementation.


CUSTOMER SUBMITTED WORKAROUND :
Improved and cleaned version:

    static final int maxBytesPerChar = 10; // rather arbitrary limit, but safe for now

    // Hex digits for the %XX escapes; replaces Character.forDigit() + isLetter().
    private static final String HEX_DIGITS = "0123456789ABCDEF";

    // dontNeedEncoding is the existing BitSet field of java.net.URLEncoder.
    public static String encode(String s, String enc)
        throws UnsupportedEncodingException {

        boolean needToChange = false;
        StringBuilder out = new StringBuilder(s.length());
        ByteArrayOutputStream buf = new ByteArrayOutputStream(maxBytesPerChar);
        OutputStreamWriter writer = null; // created lazily, only when a char needs encoding

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                out.append(c);
                // Replaces wroteUnencodedChar: the next encoded char gets a fresh writer.
                writer = null;
            } else {
                // convert to external encoding before hex conversion
                if (writer == null) { // Fix for 4407610
                    // Created outside the try block so that an
                    // UnsupportedEncodingException (a subclass of IOException)
                    // propagates to the caller instead of being swallowed below.
                    writer = new OutputStreamWriter(buf, enc);
                }
                try {
                    writer.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a character in the
                     * surrogate range occurs outside of a legal surrogate
                     * pair. For now, just treat it as if it were any other
                     * character.
                     */
                    if (Character.isHighSurrogate(c) && (i + 1) < s.length()) {
                        char d = s.charAt(i + 1);
                        if (Character.isLowSurrogate(d)) {
                            writer.write(d);
                            i++;
                        }
                    }
                    writer.flush();
                } catch (IOException e) {
                    buf.reset();
                    continue;
                }
                byte[] ba = buf.toByteArray();
                for (int j = 0; j < ba.length; j++) {
                    byte b = ba[j];
                    out.append('%');
                    out.append(HEX_DIGITS.charAt((b >> 4) & 0xF)); // high nibble
                    out.append(HEX_DIGITS.charAt(b & 0xF));        // low nibble
                }
                buf.reset();
                needToChange = true;
            }
        }
        return (needToChange ? out.toString() : s);
    }
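
To try the workaround outside java.net.URLEncoder, the sketch below wraps it in a standalone class. Assumptions: CleanUrlEncoder and DONT_NEED_ENCODING are hypothetical names; the BitSet is filled with the same characters URLEncoder leaves unencoded (letters, digits, space, '-', '_', '.', '*'); and the writer is created outside the try block so an unsupported charset is reported rather than silently skipped. main() compares the result with URLEncoder.encode for a few inputs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.BitSet;

class CleanUrlEncoder {
    // Same set of characters that java.net.URLEncoder leaves unencoded.
    static final BitSet DONT_NEED_ENCODING = new BitSet(256);
    static {
        for (int c = 'a'; c <= 'z'; c++) DONT_NEED_ENCODING.set(c);
        for (int c = 'A'; c <= 'Z'; c++) DONT_NEED_ENCODING.set(c);
        for (int c = '0'; c <= '9'; c++) DONT_NEED_ENCODING.set(c);
        DONT_NEED_ENCODING.set(' '); // becomes '+'
        DONT_NEED_ENCODING.set('-');
        DONT_NEED_ENCODING.set('_');
        DONT_NEED_ENCODING.set('.');
        DONT_NEED_ENCODING.set('*');
    }
    private static final String HEX_DIGITS = "0123456789ABCDEF";

    static String encode(String s, String enc) throws UnsupportedEncodingException {
        boolean needToChange = false;
        StringBuilder out = new StringBuilder(s.length());
        ByteArrayOutputStream buf = new ByteArrayOutputStream(10);
        OutputStreamWriter writer = null; // created lazily
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (DONT_NEED_ENCODING.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                out.append(c);
                writer = null; // next encoded run gets a fresh writer
                continue;
            }
            if (writer == null) {
                writer = new OutputStreamWriter(buf, enc); // may throw UnsupportedEncodingException
            }
            try {
                writer.write(c);
                // Pass both halves of a valid surrogate pair before flushing.
                if (Character.isHighSurrogate(c) && i + 1 < s.length()
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    writer.write(s.charAt(++i));
                }
                writer.flush();
            } catch (IOException e) {
                buf.reset();
                continue;
            }
            for (byte b : buf.toByteArray()) {
                out.append('%')
                   .append(HEX_DIGITS.charAt((b >> 4) & 0xF))
                   .append(HEX_DIGITS.charAt(b & 0xF));
            }
            buf.reset();
            needToChange = true;
        }
        return needToChange ? out.toString() : s;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String[] samples = { "plain", "a b&c=d", "caf\u00e9", "\uD83D\uDE00", "x/y" };
        for (String sample : samples) {
            String mine = encode(sample, "UTF-8");
            String jdk = URLEncoder.encode(sample, "UTF-8");
            System.out.println(mine + (mine.equals(jdk) ? "  (matches URLEncoder)" : "  (differs)"));
        }
    }
}
```

One consequence of lazy creation: because no writer is built until a character needs encoding, an unsupported enc is only reported for inputs that contain such a character. In this sketch encode("plain", "no-such-charset") returns "plain", while encode("a/b", "no-such-charset") throws UnsupportedEncodingException. Creating the writer eagerly, as the request implies the current code does, reports it on every call.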