JDK-4731779 : URLEncoder.encode() still really slow (provided optimization)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.4.0
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86
  • Submitted: 2002-08-14
  • Updated: 2002-08-15
  • Resolved: 2002-08-15
Related Reports
Duplicate :  
Description

Name: gm110360			Date: 08/14/2002


FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Server VM (build 1.4.0-b92, mixed mode)


FULL OPERATING SYSTEM VERSION :
Linux mule 2.4.2-2 #1 Tue Aug 28 13:41:29 MDT 2001 i686
glibc-2.2.4-24


EXTRA RELEVANT SYSTEM CONFIGURATION :
Tests conducted on a 733 MHz P3 system.

A DESCRIPTION OF THE PROBLEM :
URLEncoder wasn't fast in 1.3 and is even slower in 1.4.  I
hereby offer you the provided source code, free of charge,
which speeds this up by a factor of up to 100x or so, while
still being fully compliant. In addition it offers Writer
style implementations which avoid unnecessary String
construction in some cases.




STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. javac NewURLEncoder.java test.java
2. java -server -cp . test < test.in


EXPECTED VERSUS ACTUAL BEHAVIOR :
Note the ~100x performance improvement by the second (optimized) pass.

% java -server -cp . test < test.in
JDK: 'nochange': 192.84µs
NEW: 'nochange': 7.9799999999999995µs
JDK: 'one+space': 141.2µs
NEW: 'one+space': 3.2µs
JDK: 'a+relatively+normal+string+with+spaces': 142.7µs
NEW: 'a+relatively+normal+string+with+spaces': 5.720000000000001µs
JDK: '%C3%A1+%C3%9Ftring+with+%C3%9CTF-8': 492.94µs
NEW: '%C3%A1+%C3%9Ftring+with+%C3%9CTF-8': 17.919999999999998µs
JDK: 'nochange': 139.96µs
NEW: 'nochange': 1.28µs
JDK: 'one+space': 138.7µs
NEW: 'one+space': 2.0µs
JDK: 'a+relatively+normal+string+with+spaces': 145.4µs
NEW: 'a+relatively+normal+string+with+spaces': 5.58µs
JDK: '%C3%A1+%C3%9Ftring+with+%C3%9CTF-8': 466.22µs
NEW: '%C3%A1+%C3%9Ftring+with+%C3%9CTF-8': 6.38µs
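
The per-input speedups behind the "up to 100x" figure can be worked out from the second-pass timings listed above. The following is a minimal sketch only; the class name SpeedupTable and its speedup() helper are hypothetical and not part of the submitted source, and the short labels stand in for the four test strings.

```java
// Hypothetical helper, not part of the submitted source.
// The figures are the second-pass timings (microseconds per encode) from the report.
final class SpeedupTable {
    // Ratio of JDK time to NEW time for one input.
    static double speedup(double jdkMicros, double newMicros) {
        return jdkMicros / newMicros;
    }

    public static void main(String[] args) {
        String[] label = {"nochange", "one space", "normal string", "non-ASCII"};
        double[] jdk = {139.96, 138.7, 145.4, 466.22};
        double[] neu = {1.28, 2.0, 5.58, 6.38};
        for (int i = 0; i < label.length; i++) {
            System.out.printf("%-14s %6.1fx%n", label[i], speedup(jdk[i], neu[i]));
        }
    }
}
```

On these numbers the ratios come out at roughly 109x, 69x, 26x and 73x, so the ~100x figure applies to the 'nochange' case and the other inputs see smaller (still large) gains.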


ERROR MESSAGES/STACK TRACES THAT OCCUR :
none


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
NewURLEncoder.java>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
import java.io.*;

public final class NewURLEncoder
{
    /**
     * URL encode input string 'in' and return encoded results as
     * String, encoding non-ASCII characters as UTF-8.
     */
    public static String encode( String in )
    {
        try {
            return encode( in, "UTF-8" );
        }
        catch( UnsupportedEncodingException x ) {
            // UTF-8 should always be an ok encoding
            throw new RuntimeException( x.toString() );
        }
    }

    /**
     * URL encode input string 'in' and return encoded results as
     * String, encoding non-ASCII characters with specified
     * 'encoding'.
     */
    public static String encode( String in, String encoding )
        throws UnsupportedEncodingException
    {
        try {
            StringWriter out = new StringWriter( in.length() * 2 );
            encode( in, out, encoding );
            return out.getBuffer().toString();
        }
        catch( UnsupportedEncodingException x ) {
            throw x;
        }
        catch( IOException x ) {
            // No other IO exceptions should be possible with
            // StringWriter.
            throw new RuntimeException( x.toString() );
        }
    }
    
    /**
     * URL encode input string 'in' to output writer 'out', encoding
     * non-ASCII characters as UTF-8.
     */
    public static void encode( String in, Writer out )
        throws IOException
    {
        encode( in, out, "UTF-8" );
    }

    /**
     * URL encode input string 'in' to output writer 'out', encoding
     * non-ASCII characters with specified 'encoding'.
     */
    public static void encode( String in, Writer out, String encoding )
        throws IOException, UnsupportedEncodingException
    {
        // Test if there are any characters outside of the ASCII 7-bit
        // range.  If this is the case then we need to convert to
        // encoding byte representation and encode this instead.
        int i = 0;
        int end = in.length();
        while( i < end ) {
            if( in.charAt(i) > 0x7F ) {
                encode( in.getBytes( encoding ), out );
                return;
            }
            ++i;
        }
        
        i = 0;
        int last = 0;
        char c = 0;
        while( i < end ) {

            c = in.charAt(i);

            if( ( ( c >= 'a' ) && ( c <= 'z' ) ) ||
                ( ( c >= 'A' ) && ( c <= 'Z' ) ) ||
                ( ( c >= '0' ) && ( c <= '9' ) ) ||
                ( c == '.' ) ||
                ( c == '-' ) ||
                ( c == '*' ) ||
                ( c == '_' ) ) {
                ++i;
            }
            else if( c == ' ' ) {
                if( last < i ) out.write( in, last, i - last );
                out.write( '+' );
                last = ++i;
            }
            else {
                if( last < i ) out.write( in, last, i - last );
                out.write( '%' );
                out.write( HEX_DIGITS[ c / 16 ] );
                out.write( HEX_DIGITS[ c % 16 ] );
                last = ++i;
            }
        }
        if( last < i ) out.write( in, last, i - last );
    }
    

    /**
     * URL Encode input byte[] 'in' (interpreted as a sequence of
     * potentially multi-byte characters) to output writer 'out'.
     */
    public static void encode( byte[] in, Writer out )
        throws IOException
    {
        int i = 0;
        int end = in.length;

        while( i < end ) {
            if( ( ( in[i] >= 'a' ) && ( in[i] <= 'z' ) ) ||
                ( ( in[i] >= 'A' ) && ( in[i] <= 'Z' ) ) ||
                ( ( in[i] >= '0' ) && ( in[i] <= '9' ) ) ||
                ( in[i] == '.' ) ||
                ( in[i] == '-' ) ||
                ( in[i] == '*' ) ||
                ( in[i] == '_' ) ) {
                out.write( (char) in[i] );
            }
            else if( in[i] == ' ' ) {
                out.write( '+' );
            }
            else {
                out.write( '%' );
                out.write( HEX_DIGITS[ ( in[i] & 0xff ) / 16 ] );
                out.write( HEX_DIGITS[ ( in[i] & 0xff ) % 16 ] );
            }
            ++i;
        }
    }
    
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();
}

test.java>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
import java.io.*;

public class test
{
    public static int reps = 50000;
        
    public static void main( String[] args ) throws Exception
    {
        BufferedReader in
            = new BufferedReader( new InputStreamReader( System.in,
                                                         "ISO-8859-1" ) );
        String test = null;
        while( ( test = in.readLine() ) != null ) {
            testJDK( test );
            testNEW( test );
        }
    }

    public static void testJDK( String in ) throws Exception
    {
        long start = System.currentTimeMillis();
        int r = 0;
        String enc = null;
        while( r++ < reps ) {
            // 1.3: enc = java.net.URLEncoder.encode( in );
            // 1.4:
            enc = java.net.URLEncoder.encode( in, "UTF-8" );
        }
        long end = System.currentTimeMillis();
        double perEncode = ((double)( end - start )) / (double) reps
            * 1000.0d;
        System.out.println( "JDK: '" + enc + "': " + perEncode + "\u00B5s" );
    }

    public static void testNEW( String in ) throws Exception
    {
        long start = System.currentTimeMillis();
        int r = 0;
        String enc = null;
        while( r++ < reps ) {
            enc = NewURLEncoder.encode( in );
        }
        long end = System.currentTimeMillis();
        double perEncode = ((double)( end - start )) / (double) reps
            * 1000.0d;
        System.out.println( "NEW: '" + enc + "': " + perEncode + "\u00B5s" );
    }
}

test.in>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
nochange
one space
a relatively normal string with spaces
á ßtring with ÜTF-8
nochange
one space
a relatively normal string with spaces
á ßtring with ÜTF-8


---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Use provided code instead of java.net.URLEncoder.encode()
(Review ID: 160736) 
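
Before adopting the workaround it may be worth checking the two claims it rests on: output identical to java.net.URLEncoder, and Writer-style use that avoids intermediate Strings. The sketch below is one way to do that under stated assumptions. It is not the submitted NewURLEncoder: WriterEncodeSketch and its methods encodeTo(), query() and matchesJdk() are hypothetical names, it always encodes as UTF-8, and it assumes a Java 8 or later runtime.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; a condensed stand-in for the submitted encoder.
final class WriterEncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Write the URL-encoded form of the UTF-8 bytes of 's' to 'out'.
    static void encodeTo(String s, Writer out) throws IOException {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff; // treat the byte as unsigned
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '*' || c == '_') {
                out.write(c);          // safe character, copied unchanged
            } else if (c == ' ') {
                out.write('+');        // space becomes '+'
            } else {
                out.write('%');        // everything else becomes %XX
                out.write(HEX[c >> 4]);
                out.write(HEX[c & 0x0f]);
            }
        }
    }

    // Build "k1=v1&k2=v2" into a single Writer, with no per-value Strings.
    static String query(String[][] pairs) {
        try {
            StringWriter out = new StringWriter();
            for (int i = 0; i < pairs.length; i++) {
                if (i > 0) out.write('&');
                encodeTo(pairs[i][0], out);
                out.write('=');
                encodeTo(pairs[i][1], out);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with StringWriter
        }
    }

    // True when the sketch and java.net.URLEncoder agree for 's' in UTF-8.
    static boolean matchesJdk(String s) {
        try {
            StringWriter out = new StringWriter();
            encodeTo(s, out);
            return out.toString().equals(URLEncoder.encode(s, "UTF-8"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "nochange", "one space",
            "a relatively normal string with spaces",
            "\u00e1 \u00dftring with \u00dcTF-8"
        };
        for (String s : inputs) {
            System.out.println(matchesJdk(s) + " " + s);
        }
        System.out.println(query(new String[][] {{"q", "one space"}, {"lang", "en"}}));
    }
}
```

The same comparison loop could be pointed at NewURLEncoder.encode() to confirm that the submitted code agrees with the JDK on the inputs from test.in before swapping it in.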
======================================================================

Comments
EVALUATION Yes, encoder performance has degraded. Issue already on list for 1.4.2 as 4725737. ###@###.### 2002-08-15
15-08-2002