JDK-4616184 : java.net.URLEncoder.encode(String) doesn't encode to RFC2396 standard
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: generic
  • CPU: generic
  • Submitted: 2001-12-19
  • Updated: 2002-04-17
  • Resolved: 2002-04-17
Description

Name: nt126004			Date: 12/19/2001


java version "1.4.0-beta3"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta3-b84)
Java HotSpot(TM) Client VM (build 1.4.0-beta3-b84, mixed mode)

Using the URLEncoder class the other day, I noticed that 'space' is encoded
to '+' instead of '%20'.  Unless I'm reading the RFC wrong, shouldn't it be
encoded to '%20'?

The information is in sections 2.4.1 and 2.4.3.
http://www.ietf.org/rfc/rfc2396.txt

Here's a comment from the java.net.URLEncoder class, lines 45 and 46, that
cites the source of the special characters as an O'Reilly book:


    /* The list of characters that are not encoded have been determined by
       referencing O'Reilly's "HTML: The Definitive Guide" (page 164). */


Interestingly enough, the 3rd edition of this text indicates that a space should be
encoded to '%20' (p. 195).

Although you would probably want to remove other code in the class that
performs the 'space' to '+' substitution, the fix is as simple as commenting
out line 60 of the class:

	//dontNeedEncoding.set(' '); /* encoding a space to a + is done in the encode() method */


Here is a simple test program that shows this issue.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
import java.net.URLEncoder;

public class Test
{
   public static void main(String[] args)
   {
      System.out.println("Should see %20 here: " + URLEncoder.encode("space here"));

   }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
(Review ID: 137273) 
======================================================================

Comments
EVALUATION Actually, this is not a bug. This class implements the recommendations in the HTML specifications for how to encode URLs in HTML forms. It is not intended for other uses. This is specified in various places, including HTML 4.01 section 17.13.4, and also RFC 1866 (which is superseded by the W3C HTML recommendations). ###@###.### 2002-04-17
17-04-2002
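
To illustrate the distinction drawn in the evaluation, here is a minimal sketch contrasting HTML form encoding with RFC 2396 style path escaping. The class name SpaceEncodingDemo and the helpers formEncode and pathEncode are hypothetical names made up for this example; the use of java.net.URI as the alternative is an assumption about what a caller who wants '%20' might reach for, not something stated in the report or the evaluation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class SpaceEncodingDemo {

    // HTML form encoding (application/x-www-form-urlencoded):
    // URLEncoder turns a space into '+'.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // URI path escaping: the multi-argument java.net.URI constructors
    // quote illegal characters, so a space in the path becomes "%20".
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Form encoding: " + formEncode("space here"));
        System.out.println("Path escaping: " + pathEncode("/space here"));
    }
}
```

The two helpers are not interchangeable: formEncode is meant for form field names and values, while pathEncode only escapes what is illegal in a URI path and leaves characters such as '/' alone.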