Bug ID: JDK-4257115 URLEncoder and URLDecoder should support target character sets

Details
Type:
Bug
Submit Date:
1999-07-26
Status:
Resolved
Updated Date:
2000-06-12
Project Name:
JDK
Resolved Date:
2000-04-28
Component:
core-libs
OS:
solaris_2.5,solaris_2.6,generic
Sub-Component:
java.net
CPU:
sparc,generic
Priority:
P4
Resolution:
Fixed
Affected Versions:
1.0,1.2.0,1.2.1,1.2.2,1.2.2_08,1.4.0
Fixed Versions:
1.4.0 (beta)


Description
The java.net.URLEncoder and java.net.URLDecoder classes currently
always encode and decode non-ASCII characters using the platform's
default character encoding.

These two classes should be extended to allow encoding to and decoding
from a specified character encoding. The default character encoding
should be UTF-8.
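To illustrate the request, here is a minimal sketch of encoding and decoding with an explicitly chosen character encoding. It assumes Java 10 or later, where `URLEncoder.encode(String, Charset)` and `URLDecoder.decode(String, Charset)` are available. The class name `ExplicitCharsetDemo` and its helper methods are hypothetical and exist only for this example.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
class ExplicitCharsetDemo {

    // Percent-encode text using the bytes the given charset produces.
    static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    // Interpret the %xy escapes as bytes of the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute; the escape keeps the source file ASCII-only.
        String text = "caf\u00e9 au lait";

        // UTF-8 represents U+00E9 as two bytes, C3 A9.
        System.out.println(encode(text, StandardCharsets.UTF_8));      // caf%C3%A9+au+lait

        // ISO-8859-1 represents the same character as one byte, E9.
        System.out.println(encode(text, StandardCharsets.ISO_8859_1)); // caf%E9+au+lait

        // Decoding with the charset used for encoding restores the text.
        String restored = decode("caf%C3%A9+au+lait", StandardCharsets.UTF_8);
        System.out.println(restored.equals(text));                     // true
    }
}
```

The same input yields different escape sequences depending on the charset, which is why output tied to the platform default is not portable between machines.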



                                    

Comments
WORK AROUND



Name: clC74495			Date: 07/26/99


Use your own implementation.
======================================================================
                                     
2004-06-11
EVALUATION

The customer's suggestion sounds reasonable, especially plan 2. But I don't know whether it can make it into Kestrel, because it requires some new APIs.

yingxian.wang@eng 1999-07-28


mayank.upadhyay@eng 2000-04-12

Plan (1) should be implemented. The only case with potential for a backwards compatibility issue is the use of non-ASCII characters (> 1 byte) in form data on a platform with a non-ASCII default encoding. That case is unlikely to be working well under the current implementation and, as this particular customer indicated, these classes are useless to them. Moreover, the current API specification clearly (albeit incorrectly) says that only 8 bits of each character are used in the encoding.

mayank.upadhyay@eng 2000-04-24

An analysis of backwards compatibility with regard to the
behaviour of the current URLEncoder/URLDecoder API is as follows:

   (i) On platforms that use ASCII as the default encoding:

       - URIs with ASCII characters will work just fine since
       the behaviour for those characters will be unchanged under
       UTF-8.

       - URIs with non-ASCII characters (> 1 byte) do not work at
       present so there is no backwards compatibility issue.

   (ii) On platforms that use a non-ASCII encoding as the default, e.g.
   Unicode or some two-byte Japanese encoding:

       - URIs with ASCII characters in them are correctly encoded by
       URLEncoder but *incorrectly* decoded by URLDecoder at
       present. The correct encoding of ASCII characters will remain
       unchanged while the decoding will start working.

       - URIs with non-ASCII characters are encoded by
       URLEncoder with a non-standard method where the default
       platform encoding is used to obtain the bytes for the
       %xy escaping. Currently, the URLDecoder class decodes
       an encoded string into the original string if and only if the
       default encoding on the peer side is the same as that used by
       URLEncoder. The default behaviour for this case will change
       such that the W3C-recommended UTF-8 based encoding will be
       used at both ends instead of the respective default platform
       encodings.

       We don't expect there to be a significant number of
       applications, if any at all, that rely on this behaviour
       since:

         (a) It requires that both network peers be configured to use
         the same default encoding, which is a questionable assumption
         to make on the web.

         (b) Most older user agents use the document encoding
         scheme, not the default platform encoding scheme; hence they
         have been unable to use the URLEncoder/URLDecoder classes
         till now for non-ASCII characters.

         (c) Both URLEncoder and URLDecoder API specs incorrectly
         state that only the least significant byte will be considered
         for the %xy escaping; hence non-ASCII users might have been
         discouraged from using these APIs in the past.
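The peer-mismatch problem described in case (ii) and point (a) can be sketched as follows. This is only an illustration, not a reproduction of the legacy implementation: it assumes Java 10 or later and uses the `Charset` overloads to stand in for each peer's default encoding. The class `PeerCharsetMismatchDemo` and the method `roundTrip` are hypothetical names.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
class PeerCharsetMismatchDemo {

    // Encode on the "sender" with one charset and decode on the
    // "receiver" with another, as two peers with different defaults would.
    static String roundTrip(String text, Charset sender, Charset receiver) {
        String wire = URLEncoder.encode(text, sender);
        return URLDecoder.decode(wire, receiver);
    }

    public static void main(String[] args) {
        String ascii = "hello world";
        String nonAscii = "\u00e9"; // e-acute, written as an escape

        // ASCII text survives even when the peers disagree, because
        // ISO-8859-1 and UTF-8 encode ASCII characters identically.
        System.out.println(roundTrip(ascii,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(ascii));       // true

        // Non-ASCII text survives when both ends use UTF-8.
        System.out.println(roundTrip(nonAscii,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(nonAscii));         // true

        // With mismatched charsets the receiver reads the two UTF-8 bytes
        // C3 A9 as two separate ISO-8859-1 characters.
        System.out.println(roundTrip(nonAscii,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(nonAscii));    // false
    }
}
```

Fixing the encoding at UTF-8 on both ends removes the dependence on how each peer happens to be configured.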

                                     
2004-06-11
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
merlin

FIXED IN:
merlin-beta

INTEGRATED IN:
merlin-beta


                                     
2004-06-14


