Bug ID: JDK-4257115 URLEncoder and URLDecoder should support target character sets

Type: Bug
Component: core-libs
Sub-Component: java.net
Affected Version: 1.0,1.2.0,1.2.1,1.2.2,1.2.2_08,1.4.0

Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic,solaris_2.5,solaris_2.6
CPU: generic,sparc

Submitted: 1999-07-26
Updated: 2000-06-12
Resolved: 2000-04-28

Other: 1.4.0 beta, Fixed

The java.net.URLEncoder and java.net.URLDecoder classes currently
always encode and decode non-ASCII characters using the platform's
default character set encoding.

These two classes should be extended to allow encoding and decoding to
and from specific character set encoding formats.  The default character
set should be UTF-8.
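
The requested behaviour might look roughly like the sketch below. It assumes the two-argument overloads URLEncoder.encode(String, String) and URLDecoder.decode(String, String) that later JDK releases provide. The class name CharsetAwareEncoding and its helper methods are illustrative only and are not part of the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class CharsetAwareEncoding {
    // Encode using an explicitly named character set rather than the platform default.
    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode using the same explicitly named character set.
    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 au lait"; // contains one non-ASCII character
        String encoded = encode(original, "UTF-8");
        System.out.println(encoded);                                   // caf%C3%A9+au+lait
        System.out.println(decode(encoded, "UTF-8").equals(original)); // true
    }
}
```

Naming the character set on both sides makes the result independent of the default encoding of whichever machine runs the code.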

CONVERTED DATA

BugTraq+ Release Management Values
COMMIT TO FIX: merlin
FIXED IN: merlin-beta
INTEGRATED IN: merlin-beta

14-06-2004

EVALUATION

The customer's suggestion sounds reasonable, especially plan 2. But I don't know whether it can make it into Kestrel, because it requires some new APIs.

yingxian.wang@eng 1999-07-28

mayank.upadhyay@eng 2000-04-12

Plan (1) should be implemented. The only case that has a potential backwards compatibility issue is the use of non-ASCII characters (> 1 byte) in the form data on a platform with a non-ASCII default encoding. That case presumably does not work well under the current implementation and, as this particular customer indicated, these classes are useless to them. Moreover, the current API specification clearly (albeit incorrectly) says that only 8 bits are used in the encoding.

mayank.upadhyay@eng 2000-04-24

An analysis of backwards compatibility with regard to the behaviour of the current URLEncoder/URLDecoder API is as follows:

(i) On platforms that use ASCII as the default encoding:
  - URIs with ASCII characters will work just fine, since the behaviour for those characters is unchanged under UTF-8.
  - URIs with non-ASCII characters (> 1 byte) do not work at present, so there is no backwards compatibility issue.

(ii) On platforms that use a non-ASCII encoding as the default, e.g. Unicode or some two-byte Japanese encoding:
  - URIs with ASCII characters in them are correctly encoded by URLEncoder but *incorrectly* decoded by URLDecoder at present. The correct encoding of ASCII characters will remain unchanged, while the decoding will start working.
  - URIs with non-ASCII characters are encoded by URLEncoder with a non-standard method in which the default platform encoding is used to obtain the bytes for the %xy escaping. Currently, the URLDecoder class decodes an encoded string into the original string if and only if the default encoding on the peer side is the same as that used by URLEncoder.

    The default behaviour for this case will change such that the W3C-recommended UTF-8 based encoding is used at both ends instead of the respective default platform encodings. We don't expect a significant number of applications, if any at all, to rely on this behaviour since:
    (a) It requires that both network peers be configured to use the same default encoding, which is a questionable assumption to make on the web.
    (b) Most older user agents use the document encoding scheme, not the default platform encoding scheme; hence they have been unable to use the URLEncoder/URLDecoder classes until now for non-ASCII characters.
    (c) Both the URLEncoder and URLDecoder API specs incorrectly state that only the least significant byte will be considered for the %xy escaping; hence non-ASCII users might have been discouraged from using these APIs in the past.
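
The compatibility cases in the evaluation can be illustrated with a small sketch. It is a demonstration under stated assumptions, not the fix itself: ISO-8859-1 stands in for "a non-UTF-8 platform default", the charset-taking overloads of later JDK releases are used to simulate each peer, and the class name EncodingMismatchDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class EncodingMismatchDemo {
    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 plays the role of one peer's platform default.
        String latin1 = "ISO-8859-1";
        String utf8 = "UTF-8";

        // ASCII-only input yields the same escapes under both encodings.
        System.out.println(encode("a b&c", latin1).equals(encode("a b&c", utf8))); // true

        // A non-ASCII character yields different %xy escapes per encoding.
        String text = "\u00e9"; // e with acute accent
        System.out.println(encode(text, latin1)); // %E9
        System.out.println(encode(text, utf8));   // %C3%A9

        // The round trip succeeds only when both peers use the same encoding.
        System.out.println(decode(encode(text, utf8), utf8).equals(text));   // true
        System.out.println(decode(encode(text, utf8), latin1).equals(text)); // false
    }
}
```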

11-06-2004

WORK AROUND

Name: clC74495 Date: 07/26/99

Use your own implementation.
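
The report does not show the submitter's implementation. The following is only a hypothetical sketch of what such a workaround could look like in present-day Java. The class SimpleFormEncoder, its SAFE character set (letters, digits, and ". - * _"), and the space-to-"+" rule are assumptions modelled on form encoding, not code taken from the report.

```java
import java.nio.charset.Charset;

class SimpleFormEncoder {
    // Characters copied through unchanged (an assumption modelled on form encoding).
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-*_";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String s, Charset charset) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (SAFE.indexOf(c) >= 0) {
                out.append(c);
                i++;
            } else if (c == ' ') {
                out.append('+');
                i++;
            } else {
                // Convert one whole code point so surrogate pairs stay together.
                // Note: converting per code point suits stateless encodings such
                // as UTF-8; stateful encodings would need whole runs converted.
                int len = Character.charCount(s.codePointAt(i));
                byte[] bytes = s.substring(i, i + len).getBytes(charset);
                for (byte b : bytes) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
                i += len;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println(encode("caf\u00e9 au lait", utf8)); // caf%C3%A9+au+lait
    }
}
```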

11-06-2004

Duplicate :	JDK-4239597 - java.net.URLDecode does not handle double byte characters
Duplicate :	JDK-4238263 - URLEncoder specification incorrect about encoding algorithm
Duplicate :	JDK-4488606 - URLDecoder.decode() should not do charset conversion
Duplicate :	JDK-4146652 - java.net.URLEncoder.java needs to support more forms of encode method
Duplicate :	JDK-4304014 - Extend URLEncoder
Duplicate :	JDK-4449371 - URLEncoder and URLDecoder requires encoding support other than the default one
Duplicate :	JDK-4488596 - URLEncoder.encode() does not always take the low 8 bits of character
Relates :	JDK-4206144 - Request more precise specification of java.net.URLEncoder.encode
Relates :	JDK-4316925 - URLEncoder.encode(String) works wrong when external encoding is set