JDK-4257115 : URLEncoder and URLDecoder should support target character sets
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version:
    1.0,1.2.0,1.2.1,1.2.2,1.2.2_08,1.4.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,solaris_2.5,solaris_2.6
  • CPU: generic,sparc
  • Submitted: 1999-07-26
  • Updated: 2000-06-12
  • Resolved: 2000-04-28
The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other
1.4.0 beta: Fixed
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Description
The java.net.URLEncoder and java.net.URLDecoder classes currently
always encode and decode non-ASCII characters using the platform's
default character set encoding.

These two classes should be extended to allow encoding to, and decoding
from, specific character set encodings. The default character
set should be UTF-8.
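The fix that shipped with 1.4 (per the version table above) added URLEncoder.encode(String, String) and URLDecoder.decode(String, String), which take a character set name; Java 10 later added overloads that take a java.nio.charset.Charset. The sketch below uses the Java 10 Charset overloads to show why the target character set matters. The class name and the two wrapper methods are hypothetical, chosen only for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TargetCharsetDemo {
    // Hypothetical wrapper: percent-encodes form data using an explicit
    // target character set instead of the platform default.
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Hypothetical wrapper: decodes form data using an explicit character set.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9 au lait"; // "cafe au lait" with an e-acute

        // UTF-8 encodes U+00E9 as the two bytes C3 A9.
        System.out.println(encode(value, StandardCharsets.UTF_8));      // caf%C3%A9+au+lait
        // ISO-8859-1 encodes U+00E9 as the single byte E9.
        System.out.println(encode(value, StandardCharsets.ISO_8859_1)); // caf%E9+au+lait

        // Decoding with the charset used for encoding recovers the original text.
        String roundTrip = decode(encode(value, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(value));                    // true
    }
}
```

The same input produces different escape sequences depending on the charset, which is why both ends must agree on one; UTF-8 is the choice the description asks for as the default.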



Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: merlin
FIXED IN: merlin-beta
INTEGRATED IN: merlin-beta
14-06-2004

EVALUATION
The customer's suggestion sounds reasonable, especially plan 2. However, I don't know whether it can make it into Kestrel, because it requires some new APIs.
yingxian.wang@eng 1999-07-28
mayank.upadhyay@eng 2000-04-12

Plan (1) should be implemented. The only case with a potential backwards-compatibility issue is the use of non-ASCII characters (> 1 byte) in form data on a platform whose default encoding is not ASCII. That case is unlikely to work well under the current implementation, and, as this particular customer indicated, the classes are useless to them. Moreover, the current API specification clearly (albeit incorrectly) says that only the low-order 8 bits of each character are used in the encoding.
mayank.upadhyay@eng 2000-04-24

An analysis of backwards compatibility with respect to the behaviour of the current URLEncoder/URLDecoder API follows:
(i) On platforms that use ASCII as the default encoding:
  - URIs with ASCII characters will work just fine, since the behaviour for those characters is unchanged under UTF-8.
  - URIs with non-ASCII characters (> 1 byte) do not work at present, so there is no backwards-compatibility issue.
(ii) On platforms that use a non-ASCII encoding as the default, e.g. Unicode or a two-byte Japanese encoding:
  - URIs with ASCII characters in them are correctly encoded by URLEncoder but *incorrectly* decoded by URLDecoder at present. The correct encoding of ASCII characters will remain unchanged, while the decoding will start working.
  - URIs with non-ASCII characters are encoded by URLEncoder with a non-standard method, in which the default platform encoding is used to obtain the bytes for the %xy escaping. Currently, URLDecoder decodes an encoded string back into the original string if and only if the default encoding on the peer side is the same as that used by URLEncoder.

The default behaviour for this case will change so that the W3C-recommended UTF-8-based encoding is used at both ends instead of the respective default platform encodings. We don't expect a significant number of applications, if any, to rely on the current behaviour, since:
  (a) it requires both network peers to be configured with the same default encoding, which is a questionable assumption on the web;
  (b) most older user agents use the document's encoding, not the default platform encoding, so they have been unable to use the URLEncoder/URLDecoder classes for non-ASCII characters until now;
  (c) both the URLEncoder and URLDecoder API specs incorrectly state that only the least significant byte is considered for the %xy escaping, so non-ASCII users may have been discouraged from using these APIs in the past.
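Case (ii) and point (a) can be illustrated by simulating two peers that each encode or decode with their own character set. In the sketch below, ISO-8859-1 stands in for a non-UTF-8 platform default; that choice is an assumption made only because ISO-8859-1 is a charset every JDK is required to support (the evaluation itself mentions Japanese encodings). The class and method names are hypothetical, and the code uses the Java 10 Charset overloads of URLEncoder/URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PeerCharsetMismatch {
    // Hypothetical helper: the sender encodes with its charset, the receiver
    // decodes with its own. Returns true if the text survives unchanged.
    static boolean roundTrips(String value, Charset sender, Charset receiver) {
        String wire = URLEncoder.encode(value, sender);
        return URLDecoder.decode(wire, receiver).equals(value);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1; // assumed stand-in for a platform default
        Charset utf8 = StandardCharsets.UTF_8;
        String ascii = "hello world";
        String nonAscii = "caf\u00e9"; // contains an e-acute

        // ASCII text survives whichever charsets the peers use.
        System.out.println(roundTrips(ascii, latin1, utf8));      // true
        // Non-ASCII text survives only when both peers agree on the charset.
        System.out.println(roundTrips(nonAscii, latin1, latin1)); // true
        System.out.println(roundTrips(nonAscii, utf8, utf8));     // true
        System.out.println(roundTrips(nonAscii, latin1, utf8));   // false
        System.out.println(roundTrips(nonAscii, utf8, latin1));   // false
    }
}
```

In the mismatched cases the byte E9 is not valid UTF-8 on its own, and the UTF-8 bytes C3 A9 read as two separate ISO-8859-1 characters, so the receiver never reconstructs the original string.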
11-06-2004

WORK AROUND
Name: clC74495 Date: 07/26/99
Use your own implementation.
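The workaround does not say what such an implementation looked like. The following is one possible sketch, assuming the caller wants the application/x-www-form-urlencoded rules that the URLEncoder documentation describes (alphanumerics and ".", "-", "*", "_" unchanged, space as "+", everything else as %XY of the charset's bytes). The class name FormEncoder and its encode method are hypothetical, the code uses Java 7 APIs rather than anything available in 1999, and it assumes an ASCII-compatible charset such as UTF-8 or ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper: percent-encodes `value` using the bytes produced by
    // `charset`. Assumes an ASCII-compatible charset (e.g. UTF-8, ISO-8859-1).
    static String encode(String value, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(charset)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '*' || c == '_';
            if (safe) {
                out.append((char) c);
            } else if (c == ' ') {
                out.append('+');
            } else {
                // Emit %XY with uppercase hex digits, as URLEncoder does.
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("caf\u00e9 au lait", StandardCharsets.UTF_8)); // caf%C3%A9+au+lait
        System.out.println(encode("a&b=c", StandardCharsets.UTF_8));            // a%26b%3Dc
    }
}
```

Working byte by byte keeps the sketch short; it is correct for ASCII-compatible charsets because every byte below 0x80 then represents the ASCII character of the same value.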
11-06-2004