The java.net.URLEncoder and java.net.URLDecoder classes currently
always encode and decode non-ASCII characters using the platform's
default character set encoding.
These two classes should be extended to allow encoding and decoding to
and from specific character set encodings. The default character
set should be UTF-8.
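As a rough illustration of what charset-specific encoding and decoding looks like, the sketch below uses the two-argument overloads URLEncoder.encode(String, String) and URLDecoder.decode(String, String) found in later JDK releases. Whether those overloads correspond to either plan discussed in this report is an assumption; the class name ExplicitCharsetCodec and its helper methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class; its name and methods are illustrative only.
final class ExplicitCharsetCodec {

    // Percent-encode text using the bytes of the named charset
    // rather than the platform default.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Reverse of encode(): interpret the %xy bytes in the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e (U+00E9), a non-ASCII character.
        String original = "caf\u00e9";
        System.out.println(encode(original, "UTF-8"));      // caf%C3%A9
        System.out.println(encode(original, "ISO-8859-1")); // caf%E9
        System.out.println(decode("caf%C3%A9", "UTF-8").equals(original)); // true
    }
}
```

The same input string yields different escapes depending on the charset, which is why both ends need to agree on one.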
The customer's suggestion sounds reasonable, especially plan 2. But I don't know whether it can make it into Kestrel, because it requires some new APIs.
Plan (1) should be implemented. The only case with potential for a backwards compatibility issue is the use of non-ASCII characters (> 1 byte) in the form data on a platform with a non-ASCII default encoding. That case is unlikely to be working well under the current implementation; as this particular customer indicated, these classes are useless to them. Moreover, the current API specification clearly (albeit incorrectly) says that only 8 bits are used in the encoding.
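To see why an escaping rule limited to the least significant byte of each character would be lossy, consider the following minimal sketch. It models only the rule as the specification text is described in this evaluation, not the actual behaviour of the classes; the class LowByteEscapeDemo and the method escapeLowByte are hypothetical names introduced for illustration.

```java
// Hypothetical illustration of the escaping rule as the old spec text
// described it; per the evaluation, the classes did not actually work this way.
final class LowByteEscapeDemo {

    // Build a %xy escape from only the low-order 8 bits of the character.
    static String escapeLowByte(char c) {
        return String.format("%%%02X", c & 0xFF);
    }

    public static void main(String[] args) {
        // U+3042 (HIRAGANA LETTER A) and 'B' (U+0042) share the low byte 0x42,
        // so both collapse to the same escape and cannot be told apart.
        System.out.println(escapeLowByte('\u3042')); // %42
        System.out.println(escapeLowByte('B'));      // %42
    }
}
```

Under such a rule, distinct characters map to the same escape, so the original text cannot be recovered by any decoder.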
An analysis of the backwards compatibility with regard to the
behaviour of the current URLEncoder/URLDecoder API is as follows:
(i) On platforms that use ASCII as the default encoding:
- URIs with ASCII characters will work just fine since
the behaviour for those characters will be unchanged.
- URIs with non-ASCII characters (> 1 byte) do not work at
present so there is no backwards compatibility issue.
(ii) On platforms that use a non-ASCII encoding as the default, e.g.
Unicode or some two-byte Japanese encoding:
- URIs with ASCII characters in them are correctly encoded by
URLEncoder but *incorrectly* decoded by URLDecoder at
present. The correct encoding of ASCII characters will remain
unchanged while the decoding will start working.
- URIs with non-ASCII characters are encoded by
URLEncoder with a non-standard method where the default
platform encoding is used to obtain the bytes for the
%xy escaping. Currently, the URLDecoder class decodes
an encoded string into the original string if and only if the default
encoding on the peer side is the same as that used by
URLEncoder. The default behaviour for this case will change
such that the W3C recommended UTF-8 based encoding will be
used at both ends instead of the respective default platform
encodings.
We don't expect there to be a significant number of
applications, if any at all, that rely on this behaviour
because:
(a) It requires that both network peers be configured to use
the same default encoding, which is a questionable assumption
to make on the web.
(b) Most older user agents use the document encoding
scheme, not the default platform encoding scheme; hence they
have been unable to use the URLEncoder/URLDecoder classes
till now for non-ASCII characters.
(c) Both URLEncoder and URLDecoder API specs incorrectly
state that only the least significant byte will be considered
for the %xy escaping; hence non-ASCII users might have been
discouraged from using these APIs in the past.
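The peer-mismatch case in (ii) can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the behaviour at the time of this report: the explicit charset arguments stand in for each peer's platform default, the two-argument encode/decode overloads come from later JDK releases, and the class PeerCharsetDemo and method roundTrips are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo; the charset arguments simulate each peer's default encoding.
final class PeerCharsetDemo {

    // Encode with the sender's charset, decode with the receiver's charset,
    // and report whether the original text survives the trip.
    static boolean roundTrips(String text, String senderCharset, String receiverCharset) {
        try {
            String wire = URLEncoder.encode(text, senderCharset);
            return URLDecoder.decode(wire, receiverCharset).equals(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e (U+00E9).
        String text = "caf\u00e9";
        // Peers with different encodings: the non-ASCII character is lost.
        System.out.println(roundTrips(text, "ISO-8859-1", "UTF-8"));      // false
        // Peers that happen to share an encoding: works, but only by coincidence.
        System.out.println(roundTrips(text, "ISO-8859-1", "ISO-8859-1")); // true
        // UTF-8 at both ends, as proposed: works regardless of platform defaults.
        System.out.println(roundTrips(text, "UTF-8", "UTF-8"));           // true
        // Pure ASCII text is unaffected by the choice between these charsets.
        System.out.println(roundTrips("abc", "ISO-8859-1", "UTF-8"));     // true
    }
}
```

Fixing the encoding to UTF-8 at both ends removes the dependence on how each peer happens to be configured.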
Name: clC74495 Date: 07/26/99
to use own implementation.