Bug ID: JDK-4239597 java.net.URLDecode does not handle double byte characters

Type: Bug
Component: core-libs
Sub-Component: java.net
Affected Version: 1.2.1

Priority: P4
Status: Closed
Resolution: Duplicate
OS: generic
CPU: generic

Submitted: 1999-05-19
Updated: 2000-04-24
Resolved: 2000-04-24


Name: skT88420			Date: 05/19/99


A URL is encoded using java.net.URLEncoder, which encodes the URL
using the client OS's default character encoding. On the server side,
java.net.URLDecoder parses the URL and makes a Unicode
character out of each byte. If the client is using a Japanese OS
with the SJIS encoding and types in the Japanese word setsume,
java.net.URLEncoder converts the word setsume to %90%D8%96%BE.
On the server, java.net.URLDecoder parses the URL and makes a
four-character Unicode string out of the four bytes (not a
two-character string).
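
The behavior described above can be reproduced with a small sketch. It does
not call the JDK decoder itself; the class SjisDecodeDemo and its helper
methods are hypothetical names written for illustration, and the sketch
assumes the runtime ships the Shift_JIS charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

public class SjisDecodeDemo {
    // Turn a percent-encoded string such as "%90%D8%96%BE" into raw bytes.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                // Two hex digits follow the percent sign.
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c == '+' ? ' ' : c);
                i++;
            }
        }
        return out.toByteArray();
    }

    // The behavior reported here: every byte becomes one Unicode character.
    static String decodeBytePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : percentDecodeToBytes(s)) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // What the server would need: interpret the bytes in the client's charset.
    static String decodeWithCharset(String s, String charsetName) {
        return new String(percentDecodeToBytes(s), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String encoded = "%90%D8%96%BE";
        // Four characters: one per byte.
        System.out.println(decodeBytePerChar(encoded).length());
        // Two characters: each byte pair is one double-byte character.
        System.out.println(decodeWithCharset(encoded, "Shift_JIS").length());
    }
}
```

The second call only works because the charset name is supplied by hand,
which is exactly the information the server does not receive.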

There are two problems.
1) The URL is encoded using the OS's default encoding, but
that encoding is not sent to the server, so the server side
does not know how to decode the string properly.

2) The java.net.URLDecoder class decodes the string
as if every byte were a character. It does not handle double- or
triple-byte encodings (it must take the encoding of the URL
into consideration).
(Review ID: 83263) 
======================================================================

WORK AROUND

Name: skT88420			Date: 05/19/99

I always encode the URLs into UTF8 and decode the URLs from UTF8.
See java.io.DataInputStream.readUTF and java.io.DataOutputStream.writeUTF.
======================================================================
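
A minimal sketch of the UTF-8 round trip follows. It does not use the
readUTF/writeUTF route mentioned in the workaround; instead it assumes a
later JDK (1.4 or newer) that provides the two-argument
URLEncoder.encode(String, String) and URLDecoder.decode(String, String)
overloads. The class name Utf8UrlRoundTrip and the two-character sample
word are illustrative choices, not taken from the report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class Utf8UrlRoundTrip {
    // Encode with an explicit charset instead of the platform default.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Decode with the same explicit charset on the server side.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String word = "\u8AAC\u660E"; // sample two-character Japanese string
        String encoded = encode(word);
        System.out.println(encoded);                      // %E8%AA%AC%E6%98%8E
        System.out.println(decode(encoded).equals(word)); // true
    }
}
```

Because both sides agree on UTF-8 in advance, the result no longer depends
on the default encoding of the client or server OS.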

11-06-2004

EVALUATION

mayank.upadhyay@eng 2000-02-23

Will be fixed to do UTF-8 encoding.

23-02-2000