Bug ID: JDK-4488596 URLEncoder.encode() does not always take the low 8 bits of character

Type: Bug
Component: core-libs
Sub-Component: java.net
Affected Version: 1.2.2_08

Priority: P3
Status: Closed
Resolution: Duplicate
OS: solaris_2.6
CPU: sparc

Submitted: 2001-08-06
Updated: 2001-08-07
Resolved: 2001-08-07

The javadoc says 'All other characters are converted into the 3-character string
%xy, where xy is the two-digit hexadecimal representation of the lower
8-bits of the character.'

The problem was seen when trying to round-trip Arabic text on a Solaris 8
system with a default LANG of 'ar' (ISO 8859-6): from UTF-8 to URL-encoded
form and then all the way back. The URL encode/decode step mangled the text
because of its interaction with the non-English default charset.

But the code is:

	OutputStreamWriter writer = new OutputStreamWriter(buf);

which uses the platform default encoding (the file.encoding system property)
to convert characters to bytes. If a char needs encoding, the writer emits
whatever byte sequence that charset produces for it, rather than just the
low 8 bits. The writer is also unnecessary; see the suggested fix for
alternate code without it.
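
To make the mismatch concrete, here is a minimal sketch (not the JDK source) that escapes one character both ways. The class and method names are hypothetical, and the charset is passed explicitly, where the original code relied on the platform default, so that the result does not depend on the machine it runs on.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

// Hypothetical demo class; not part of java.net.
class WriterVsLowByte {
    // Escapes every byte that an OutputStreamWriter produces for c in the
    // named charset, which is what happens when the writer does the encoding.
    static String escapeViaWriter(char c, String charsetName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            OutputStreamWriter writer = new OutputStreamWriter(buf, charsetName);
            writer.write(c);
            writer.flush();
            StringBuilder out = new StringBuilder();
            for (byte b : buf.toByteArray()) {
                out.append(String.format("%%%02X", b & 0xFF));
            }
            return out.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Escapes only the low 8 bits of c, as the javadoc describes.
    static String escapeLowByte(char c) {
        return String.format("%%%02X", c & 0xFF);
    }

    public static void main(String[] args) {
        char alef = '\u0627'; // ARABIC LETTER ALEF
        System.out.println(escapeLowByte(alef));            // prints %27
        System.out.println(escapeViaWriter(alef, "UTF-8")); // prints %D8%A7
    }
}
```

With a different charset name the second line changes, which is the dependence on file.encoding described above.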

A related bug will be filed against URLDecoder.decode() for the corresponding problem.

Seen in 1.2.2 and in 1.3.

SUGGESTED FIX

encode(), no OutputStreamWriter needed. From Tom Mueller.

	public static String encode(String s) {
		int maxBytesPerChar = 10;
		StringBuffer out = new StringBuffer(s.length());
		ByteArrayOutputStream buf = new ByteArrayOutputStream(maxBytesPerChar);

		for (int i = 0; i < s.length(); i++) {
			int c = (int)s.charAt(i);
			if (dontNeedEncoding.get(c)) {
				if (c == ' ') {
					c = '+';
				}
				out.append((char)c);
			} else {
				int lowbyte = (c & 0xff);
				out.append('%');
				char ch = Character.forDigit((lowbyte >> 4) & 0xF, 16);
				// converting to use uppercase letter as part of
				// the hex value if ch is a letter.
				if (Character.isLetter(ch)) {
					ch -= caseDiff;
				}
				out.append(ch);
				ch = Character.forDigit(lowbyte & 0xF, 16);
				if (Character.isLetter(ch)) {
					ch -= caseDiff;
				}
				out.append(ch);
			}
		}

		return out.toString();
	}

11-06-2004

EVALUATION

In JDK1.4 (J2SE 1.4), URLEncoder and URLDecoder APIs have been redesigned and the methods reimplemented. The problems mentioned by the description should go away. Please verify this.
yingxian.wang@eng 2001-08-07

I have attached a test case that shows the problem, and the fix in 1.4. I'm going to close as a duplicate of 4257115 since that was the bug that fixed this issue.
michael.mcmahon@ireland 2001-08-07

07-08-2001
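
For anyone verifying the 1.4 behavior, the following is a minimal sketch of a round trip using the two-argument URLEncoder.encode(String, String) and URLDecoder.decode(String, String) methods added in 1.4. It is not the attached test case; the class and helper names are hypothetical, and UTF-8 is an assumed choice of target charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not the test case attached to this bug.
class RoundTripDemo {
    // Encodes with an explicit charset, so file.encoding plays no part.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes with the same explicit charset.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062D\u0628\u0627"; // Arabic sample text
        String encoded = encodeUtf8(original);
        System.out.println(encoded);
        System.out.println(original.equals(decodeUtf8(encoded))); // prints true
    }
}
```

Because both calls name the charset, the result should be the same whatever the default LANG of the system is.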

Duplicate :	JDK-4257115 - URLEncoder and URLDecoder should support target character sets
Relates :	JDK-4488606 - URLDecoder.decode() should not do charset conversion