Name: dfR10049 Date: 01/13/2003
The JCK tests for java.net/URL[Encoder/Decoder] fail if they are run after
the JCK tests for the java_io package in the same JVM, on Solaris 2.8
with LC_CTYPE set to "en_US.UTF-8".
The bug is: the URLEncoder.encode method with the "UTF-8" encoding
incorrectly processes surrogate pairs if an instance of InputStreamReader
has been created and new URL(http_url).openConnection().connect() has been
called beforehand. In other words, creating an InputStreamReader instance
and connecting to an HTTP URL affect the output of the following call:
URLEncoder.encode("\uD800\uDC00 \uD801\uDC01 ", "UTF-8")
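For reference, the expected result for each surrogate pair can be worked out by hand: the pair combines into one supplementary code point, which UTF-8 represents as four bytes, each of which is then percent-encoded. The following is only a sketch of that arithmetic; the class and method names (SurrogateUtf8, toCodePoint, percentEncode) are illustrative and are not part of the JDK or the JCK.

```java
public class SurrogateUtf8 {
    // Combine a high/low surrogate pair into one supplementary code point.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Percent-encode the four UTF-8 bytes of a supplementary code point
    // (valid only for U+10000..U+10FFFF).
    static String percentEncode(int cp) {
        int[] bytes = {
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)
        };
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            sb.append('%').append(Integer.toHexString(bytes[i]).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints %F0%90%80%80
        System.out.println(percentEncode(toCodePoint('\uD800', '\uDC00')));
        // prints %F0%90%90%81
        System.out.println(percentEncode(toCodePoint('\uD801', '\uDC01')));
    }
}
```

With each space encoded as "+", the full expected string for the call above is %F0%90%80%80+%F0%90%90%81+.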
I wrote a minimal test demonstrating the bug:
----------------- EncTest.java ------------------
import java.io.*;
import java.net.*;

public class EncTest {
    public static void main(String[] args) {
        try {
            String toEncode = "\uD800\uDC00 \uD801\uDC01 ";

            // Reference result, taken before the triggering calls.
            String enc1 = URLEncoder.encode(toEncode, "UTF-8");

            // Triggering calls: create an InputStreamReader and connect
            // to the HTTP URL given as the first argument.
            byte[] bytes = {};
            ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
            InputStreamReader reader = new InputStreamReader(bais, "8859_1");
            new URL(args[0]).openConnection().connect();

            // The same call again; the result must not change.
            String enc2 = URLEncoder.encode(toEncode, "UTF-8");

            if (enc1.equals(enc2)) {
                System.out.println("Test passed: ");
            } else {
                System.out.println("Test failed: ");
            }
            System.out.println(" enc1: " + enc1);
            System.out.println(" enc2: " + enc2);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
-----------------------------------------
#> uname -a
SunOS matmech 5.8 Generic_108528-14 sun4u sparc SUNW,Ultra-5_10
#> echo $LC_CTYPE
en_US.UTF-8
#> java -version
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b12)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b12, mixed mode)
#> java EncTest <SOME AVAILABLE HTTP URL>
Test failed:
enc1: %F0%90%80%80+%F0%90%90%81+
enc2: ++
Note: the bug is reproducible with jdk1.4.2 b11 and b12.
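A check that does not depend on a value captured earlier in the same JVM is to compare the result against the known-correct string directly. The sketch below assumes the correct value is the enc1 output shown above. The class name EncExpected and the method encodesCorrectly are hypothetical, and whether this check alone reproduces the failure without the InputStreamReader and URL calls has not been verified.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncExpected {
    // Assumed correct encoding, taken from the enc1 output in this report.
    static final String EXPECTED = "%F0%90%80%80+%F0%90%90%81+";

    // Returns true if URLEncoder handles the two surrogate pairs correctly.
    static boolean encodesCorrectly() throws UnsupportedEncodingException {
        String toEncode = "\uD800\uDC00 \uD801\uDC01 ";
        return EXPECTED.equals(URLEncoder.encode(toEncode, "UTF-8"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodesCorrectly() ? "Test passed" : "Test failed");
    }
}
```

Such a check could be called at any point in a test run, for example once before and once after the calls suspected of changing the encoder's behavior.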
======================================================================