Consider this small test program:
import java.io.*;
public class test {
    /*
     * Make sure 0xFEFF is encoded as this byte sequence: EF BB BF, when
     * UTF-8 is being used, and parsed back into 0xFEFF.
     */
    public static void main(String[] args) throws Exception {
        /*
         * Write
         */
        FileOutputStream fos = new FileOutputStream("bom.txt");
        OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF8");
        osw.write(0xFEFF);
        osw.close();
        /*
         * Parse
         */
        FileInputStream fis = new FileInputStream("bom.txt");
        InputStreamReader isr = new InputStreamReader(fis, "UTF8");
        char bomChar = (char) isr.read();
        System.out.println("Parsed: "
                + Integer.toHexString(bomChar).toUpperCase());
        if (bomChar != 0xFEFF) {
            throw new Exception("Invalid BOM: "
                    + Integer.toHexString(bomChar).toUpperCase());
        }
        isr.close();
    }
}
On Linux JDK1.6 Beta b59d, the char is parsed correctly.
However, on Linux JDK1.6 rc b69, the program throws this exception:
Exception in thread "main" java.lang.Exception: Invalid BOM: FFFF
at test.main(test.java:28)
See bug 6378267 for more details.
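
The FFFF in the exception message is what the cast (char) -1 produces, so it is consistent with isr.read() returning -1 (end of stream) rather than a wrongly decoded character. The sketch below is one way to separate the two cases. It is not part of the original test: the class name BomRoundTrip and its helper methods are hypothetical, and it uses in-memory streams instead of bom.txt. It dumps the encoded bytes first, then keeps the int result of read() so that end of stream is not masked by the cast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical diagnostic helper, not part of the original test program.
class BomRoundTrip {

    // Encode U+FEFF with the UTF-8 charset and return the raw bytes.
    static byte[] encodeBom() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytes, "UTF-8");
        writer.write(0xFEFF);
        writer.close();
        return bytes.toByteArray();
    }

    // Decode the first char as an int, so that -1 (end of stream)
    // stays distinguishable from a real character value.
    static int decodeFirstChar(byte[] encoded) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(encoded), "UTF-8");
        try {
            return reader.read();
        } finally {
            reader.close();
        }
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encodeBom();
        System.out.println("Encoded: " + toHex(encoded));

        int first = decodeFirstChar(encoded);
        if (first == -1) {
            // The decoder returned no characters at all for these bytes.
            System.out.println("Parsed: end of stream");
        } else {
            System.out.println("Parsed: "
                    + Integer.toHexString(first).toUpperCase());
        }
    }
}
```

If the "Encoded" line shows EF BB BF but the "Parsed" line reports end of stream, the write side is behaving as the test expects and the difference lies in the decoding step.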