FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b86)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b86, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows XP Professional SP 2

A DESCRIPTION OF THE PROBLEM :
This bug is responsible for the following behavior:
Some UTF-16 characters cannot be put into a JDOM document after they have been encoded with the CharsetEncoder. The returned ByteBuffer contains a null byte at the end. This zero byte seems to be responsible for the error while building the DOM.

The behavior also differs between version 1.5.0_07 and version 1.6.0 (b86). The character that triggers it is different:
"\u0237" - version 1.5.0_07 OK, version 1.6.0 NOK
"\u304E" - version 1.5.0_07 NOK, version 1.6.0 OK

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the class CharsetEncoderTest twice, once with Java 1.5.0_07 and once with Java 1.6.0 b86...

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
CharsetEncoder should encode the two Unicode (UTF-16) characters into UTF-8 characters, which could then be used as the text of an XML DOM entry.
ACTUAL -
The XML DOM should accept the encoded String generated from the ByteBuffer returned by the CharsetEncoder. Instead, the ByteBuffer contained an additional "empty" byte with the value 0. (This behavior occurs in both Java versions mentioned, but with different characters...

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" org.jdom.IllegalDataException: The data "AA " is not legal for a JDOM attribute: 0x0 is not a legal XML character.
    at org.jdom.Attribute.setValue(Attribute.java:486)
    at org.jdom.Attribute.<init>(Attribute.java:229)
    at org.jdom.Attribute.<init>(Attribute.java:252)
    at org.jdom.Element.setAttribute(Element.java:1109)
    at test.CharsetEncoderTest.testEncodeSaveXML(CharsetEncoderTest.java:39)
    at test.CharsetEncoderTest.main(CharsetEncoderTest.java:20)

!!! NOTE !!!: The space in the String "AA " was not a space in the original error message. It was an undisplayable character.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

import org.jdom.Document;
import org.jdom.Element;

public class CharsetEncoderTest {

    private static int encodee160 = 0x304E; // Works only with version 1.6.0
    private static int encodee150_07 = 0x237; // Works only with version 1.5.0_07
    private static String encoded;

    public static void main(String[] args) {
        testEncodeSaveXML(encodee150_07);
        testEncodeSaveXML(encodee160);
    }

    public static void testEncodeSaveXML(int character) {
        Charset set = Charset.forName("UTF-8");
        CharsetEncoder encoder = set.newEncoder();
        CharBuffer chb = CharBuffer.allocate(1);
        chb.put((char) character);
        chb.rewind();
        encoder.reset();
        try {
            ByteBuffer bb;
            bb = encoder.encode(chb);
            byte[] ba = bb.array();
            encoded = new String(ba, "ISO-8859-1");
            Document doc = new Document();
            Element e = new Element("XMLChar");
            e.setAttribute("value", encoded);
            doc.setRootElement(e);
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Removing the last (wrong) character from the encoded String before processing if encoding resulted in a null byte...
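
A possible alternative to trimming the String afterwards is sketched below. It rests on an assumption that the report does not confirm: that the zero byte is unused capacity in the backing array returned by bb.array(), not output of the encoder itself. Under that assumption, copying only the bytes between the buffer's position and limit should avoid the trailing 0x0. The class name ExactEncodedBytes and the helper encodeUtf8 are hypothetical names used for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class ExactEncodedBytes {

    // Hypothetical helper: encodes one char to UTF-8 and returns only the
    // bytes inside the buffer's position..limit window.
    static byte[] encodeUtf8(char c) {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        try {
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(new char[] { c }));
            // remaining() counts the readable bytes; array() would return
            // the whole backing array, which may be larger than that.
            byte[] exact = new byte[bb.remaining()];
            bb.get(exact);
            return exact;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("Cannot encode character", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // The two code points named in the report.
        char[] samples = { (char) 0x0237, (char) 0x304E };
        for (char c : samples) {
            byte[] exact = encodeUtf8(c);
            // Same byte-to-char mapping as the reproducer.
            String encoded = new String(exact, "ISO-8859-1");
            System.out.println("U+" + Integer.toHexString(c)
                    + ": " + exact.length + " bytes, "
                    + encoded.length() + " chars");
        }
    }
}
```

If the assumption holds, the resulting String could be passed to e.setAttribute("value", encoded) as in the reproducer without the 0x0 character. Whether this matches the intended behavior of CharsetEncoder in either JDK version is not established by this report.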