FULL PRODUCT VERSION :
1.7.0_65
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]
A DESCRIPTION OF THE PROBLEM :
I have narrowed down a problem where our application produced XML which it could not parse back. The XML contained character references whose values were invalid (XML permits only certain ranges of code points in them). It turned out that these character references are generated specifically for characters outside the BMP, i.e. characters that are represented by a surrogate pair in a Java string. Further investigation revealed that this happens only when the XMLStreamWriter is constructed with an OutputStreamWriter. When it is constructed with a plain OutputStream, the surrogate pairs are encoded as valid UTF-8 multibyte sequences. The error cannot, however, be in the OutputStreamWriter itself, since character references are specific to XML, of which the OutputStreamWriter knows nothing.
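To illustrate why such references are rejected, here is a minimal sketch. The class and method names are hypothetical and are not part of the attached test. It assumes the faulty output contains one reference per UTF-16 code unit, which is what the description suggests but which the report does not quote verbatim. The range check follows the Char production of XML 1.0.

```java
// Hypothetical illustration, not taken from the JDK sources.
final class SurrogateReferenceDemo {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // One reference per UTF-16 code unit: a surrogate pair yields two
    // references, each naming a value in the excluded surrogate range.
    static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#x").append(Integer.toHexString(s.charAt(i))).append(';');
        }
        return sb.toString();
    }

    // One reference per Unicode code point: a surrogate pair yields a
    // single reference to the supplementary character.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE0A"; // U+1F60A as a surrogate pair
        System.out.println(perCodeUnit(smiley));      // &#xd83d;&#xde0a;
        System.out.println(perCodePoint(smiley));     // &#x1f60a;
        System.out.println(isValidXmlChar(0xD83D));   // false
        System.out.println(isValidXmlChar(0x1F60A));  // true
    }
}
```

A parser has to reject a reference such as &#xd83d; because the value lies between #xD800 and #xDFFF, whereas &#x1f60a; names a legal character.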
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
I am attaching a test program which clearly demonstrates the problem.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
package com.dramaqueen.exporters;

import static org.junit.Assert.*;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import org.junit.Test;

@SuppressWarnings("nls")
public class StreamVersusWriterTest {

    @Test
    public void streamVersusWriter() {
        String charset = "UTF-8";
        // In-memory sinks for the two writers under comparison
        ByteArrayOutputStream streamA = new ByteArrayOutputStream();
        ByteArrayOutputStream streamB = new ByteArrayOutputStream();
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        try {
            // Case A: the XMLStreamWriter encodes to the OutputStream itself
            XMLStreamWriter writerA = factory.createXMLStreamWriter(streamA,
                    charset);
            generateXML(writerA, charset);
            // Case B: the XMLStreamWriter writes through an OutputStreamWriter
            OutputStreamWriter streamWriter = new OutputStreamWriter(streamB,
                    charset);
            XMLStreamWriter writerB = factory.createXMLStreamWriter(
                    streamWriter);
            generateXML(writerB, charset);
            // Decode with the charset that was used for writing
            String outputA = streamA.toString(charset);
            String outputB = streamB.toString(charset);
            System.out.println("output using OutputStream      : " + outputA);
            System.out.println("output using OutputStreamWriter: " + outputB);
            // assertEquals(outputA, outputB);
            // Parse back exactly the bytes that were written
            readXML(streamA.toByteArray(), charset);
            readXML(streamB.toByteArray(), charset);
        } catch (XMLStreamException e) {
            e.printStackTrace();
            // assertTrue(false);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            // assertTrue(false);
        }
    }

    private void generateXML(XMLStreamWriter writer, String charset)
            throws XMLStreamException {
        // Char sequence containing a smiley (U+1F60A) which is encoded as a
        // surrogate pair in the Java string; written with Unicode escapes so
        // that the source does not depend on the file encoding
        String sequence = "A\uD83D\uDE0AB\u00DF"; // A, smiley, B, sharp s
        writer.writeStartDocument(charset, "1.0");
        writer.writeStartElement("a");
        writer.writeCharacters(sequence);
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.flush();
    }

    private void readXML(byte[] xmlData, String charset)
            throws XMLStreamException {
        InputStream stream = new ByteArrayInputStream(xmlData);
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xmlReader
                = factory.createXMLStreamReader(stream, charset);
        // Pull every event; an invalid character reference makes next() throw
        while (xmlReader.hasNext())
            xmlReader.next();
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Pass an OutputStream (together with the encoding) to createXMLStreamWriter instead of wrapping the stream in an OutputStreamWriter.
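A minimal sketch of the workaround, assuming UTF-8 output into an in-memory buffer. The class name OutputStreamWorkaround and the helper writeDocument are hypothetical. Only the standard createXMLStreamWriter(OutputStream, String) overload is used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper illustrating the workaround.
final class OutputStreamWorkaround {

    // Writes <a>text</a> as UTF-8 and returns the encoded bytes.
    static byte[] writeDocument(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Hand the OutputStream to the factory directly, so that the
            // XMLStreamWriter performs the character encoding itself.
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("a");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = writeDocument("A\uD83D\uDE0AB\u00DF");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

With this construction the smiley should appear in the output as a four-byte UTF-8 sequence rather than as a pair of character references, in line with the behaviour reported above for the plain OutputStream case.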