SHORT SUMMARY:
If XML output is generated using a TransformerHandler created by a SAXTransformerFactory, Unicode supplementary characters are output incorrectly. Instead of a single numeric character reference for the supplementary character, two references are generated: one for the high surrogate and one for the low surrogate. This is a regression in 1.7.0_04: there are no problems in earlier releases of 1.7 or in the latest release of 1.6.

INDICATORS: None

COUNTER INDICATORS: None

TRIGGERS:
Here is a test to illustrate the problem:

    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.helpers.AttributesImpl;

    public class ucout {
        // U+12000, a supplementary character, held as a UTF-16 surrogate pair
        private static final char[] CHARS =
            String.valueOf(Character.toChars(0x12000)).toCharArray();

        public static void main(String[] args) throws Exception {
            TransformerHandler th = ((SAXTransformerFactory) SAXTransformerFactory.newInstance())
                .newTransformerHandler();
            th.setResult(new StreamResult(System.out));
            th.startDocument();
            th.startElement("", "", "c", new AttributesImpl());
            // Both surrogates are passed in a single characters() call
            th.characters(CHARS, 0, CHARS.length);
            th.endElement("", "", "c");
            th.endDocument();
            System.out.println();
        }
    }

If the test is run with 1.7.0_04, the (incorrect) output is:

    <?xml version="1.0" encoding="UTF-8"?><c>&#55304;&#56320;</c>

If run with 1.6 or 1.7.0_03, the (correct) output is:

    <?xml version="1.0" encoding="UTF-8"?><c>&#73728;</c>

KNOWN WORKAROUND: No
PRESENT SINCE: N/A
HOW TO VERIFY: Run the attached test case.
NOTES FOR SE: None
REGRESSION: No

*** MNIEMIEC 06/29/12 12:47 pm *** (CHG: Tag Added)
*** MNIEMIEC 06/29/12 12:47 pm *** New Tag: new_shadow
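The two outputs in the report follow directly from UTF-16 arithmetic. U+12000 is stored as the surrogate pair 0xD808 (55304) and 0xDC00 (56320). Escaping each char unit separately yields the two references seen with 1.7.0_04. Combining the pair into one code point first (0x12000 = 73728) yields the single reference seen with 1.6 and 1.7.0_03. The sketch below illustrates only that difference. It is not the JDK/Xalan serializer code. The class and method names are hypothetical, and the helpers escape every non-ASCII character while leaving markup characters such as '<' untouched, which is a simplification.

```java
// Hypothetical illustration only; not the JDK serializer implementation.
public class SupplementaryEscape {

    // Correct approach: combine a high/low surrogate pair into one code point
    // (via codePointAt) and emit one decimal numeric character reference.
    // Simplification: escapes all non-ASCII, does not escape markup characters.
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 2 for a supplementary code point, 1 otherwise
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Faulty approach matching the 1.7.0_04 symptom: each UTF-16 char unit,
    // including each half of a surrogate pair, is escaped on its own.
    public static String escapePerCharUnit(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = String.valueOf(Character.toChars(0x12000));
        System.out.println((int) s.charAt(0) + " " + (int) s.charAt(1)); // 55304 56320
        System.out.println(escapePerCharUnit(s)); // &#55304;&#56320;
        System.out.println(escapeNonAscii(s));    // &#73728;
    }
}
```

A serializer that walks its input by code point rather than by char cannot split a surrogate pair in this way, provided the pair arrives within a single characters() call as it does in the test above.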