JDK-7180858 : Regression: error in sax/trax encoding of supplementary characters
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.transform
  • Affected Version: 7u4
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2012-06-29
  • Updated: 2014-11-19
  • Resolved: 2012-07-23
JDK 7: 7u6 b15 (Fixed)
Description
SHORT SUMMARY:
If XML output is generated using a TransformerHandler created by a
SAXTransformerFactory, Unicode supplementary characters are output
incorrectly.
Instead of a single numeric character entity for the supplementary character,
two entities are generated, one for the high surrogate and one for the low
surrogate.
This is a regression in 1.7.0_04: earlier releases of 1.7 and the latest
release of 1.6 are not affected.
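The difference between the two forms can be sketched with plain JDK calls. This is an illustration only, not the serializer's code: the class and method names are hypothetical, and it assumes the entities are written as decimal numeric character references.

```java
public class SurrogateDemo {

    /** Formats a whole code point as one decimal numeric character reference. */
    public static String asEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /**
     * Formats each UTF-16 code unit separately. For a supplementary character
     * this yields one reference per surrogate, which is not valid XML content.
     */
    public static String asPerCharEntities(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x12000; // the code point used in the test case
        System.out.println("single reference:   " + asEntity(cp));
        System.out.println("per-char references: " + asPerCharEntities(cp));
    }
}
```

For U+12000 the surrogate pair is 0xD808 (55304) followed by 0xDC00 (56320), while the code point itself is 73728 in decimal.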
INDICATORS:
None
COUNTER INDICATORS:
None
TRIGGERS:
Here is a test to illustrate the problem:

import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.helpers.AttributesImpl;

public class ucout {

  // U+12000 is a supplementary character, so it occupies two chars (a surrogate pair).
  private static final char[] CHARS = Character.toChars(0x12000);

  public static void main(String[] args) throws Exception {
    TransformerHandler th =
        ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();

    // Serialize the SAX events straight to standard output.
    th.setResult(new StreamResult(System.out));
    th.startDocument();
    th.startElement("", "", "c", new AttributesImpl());
    th.characters(CHARS, 0, CHARS.length);
    th.endElement("", "", "c");
    th.endDocument();
    System.out.println();
  }
}

If the test is run with 1.7.0_04, the (incorrect) output is:
<?xml version="1.0" encoding="UTF-8"?><c>&#55304;&#56320;</c>

If run with 1.6 or 1.7.0_03, the (correct) output is:
<?xml version="1.0" encoding="UTF-8"?><c>&#73728;</c>
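A check for the faulty output could be automated roughly as follows. This is a sketch under assumptions: the class and method names are hypothetical, and it assumes the bad output contains numeric character references (decimal or hexadecimal) whose values fall in the surrogate range, as the summary describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurrogateEntityCheck {

    // Matches decimal (&#55304;) and hexadecimal (&#xD808;) character references.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    /** Returns true if any numeric character reference in xml names a surrogate. */
    public static boolean containsSurrogateEntity(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        while (m.find()) {
            String body = m.group(1);
            long value;
            try {
                value = body.startsWith("x")
                    ? Long.parseLong(body.substring(1), 16)
                    : Long.parseLong(body);
            } catch (NumberFormatException e) {
                continue; // too large to be a code point, so not a surrogate
            }
            if (value >= Character.MIN_SURROGATE && value <= Character.MAX_SURROGATE) {
                return true;
            }
        }
        return false;
    }
}
```

To apply it to the test case, the handler's result can be directed to a java.io.StringWriter via new StreamResult(writer) instead of System.out, and writer.toString() passed to containsSurrogateEntity.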

 KNOWN WORKAROUND:
 No
 PRESENT SINCE:
 N/A
 HOW TO VERIFY:
 Run attached test case
 NOTES FOR SE:
 None
 REGRESSION:
 No


Comments
EVALUATION This issue has the same root cause as 7151118, so it is fixed in b15. I ran the test against the latest build and against b15; both returned the expected results.
12-07-2012