Bug ID: JDK-7180858 Regression: error in sax/trax encoding of supplementary characters

Details
Type: Bug
Submit Date: 2012-06-29
Status: Closed
Updated Date: 2013-04-21
Project Name: JDK
Resolved Date: 2012-07-23
Component: xml
OS: generic
Sub-Component: javax.xml.transform
CPU: generic
Priority: P2
Resolution: Fixed
Affected Versions: 7u4
Fixed Versions: 7u6 (b15)

Related Reports

Sub Tasks

Description
SHORT SUMMARY:
If XML output is generated using a TransformerHandler created by a
SAXTransformerFactory, Unicode supplementary characters are output
incorrectly.
Instead of a single numeric character reference for the supplementary character,
two are generated: one for the high surrogate and one for the low surrogate.
This is a regression in 1.7.0_04; there are no problems in earlier releases
of 1.7 or in the latest release of 1.6.
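
To make the difference concrete, here is a minimal sketch (not taken from the JDK sources) that computes both forms for U+12000, the code point used in the test below. The class and method names are hypothetical and exist only for illustration; the sketch assumes nothing beyond `java.lang.Character`.

```java
public class SurrogateEntities {

    // Numeric character reference for a whole code point.
    static String codePointEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // What results when each UTF-16 code unit is escaped on its own,
    // which is the behaviour described in this report.
    static String perCharEntities(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x12000;
        System.out.println(codePointEntity(cp)); // &#73728;
        System.out.println(perCharEntities(cp)); // &#55304;&#56320;
    }
}
```

The second form is not well-formed XML, because surrogate code units (0xD800 to 0xDFFF) are not legal XML characters and so cannot appear as character references.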
INDICATORS:
None
COUNTER INDICATORS:
None
TRIGGERS:
Here is a test to illustrate the problem:

import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.helpers.AttributesImpl;

public class ucout {

  // U+12000 is a supplementary character, stored as a surrogate pair (two chars).
  private static final char[] CHARS = Character.toChars(0x12000);

  public static void main(String[] args) throws Exception {
    // Identity TransformerHandler that serializes incoming SAX events as XML.
    TransformerHandler th = ((SAXTransformerFactory)
        SAXTransformerFactory.newInstance()).newTransformerHandler();

    th.setResult(new StreamResult(System.out));
    th.startDocument();
    th.startElement("", "", "c", new AttributesImpl());
    th.characters(CHARS, 0, CHARS.length);
    th.endElement("", "", "c");
    th.endDocument();
    System.out.println();
  }
}

If the test is run with 1.7.0_04, the (incorrect) output is:
<?xml version="1.0" encoding="UTF-8"?><c>&#55304;&#56320;</c>

If run with 1.6 or 1.7.0_03, the (correct) output is:
<?xml version="1.0" encoding="UTF-8"?><c>&#73728;</c>

 KNOWN WORKAROUND:
 No
 PRESENT SINCE:
 N/A
 HOW TO VERIFY:
 Run attached test case
 NOTES FOR SE:
 None
 REGRESSION:
 No
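
One possible way to automate the verification, offered as a sketch rather than as the attached test case: serialize the text through a TransformerHandler, parse the result back, and compare. The class and method names are hypothetical. The sketch assumes the default JAXP implementation shipped with the JDK, and it uses "c" as both local name and qualified name for the element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;

public class SupplementaryRoundTrip {

    // Serialize <c>text</c> through an identity TransformerHandler.
    static String serialize(String text) throws Exception {
        TransformerHandler th = ((SAXTransformerFactory)
                SAXTransformerFactory.newInstance()).newTransformerHandler();
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));
        char[] chars = text.toCharArray();
        th.startDocument();
        th.startElement("", "c", "c", new AttributesImpl());
        th.characters(chars, 0, chars.length);
        th.endElement("", "c", "c");
        th.endDocument();
        return out.toString();
    }

    // Parse the XML and return the root element's text content.
    static String parseText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // True if the text survives serialization and re-parsing unchanged.
    // Output containing surrogate character references is not well-formed,
    // so parsing throws and this returns false.
    static boolean roundTrips(String text) {
        try {
            return text.equals(parseText(serialize(text)));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = new String(Character.toChars(0x12000));
        System.out.println(roundTrips(text));
    }
}
```

On a runtime with the fix, the check is expected to print true. On an affected runtime, the parser should reject the surrogate references and the check prints false.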

*** MNIEMIEC 06/29/12 12:47 pm *** (CHG: Tag Added)
*** MNIEMIEC 06/29/12 12:47 pm ***
New Tag: new_shadow

                                    

Comments
EVALUATION

This issue has the same root cause as 7151118, so it is fixed in b15. I ran the test against the latest build and against b15; both returned the expected results.
                                     
2012-07-12


