JDK-8207760 : SAXException: Invalid UTF-16 surrogate detected: d83c ?
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.transform
  • Affected Version: 8,9,10,11
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_10
  • CPU: x86_64
  • Submitted: 2018-07-16
  • Updated: 2020-02-21
  • Resolved: 2018-09-18
JDK 11          JDK 12          JDK 8           Other
11.0.4 Fixed    12 b12 Fixed    8u251 Fixed     openjdk8u222 Fixed
Description
A DESCRIPTION OF THE PROBLEM :
During processing, the XML stream is split into chunks of 1024 bytes.
If a multi-char symbol (e.g. an emoji) straddles the boundary between two chunks, the first chunk ends with the symbol's first char and the second chunk starts with its second char.
The given example contains the "fallen leaf" Unicode symbol (https://www.compart.com/en/unicode/U+1F342). In UTF-16 it is represented by two chars, 0xD83C and 0xDF42. When the second char is carried over to the next chunk, the first char 0xD83C is treated as a lone, invalid character.
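
To make the boundary condition concrete, here is a minimal Java sketch. It only illustrates the description above: the class name, the helper names, and the layout of 1023 filler chars followed by the emoji are assumptions made for the illustration, not code from the reproducer or from the JDK.

```java
class SurrogateSplitDemo {
    /** Returns true if cutting text right before index cut separates a surrogate pair. */
    static boolean splitsSurrogatePair(char[] text, int cut) {
        return cut > 0 && cut < text.length
                && Character.isHighSurrogate(text[cut - 1])
                && Character.isLowSurrogate(text[cut]);
    }

    /** Hypothetical sample: (chunkSize - 1) filler chars followed by U+1F342. */
    static char[] buildSample(int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunkSize - 1; i++) {
            sb.append('a');
        }
        sb.appendCodePoint(0x1F342); // FALLEN LEAF, stored as two chars
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] leaf = Character.toChars(0x1F342);
        // Prints the two UTF-16 code units named in the report: d83c df42
        System.out.printf("%04x %04x%n", (int) leaf[0], (int) leaf[1]);

        // The high surrogate sits at index 1023 and the low surrogate at 1024,
        // so a cut at 1024 separates them. Prints: true
        System.out.println(splitsSurrogatePair(buildSample(1024), 1024));
    }
}
```

A chunk that ends at index 1023 then carries only 0xD83C, which matches the "d83c" in the exception message.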


---------- BEGIN SOURCE ----------
https://github.com/dkBrazz/reproduce-jdk-xslt-bug
---------- END SOURCE ----------

FREQUENCY : always



Comments
May I have approval for the 11u backport?
27-03-2019

Here's the test patch. I'll post a review request on jdk8u-dev as well.

2c2
< * Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights reserved.
---
> * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
23a24,25
> package transform;
>
44c46,47
< * @run testng/othervm JDK8207760
---
> * @library /javax/xml/jaxp/libs /javax/xml/jaxp/unittest
> * @run testng/othervm transform.JDK8207760
47a51
> @Listeners({jaxp.library.FilePolicy.class})
27-03-2019

The 8u patch should go through review, if there are changes. It is unclear from your comment what those changes are, with respect to the test.
27-03-2019

Fix Request 8u
Low risk, patch applies cleanly net of file location and line numbers. Test requires minor patch to run under jdk8u.
http://cr.openjdk.java.net/~phh/8207760/webrev.8u.jaxp.00/
http://cr.openjdk.java.net/~phh/8207760/webrev.8u.jdk.00/
27-03-2019

Fix Request 11u
Very low risk, patch applies exactly.
26-03-2019

The issue also exists for CDATA. There is also a bug where a Unicode character may be written outside of a CDATA section.
11-09-2018

I see the same exception from the test case on a recent JDK build on Mac.
18-07-2018

To reproduce the issue, run the attached test case.

JDK 8u171 - Fail
JDK 11-ea+22 - Fail

Output:
ERROR: 'org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?'
Exception in thread "main" javax.xml.transform.TransformerException: com.sun.org.apache.xalan.internal.xsltc.TransletException: org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:780)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:370)
    at ReproduceXsltBug.main(ReproduceXsltBug.java:18)
Caused by: com.sun.org.apache.xalan.internal.xsltc.TransletException: org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1739)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.DOMAdapter.shallowCopy(DOMAdapter.java:310)
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.transform()
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:624)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:776)
    ... 2 more
Caused by: org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1549)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1430)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:713)
    at java.xml/com.sun.org.apache.xml.internal.utils.FastStringBuffer.sendSAXcharacters(FastStringBuffer.java:1001)
    at java.xml/com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.copyTextNode(SAX2DTM2.java:3118)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1715)
    ... 11 more
Caused by: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.accumDefaultEscape(ToStream.java:1744)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1666)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1530)
    ... 16 more
---------
com.sun.org.apache.xalan.internal.xsltc.TransletException: org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1739)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.DOMAdapter.shallowCopy(DOMAdapter.java:310)
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.transform()
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:624)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:776)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:370)
    at ReproduceXsltBug.main(ReproduceXsltBug.java:18)
Caused by: org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1549)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1430)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:713)
    at java.xml/com.sun.org.apache.xml.internal.utils.FastStringBuffer.sendSAXcharacters(FastStringBuffer.java:1001)
    at java.xml/com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.copyTextNode(SAX2DTM2.java:3118)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1715)
    ... 11 more
Caused by: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.accumDefaultEscape(ToStream.java:1744)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1666)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1530)
    ... 16 more
---------
org.xml.sax.SAXException: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1549)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1430)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:713)
    at java.xml/com.sun.org.apache.xml.internal.utils.FastStringBuffer.sendSAXcharacters(FastStringBuffer.java:1001)
    at java.xml/com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.copyTextNode(SAX2DTM2.java:3118)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1715)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.DOMAdapter.shallowCopy(DOMAdapter.java:310)
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.transform()
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:624)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:776)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:370)
    at ReproduceXsltBug.main(ReproduceXsltBug.java:18)
Caused by: java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.accumDefaultEscape(ToStream.java:1744)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1666)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1530)
    ... 16 more
---------
java.io.IOException: Invalid UTF-16 surrogate detected: d83c ?
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.accumDefaultEscape(ToStream.java:1744)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1666)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1530)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1430)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:713)
    at java.xml/com.sun.org.apache.xml.internal.utils.FastStringBuffer.sendSAXcharacters(FastStringBuffer.java:1001)
    at java.xml/com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.copyTextNode(SAX2DTM2.java:3118)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.shallowCopy(SAXImpl.java:1715)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.dom.DOMAdapter.shallowCopy(DOMAdapter.java:310)
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.template$dot$0()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.applyTemplates()
    at jdk.translet/die.verwandlung.simple.transform()
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:624)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:776)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:370)
    at ReproduceXsltBug.main(ReproduceXsltBug.java:18)
18-07-2018
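
For readers who handle chunked character data in their own code, the sketch below shows one general way to tolerate a chunk that ends in a high surrogate: hold that char back and prepend it to the next chunk. This is an assumption-laden illustration, not the patch that resolved this issue; the class SurrogateCarry and its methods are hypothetical names that do not exist in the JDK.

```java
class SurrogateCarry {
    // Held-back high surrogate from the previous chunk, or 0 if none.
    private char pendingHigh;

    /** Accepts one chunk and returns the text that is safe to process now. */
    String accept(char[] ch, int start, int length) {
        if (length == 0) {
            return ""; // nothing new; keep any pending surrogate for later
        }
        StringBuilder out = new StringBuilder(length + 1);
        if (pendingHigh != 0) {
            out.append(pendingHigh); // rejoin with the low surrogate that follows
            pendingHigh = 0;
        }
        int end = start + length;
        if (Character.isHighSurrogate(ch[end - 1])) {
            pendingHigh = ch[end - 1]; // defer until the next chunk arrives
            end--;
        }
        out.append(ch, start, end - start);
        return out.toString();
    }

    /** Call at end of input; a non-empty result is a genuinely unpaired surrogate. */
    String finish() {
        String rest = pendingHigh == 0 ? "" : String.valueOf(pendingHigh);
        pendingHigh = 0;
        return rest;
    }

    public static void main(String[] args) {
        SurrogateCarry carry = new SurrogateCarry();
        char[] first = {'a', 'b', '\uD83C'};   // chunk ends with the high surrogate
        char[] second = {'\uDF42', 'c', 'd'};  // next chunk starts with the low surrogate
        System.out.println(carry.accept(first, 0, first.length));   // ab
        System.out.println(carry.accept(second, 0, second.length)); // fallen leaf emoji, then cd
        System.out.println(carry.finish().isEmpty());               // true
    }
}
```

The design choice is to delay a single char rather than reject it, so a pair that spans two chunks is only validated once both halves are available.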