JDK-8136602 : Seemingly valid XML fails to get parsed with org.xml.sax.SAXParseException
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.parsers
  • Affected Version: 7,8,9
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: x86_64
  • Submitted: 2015-09-14
  • Updated: 2016-02-02
  • Resolved: 2015-12-10
Description
FULL PRODUCT VERSION :
openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux rei2-wt 3.19.0-28-generic #30-Ubuntu SMP Mon Aug 31 15:52:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
Parsing a simple XML file fails. This program:

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("/home/vyzivus/Downloads/KD.xml");
    }

will fail to parse the attached XML with the following error message:
[Fatal Error] KD.xml:972:25: An invalid XML character (Unicode: 0xd840) was found in the comment.
Exception in thread "main" org.xml.sax.SAXParseException; systemId: file:///home/vyzivus/Downloads/KD.xml; lineNumber: 972; columnNumber: 25; An invalid XML character (Unicode: 0xd840) was found in the comment.
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
	at com.company.Main.main(Main.java:8)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


When the comment line in question is removed, the XML parses successfully. Apparently, the comment parser mishandles the Unicode character and even reports the wrong code point: 0xd840 is only the high surrogate of the UTF-16 pair that encodes the actual character, U+2000B.
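
The relationship between the reported value and the real character can be checked with a few lines of Java. This is a minimal sketch, not part of the original report; the class name SurrogateDemo and the helper utf16Units are hypothetical names chosen for illustration. It only uses java.lang.Character to show how U+2000B is split into UTF-16 code units.

```java
public class SurrogateDemo {

    // Hypothetical helper: formats the UTF-16 code units of a code point as hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04x", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2000B lies outside the Basic Multilingual Plane, so it needs two code units.
        System.out.println(utf16Units(0x2000B)); // prints "0xd840 0xdc0b"
    }
}
```

The first code unit matches the value in the error message, which suggests the parser validated the high surrogate on its own instead of combining it with the low surrogate that follows.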

You can download the XML in question here: http://www.baka.sk/KD.xml


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Download the KD.xml file from http://www.baka.sk/KD.xml
2. Parse the attached XML: DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("KD.xml");
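
If the original file is unavailable, a smaller check along the same lines can be sketched with an in-memory document. This is an assumption-laden sketch rather than the reporter's test case: the class name CommentRepro, the helper parses, and the one-line XML string are hypothetical, and it is not confirmed that such a short input exercises the same code path as line 972 of KD.xml. The result depends on the JDK build in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CommentRepro {

    // Hypothetical helper: true if the text parses, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException.
            System.err.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+2000B written as its surrogate pair inside a comment (illustrative input).
        String xml = "<root><!-- \uD840\uDC0B --></root>";
        System.out.println("comment with U+2000B parses: " + parses(xml));
    }
}
```

Reading from a StringReader avoids any dependence on the file path or on the platform default encoding, so only the parser's handling of the comment is being tested.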

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The parse succeeds and throws no exception
ACTUAL -
An exception is thrown: [Fatal Error] KD.xml:972:25: An invalid XML character (Unicode: 0xd840) was found in the comment.
Exception in thread "main" org.xml.sax.SAXParseException; systemId: file:///home/vyzivus/Downloads/KD.xml; lineNumber: 972; columnNumber: 25; An invalid XML character (Unicode: 0xd840) was found in the comment.
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
	at com.company.Main.main(Main.java:8)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

REPRODUCIBILITY :
This bug is always reproducible.

---------- BEGIN SOURCE ----------
package com.company;

import javax.xml.parsers.DocumentBuilderFactory;

public class Main {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("/home/vyzivus/Downloads/KD.xml");
    }
}

---------- END SOURCE ----------


Comments
JDK-8072081 is now fixed.
10-12-2015

In the attached test case, the path to the XML file should be updated before running it. The issue is reproducible on JDK 7, 7u80, 8, 8u60, and 9. This is a duplicate of JDK-8072081, which is still in the Unresolved state. Moving across to dev-team.
16-09-2015