JDK-8012541 : Some XML 1.1 documents are not correctly handled by the DocumentBuilder API
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.parsers
  • Affected Version: 6u43,7u17
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_7
  • CPU: generic
  • Submitted: 2013-04-17
  • Updated: 2014-04-24
  • Resolved: 2014-04-24
Related Reports
Duplicate :  
Description
SYNOPSIS
--------
Some XML 1.1 documents are not correctly handled by the DocumentBuilder API

OPERATING SYSTEM
----------------
Windows 7 Professional x64

FULL JDK VERSION(S)
-------------------
Reproduced on both:

java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

and

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
Please note it also occurs on x86 VMs.

PROBLEM DESCRIPTION
-------------------
When some XML documents that start with an XML 1.1 declaration are parsed with the DocumentBuilder API (javax.xml.parsers.DocumentBuilder), no exception is thrown, but the resulting Document Object Model is corrupted: one or more nodes do not contain the right content.

In the attached example, some "<test>" nodes end up holding fragments of markup (for instance 't>24') instead of their four-digit number.

REPRODUCTION INSTRUCTIONS
-------------------------
Run the attached DocumentBuilderCheck class. Two examples are run in succession, and each prints at least one error message to the console.

In the first example, we generate an XML document with a simple structure (<?xml version="1.1" encoding="UTF-8"?><main_tag><test>0000</test><test>0001</test>[...]<test>2499</test></main_tag>) into a file, then parse it and analyze the resulting Document object: we try to parse the content of each "<test>" node as an integer. We then dump the Document object back to another XML file.

The second example is the same as the first one, except that the generated XML document is kept in memory as a String and parsed from there, without being written to a file first.

Both examples show errors in the Document object. With JDK 1.7.0_17, the console output is:

example #1 - ERROR: content 't>24' found at index 1926 cannnot be recognized as a valid number [For input string: "t>24"]
example #2 - ERROR: content 't>14' found at index 964 cannnot be recognized as a valid number [For input string: "t>14"]
example #2 - ERROR: content 't>46' found at index 1446 cannnot be recognized as a valid number [For input string: "t>46"]

WORKAROUND
----------
Generate XML 1.0 documents when possible (but some locales require XML 1.1), or rely on a third-party library such as a recent Xerces implementation.
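
The second workaround can be sketched as follows. This is a minimal illustration, not part of the original report: it assumes that Apache Xerces-J is on the classpath and that its JAXP factory class is org.apache.xerces.jaxp.DocumentBuilderFactoryImpl. The class name Xml11WorkaroundSketch and its helper methods are hypothetical. If the Xerces class cannot be loaded, the sketch falls back to the JDK's built-in parser, which is the one affected by this bug.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Xml11WorkaroundSketch {

    // Assumption: JAXP factory class shipped with Apache Xerces-J.
    static final String XERCES_FACTORY =
            "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";

    /** Prefers the third-party Xerces factory, falls back to the JDK default. */
    static DocumentBuilderFactory newPreferredFactory() {
        try {
            // A null class loader means "use the context class loader".
            return DocumentBuilderFactory.newInstance(XERCES_FACTORY, null);
        } catch (FactoryConfigurationError e) {
            // Xerces is not available: the built-in (affected) parser is used.
            return DocumentBuilderFactory.newInstance();
        }
    }

    /** Parses an in-memory XML document with the preferred factory. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = newPreferredFactory();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually selected.
        System.out.println(newPreferredFactory().getClass().getName());
    }
}
```

Naming the factory class explicitly keeps the choice local to the calling code. Setting the javax.xml.parsers.DocumentBuilderFactory system property would have a similar effect for the whole JVM.</main_tag>");
assert "main_tag".equals(d.getDocumentElement().getNodeName());
assert d.getDocumentElement().getChildNodes().getLength() == 1;
assert "0000".equals(d.getDocumentElement().getChildNodes().item(0).getTextContent());
</test>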

TESTCASE
--------
import java.io.File;
import java.io.PrintWriter;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocumentBuilderCheck {
    public static void main(String[] args) throws Exception {
        // Example 1
        // generating a simple XML document directly into a file
        String filename = "Example1_DocumentToParse.xml";
        generateXmlFile(filename, 2500);

        // parsing the document using DocumentBuilder
        Document doc2 = readXmlFile(filename);

        // analyzing the resulting document
        analyzeDocumentValidity("example #1", doc2);

        // dumping the parsed document to file
        String filename2 = "Example1_DocumentParsed.xml";
        writeDocument(doc2, filename2);      


        // Example 2
        // generating a simple XML document as a string
        String xmlDoc = generateXMLDocument(2500);

        // parsing the document using DocumentBuilder
        Document doc = readXmlDocument(xmlDoc);

        // analyzing the resulting document
        analyzeDocumentValidity("example #2", doc);

        // dumping the parsed document to file
        writeDocument(doc, "Example2_DocumentParsed.xml");      
    }

    private static void analyzeDocumentValidity(String testName, Document doc) {
        // analyzing the content of the parsed structure,
        // checking that it matches the original document
        NodeList nodes = doc.getDocumentElement().getChildNodes();
        for (int k=0;k<nodes.getLength();k++) {
            String nodeContent = nodes.item(k).getTextContent();

            // checking node content ("<test>" tag)
            try {
                // if parsing is incorrect, either we get an exception here (the content has been
                // corrupted and is no longer a number) or the number does not match the node index
                int nb = Integer.parseInt(nodeContent);
                if ( nb != k ) {
                    System.out.println(testName + " - ERROR : number at index "+k+" is not the expected one ("+nb+" instead of "+k+")");
                }
            } catch (NumberFormatException ex) {
                System.out.println(testName + " - ERROR: content '"+nodeContent+"' found at index "+k+" cannnot be recognized as a valid number ["+ex.getMessage()+"]");
            }
        }
    }

    private static void writeDocument(Document document, String filename) throws Exception {
        StreamResult streamResult = new StreamResult(filename);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.transform(new DOMSource(document), streamResult);
    }

    private static Document readXmlFile(String filename) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new File(filename));
        return doc;
    }

    private static Document readXmlDocument(String document) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(document));
        Document doc = db.parse(is);
        return doc;
    }

    private static void generateXmlFile(String filename, int total)
    throws Exception {
        File f = new File(filename);

        PrintWriter pw = new PrintWriter(f);
        pw.write("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
        pw.write("<main_tag>");
        for (int i = 0; i < total; i++) {
            pw.write("<test>" + String.format("%04d", i) + "</test>");
        }
        pw.write("</main_tag>");
        pw.close();
    }

    private static String generateXMLDocument(int total){
        StringBuffer sb = new StringBuffer();
        sb.append("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
        sb.append("<main_tag>");
        for (int i = 0; i < total; i++) {
            sb.append("<test>" + String.format("%04d", i) + "</test>");
        }
        sb.append("</main_tag>");
        return sb.toString();
    }
}

Comments
Additional note: this bug was also verified to show up in JDK 7u65. It is fixed in JDK 7u66.
24-04-2014

This has been verified as a duplicate. The bug shows up in JDK 7u51. It does not show up in JDK 7u66, which has the fix from JDK-8027359.
24-04-2014

Priority changed from P4 to P3. Parsing an XML 1.1 document resulted in a corrupted document.
18-04-2013