Bug ID: JDK-6536111 SAX parser throws OutOfMemoryError

Details
Type: Bug
Submit Date: 2007-03-19
Status: Closed
Updated Date: 2012-04-25
Project Name: JDK
Resolved Date: 2009-02-26
Component: xml
OS: linux
Sub-Component: org.xml.sax
CPU: x86
Priority: P2
Resolution: Fixed
Affected Versions: 6
Fixed Versions: 1.4.0 (1.4)

Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)


A DESCRIPTION OF THE PROBLEM :
When parsing huge XML files (> 200MB) with SAX, Java 6 runs out of memory because the whole input file is held in memory. Java 1.5 and the current Xerces release (2.9.0) parse the same files without problems.
I assume there is a bug in XMLDocumentScannerImpl. It has a flag, fReadingDTD, that indicates the DTD is currently being read. While this flag is true, refresh(int) appends characters to a buffer. It seems the end of the DTD is not recognized, so the whole XML file is appended to the buffer.
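
The suspected failure mode can be illustrated with a toy model. This is only a sketch of the hypothesis above, not the Xerces code: the class DtdBufferLeakSketch, the method retainedChars, and the clearFlagAfterDtd switch are hypothetical names, and detecting the end of the DOCTYPE by its first '>' is a deliberate simplification.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical model of a scanner that copies input into a buffer while a "reading DTD" flag is set. */
public class DtdBufferLeakSketch {

    /**
     * Reads the input chunk by chunk and returns how many characters were
     * retained in the buffer once the whole input has been consumed.
     */
    public static int retainedChars(Reader in, int chunkSize, boolean clearFlagAfterDtd)
            throws IOException {
        StringBuilder dtdBuffer = new StringBuilder(); // stands in for the scanner's buffer
        boolean readingDtd = true;                     // stands in for fReadingDTD
        char[] chunk = new char[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (!readingDtd) {
                continue; // normal streaming: the chunk is processed and dropped
            }
            dtdBuffer.append(chunk, 0, n);
            // Simplified end-of-DOCTYPE check: first '>' after "<!DOCTYPE".
            int start = dtdBuffer.indexOf("<!DOCTYPE");
            if (clearFlagAfterDtd && start >= 0 && dtdBuffer.indexOf(">", start) >= 0) {
                readingDtd = false;
            }
        }
        return dtdBuffer.length();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder doc = new StringBuilder("<!DOCTYPE config SYSTEM \"x.dtd\">");
        for (int i = 0; i < 1000; i++) {
            doc.append("<entry/>");
        }
        String xml = doc.toString();
        // Flag never cleared: the buffer grows with the size of the document.
        System.out.println("flag never cleared: "
                + retainedChars(new StringReader(xml), 64, false) + " chars retained");
        // Flag cleared after the DOCTYPE: the buffer stays bounded.
        System.out.println("flag cleared:       "
                + retainedChars(new StringReader(xml), 64, true) + " chars retained");
    }
}
```

If the flag is never cleared, the retained size equals the document size, which would match the heap exhaustion seen in XMLStringBuffer.append in the stack trace below.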

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the code below. It creates a large XML file in the temp directory (e.g. /var/tmp) and then parses it with the standard SAXParser, using at least an EntityResolver that resolves the SystemId. The parse fails with an OutOfMemoryError.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Should work without any OutOfMemory errors
ACTUAL -
OutOfMemory error

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1493)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1063)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1537)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1314)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at webbugstestcases.jaxp.sax.inc920008.SAXParserTest.main(SAXParserTest.java:71)
Java Result: 1


---------- BEGIN SOURCE ----------
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SAXParserTest {
    private static final String DTD =
            "<!ELEMENT config  (config*,entry*)*>\n"
                    + "<!ATTLIST config key CDATA #REQUIRED>\n"
                    + "<!ELEMENT entry (#PCDATA)>\n"
                    + "<!ATTLIST entry key CDATA #REQUIRED type CDATA "
                    + "#REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    private static final EntityResolver RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            InputSource is = new InputSource(new StringReader(DTD));
            return is;
        }
    };

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, FileNotFoundException, IOException {
        // create a huge XML file
        File test = File.createTempFile("test", "xml");
        test.deleteOnExit();
        BufferedWriter out = new BufferedWriter(new FileWriter(test));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<!DOCTYPE config SYSTEM "
                + "\"org/knime/core/node/config/XMLConfig.dtd\">\n");
        out.write("<config key=\"root\">\n");
        for (int i = 0; i < 1000000; i++) {
            out.write("<config key=\"" + i + "\">");
            out.write("<entry key=\"datacell\" type=\"xstring\" "
                    + "value=\"org.knime.core.data.def.IntCell\"/>\n");
            out.write("</config>\n");
        }
        out.write("</config>");
        out.close();
       
        // try to parse it
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(RESOLVER);

        // java.lang.OutOfMemoryError: Java heap space, even with 256MB heap
        reader.parse(new InputSource(new FileInputStream(test)));
    }
}
---------- END SOURCE ----------


REPRODUCIBILITY :
This bug can be reproduced always.

                                    

Comments
EVALUATION

To answer all the requests to fix this issue, I'm raising the priority to 2. We should investigate it as soon as possible.
                                     
2008-05-28
EVALUATION

Fix is ready. Needs to get a review and regression test. We will then request an integration into a JDK6 update release as soon as possible.
                                     
2008-07-02
EVALUATION

Fix is verified in JAXP 1.4 on java.net. Will request integration into a JDK 6 update release.
                                     
2008-07-15
EVALUATION

I appreciate all the concerns and votes for this issue. The fix is now integrated into the workspace for JDK 6 update 14, which is scheduled for release in the mid-May timeframe.

Meanwhile, you may use the endorsed mechanism to override the bundled JAXP functionality with the JAXP jars downloadable from java.net.
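
When applying such an override, it can help to confirm which parser implementation is actually picked up at runtime. The following is a minimal sketch under stated assumptions: the class name JaxpImplCheck and its describe helper are hypothetical, and it relies on a null code source meaning the class came from the JDK's own runtime rather than from an overriding jar.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical helper: reports which SAXParserFactory implementation is in use and where it was loaded from. */
public class JaxpImplCheck {

    /** Returns the class name plus the location it was loaded from, if one is known. */
    public static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes from the JDK's own runtime typically report no code source.
        String where = (src == null || src.getLocation() == null)
                ? "JDK runtime (no code source)"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + where;
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(describe(factory.getClass()));
    }
}
```

If the override is in effect, the reported location should point at the downloaded JAXP jar rather than at the JDK runtime.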
                                     
2009-02-26


