JDK-6536111 : SAX parser throws OutOfMemoryError
  • Type: Bug
  • Component: xml
  • Sub-Component: org.xml.sax
  • Affected Version: 6
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2007-03-19
  • Updated: 2012-04-25
  • Resolved: 2009-02-26
Fixed versions:
  • Other: 1.4.0 1.4 (Fixed)
  • JDK 6: 6u14 (Fixed)
  • JDK 7: 7 (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)


A DESCRIPTION OF THE PROBLEM :
When parsing huge XML files (> 200 MB) with SAX, Java 6 runs out of memory because the whole input file is held in memory. Java 1.5 and the current Xerces version, 2.9.0, work fine.
I assume that there is a bug in XMLDocumentScannerImpl. It has a flag, fReadingDTD, indicating that the DTD is currently being read. While this flag is true, refresh(int) appends characters to a buffer. It seems the end of the DTD is not recognized, so the whole XML file is appended to the buffer.
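
The suspected failure mode can be illustrated with a toy model. This is not the Xerces source: ToyScanner, startDtd, endDtd and simulate are hypothetical names invented for the illustration, and only the flag-guarded append in refresh mirrors what the report describes. It is a sketch under the assumption that the flag is never cleared once the DOCTYPE has been seen.

```java
// Toy model only: a hypothetical stand-in, not the real XMLDocumentScannerImpl.
class ToyScanner {
    // Stands in for the fReadingDTD flag named in the report.
    private boolean readingDtd;
    // Stands in for the buffer that refresh(int) appends to.
    private final StringBuilder dtdBuffer = new StringBuilder();

    void startDtd() { readingDtd = true; }
    void endDtd()   { readingDtd = false; }

    // Called each time the input buffer is about to be refilled.
    void refresh(char[] chunk, int length) {
        if (readingDtd) {
            dtdBuffer.append(chunk, 0, length);
        }
    }

    int buffered() { return dtdBuffer.length(); }

    // Feeds one chunk for the DOCTYPE, then 'chunks' more for the document body,
    // and returns how many characters ended up retained in the buffer.
    static int simulate(boolean dtdEndRecognized, int chunks, int chunkSize) {
        ToyScanner scanner = new ToyScanner();
        char[] chunk = new char[chunkSize];
        scanner.startDtd();
        scanner.refresh(chunk, chunkSize);
        if (dtdEndRecognized) {
            scanner.endDtd();
        }
        for (int i = 0; i < chunks; i++) {
            scanner.refresh(chunk, chunkSize);
        }
        return scanner.buffered();
    }

    public static void main(String[] args) {
        System.out.println("flag cleared:  " + simulate(true, 1000, 8192) + " chars retained");
        System.out.println("flag stuck on: " + simulate(false, 1000, 8192) + " chars retained");
    }
}
```

When the flag is cleared, the retained size stays at one chunk. When it is stuck, the retained size grows with the input, which would match an OutOfMemoryError in XMLStringBuffer.append on a 200 MB file.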

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the code below. It creates a large XML file in the temp directory (e.g. /var/tmp) and parses it with the standard SAXParser, using at least an EntityResolver that resolves the system ID. The OutOfMemoryError then occurs.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Parsing should complete without an OutOfMemoryError.
ACTUAL -
An OutOfMemoryError is thrown.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1493)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1063)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1537)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1314)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at webbugstestcases.jaxp.sax.inc920008.SAXParserTest.main(SAXParserTest.java:71)
Java Result: 1


---------- BEGIN SOURCE ----------
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SAXParserTest {
    private static final String DTD =
            "<!ELEMENT config  (config*,entry*)*>\n"
                    + "<!ATTLIST config key CDATA #REQUIRED>\n"
                    + "<!ELEMENT entry (#PCDATA)>\n"
                    + "<!ATTLIST entry key CDATA #REQUIRED type CDATA "
                    + "#REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    private static final EntityResolver RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            InputSource is = new InputSource(new StringReader(DTD));
            return is;
        }
    };

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, FileNotFoundException, IOException {
        // create a huge XML file
        File test = File.createTempFile("test", ".xml");
        test.deleteOnExit();
        BufferedWriter out = new BufferedWriter(new FileWriter(test));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<!DOCTYPE config SYSTEM "
                + "\"org/knime/core/node/config/XMLConfig.dtd\">\n");
        out.write("<config key=\"root\">\n");
        for (int i = 0; i < 1000000; i++) {
            out.write("<config key=\"" + i + "\">");
            out.write("<entry key=\"datacell\" type=\"xstring\" "
                    + "value=\"org.knime.core.data.def.IntCell\"/>\n");
            out.write("</config>\n");
        }
        out.write("</config>");
        out.close();
       
        // try to parse it
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(RESOLVER);

        // java.lang.OutOfMemoryError: Java heap space, even with 256MB heap
        reader.parse(new InputSource(new FileInputStream(test)));
    }
}
---------- END SOURCE ----------


REPRODUCIBILITY :
This bug can be reproduced always.

Comments
EVALUATION I appreciate all the concerns and votes for this issue. The fix is now integrated into the workspace for JDK 6 update 14, which is scheduled for release in the mid-May timeframe. Meanwhile, you may use the endorsed standards override mechanism to replace the bundled JAXP implementation with the JAXP jars downloadable from java.net.
26-02-2009
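
When trying the endorsed override mentioned above (on JDK 6, jars placed in the directory named by the java.endorsed.dirs system property, by default lib/endorsed under the JRE home), it can help to confirm which parser implementation is actually loaded. The following is a minimal sketch of such a check. JaxpImplCheck and describe are hypothetical names, and the exact class names and locations printed depend on the installation.

```java
import java.security.CodeSource;

import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which JAXP SAX implementation the JVM picked up.
class JaxpImplCheck {
    // Returns the class name plus the jar or directory it was loaded from, if known.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "the JDK runtime itself"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + location;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(describe(factory.getClass()));
        System.out.println(describe(factory.newSAXParser().getXMLReader().getClass()));
    }
}
```

If the override took effect, the reported location should point at the endorsed jar rather than at the runtime. Note that the endorsed mechanism applies to JDK 6 era releases and is not available in current JDKs.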

EVALUATION Fix is verified in JAXP 1.4 on java.net. Will request integration into a JDK 6 update release.
15-07-2008

EVALUATION Fix is ready. It needs a review and a regression test. We will then request integration into a JDK 6 update release as soon as possible.
02-07-2008

EVALUATION To answer all the requests to fix this issue, I'm raising the priority to 2. We should investigate it as soon as possible.
28-05-2008