JDK-4843787 : org.xml.sax.SAXException was thrown when parsing large file
  • Type: Bug
  • Component: xml
  • Sub-Component: org.xml.sax
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2003-04-07
  • Updated: 2012-04-25
  • Resolved: 2003-04-16
Description
Name: gm110360			Date: 04/07/2003


FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]

(Note: the error also occurs on Windows 98 Second Edition)

A DESCRIPTION OF THE PROBLEM :

When a large file with many entities is parsed using SAX or DOM, an exception is thrown: org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.
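
To get a rough idea of how many entity references a file contains, something like the following sketch could be used. It only counts named references of the form &name; in the raw text. The class and method names are hypothetical, and how the parser itself counts expansions (for example, whether predefined entities such as &amp; count toward the limit) is not stated in this report, so the number is an estimate at best.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityRefCounter {

    // Matches named references such as &amp; or &nbsp;.
    // Numeric character references such as &#183; are deliberately not matched.
    private static final Pattern NAMED_REF =
            Pattern.compile("&[A-Za-z_:][A-Za-z0-9_.:-]*;");

    // Returns the number of named entity references found in the given text.
    public static int count(String text) {
        Matcher m = NAMED_REF.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sample = "<b>con&#183;cin&#183;ni&#183;ty</b> &nbsp;&nbsp; q=info&amp;db=wotd";
        System.out.println("Named entity references: " + count(sample));
    }
}
```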


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

Run the source. Please email me for an example test file (testfile.xml).

In case you don't want to email me for the file, here is how to create one:

1) Create a file named testfile.xml in the same directory where you run the code
2) Paste the following into it:

<?xml version='1.0' encoding='utf-8'?>
<!--DTD for vocab -->
<!DOCTYPE FirstNode [
<!ELEMENT FirstNode (ChildNode)*>
<!ELEMENT ChildNode (#PCDATA)>
]>

<FirstNode>
<ChildNode>
<html><body><a name="1"></a>
<p><b>concinnity</b></p>
<blockquote>concinnity was Word of the Day on <a href="http://www.dictionary.com/wordoftheday/archive/2001/08/18.html">August 18, 2001</a>.</blockquote><br>
<table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&amp;db=wotd" title="Click for more information about this dictionary">Source</a>: <cite>Dictionary.com Word of the Day</cite></td></tr></table>
<a name="2"></a>

<TABLE><TR><TD><A NAME="C0548200"><B>con&#183;cin&#183;ni&#183;ty</B></A> &nbsp;&nbsp;<A TITLE="Click for guide to symbols." onClick="ahdpop();return false;" HREF="/help/ahd4/pronkey.html" CLASS="linksrc"><b>Pronunciation Key</b></A>&nbsp;&nbsp;(k<IMG ALT="" SRC="pronkey_files/schwa.gif" height="15" width="6" ALIGN="ABSBOTTOM">n-s<IMG
ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN="ABSBOTTOM">n<IMG ALT="" SRC="pronkey_files/prime.gif" height="22" width="4" ALIGN="ABSBOTTOM"><IMG ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN^F
quot; SRC="pronkey_files/emacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">)<BR>
 <I>n.</I> <I>pl.</I> <B>con&#183;cin&#183;ni&#183;ties </B><OL><LI> Harmony in the arrangement or interarrangement of parts with respect to a whole.</LI>
<LI> Studied elegance and facility in style of expression: &#147;He has what one character calls &#145;the gifts of concinnity and concision,&#146; that deft swipe with a phrase that can be so
devastating in children&#148; (Elizabeth Ward).
</LI>
<LI>An instance of harmonious arrangement or studied elegance and facility.</LI>
</OL><BR>
<HR ALIGN="left" WIDTH="25%">[From Latin<TT> concinnit<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">s</TT>, from<TT> concinn<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">re</TT>, <I>to put in order</I>,
from<TT> concinnus</TT>, <I>deftly joined</I>.]</TD>
</TR></TABLE>
<a name="3"></a>
<b>concinnity</b><br><br>
 \Con*cin"ni*ty\, n. [L. concinnitas, fr. concinnus
   skillfully put together, beautiful. Of uncertain origin.]
   Internal harmony or fitness; mutual adaptation of parts;
   elegance; -- used chiefly of style of discourse. [R.]
<br><br>
         An exact concinnit
;<table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&amp;db=web1913" title="Click for more information about this dictionary">Source</a>: <cite>Webster's Revised Unabridged Dictionary, &copy; 1996, 1998 MICRA, Inc.</cite></td></tr></table>
</body></html>
</ChildNode>

</FirstNode>

3) Copy and paste the <ChildNode>...</ChildNode> content about 196 times inside <FirstNode>...</FirstNode>

When you run the code, the error occurs after about 195 ChildNode elements have been read.
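
Step 3 can also be automated. The sketch below is one possible way to do it: it reads a single <ChildNode>...</ChildNode> block from a file and writes testfile.xml with 196 copies. The class name MakeTestFile and the input file name childnode.txt are hypothetical, and the <!ELEMENT ...> declarations are written in the form the DTD in step 2 is assumed to have intended.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MakeTestFile {

    // Wraps `copies` repetitions of one <ChildNode>...</ChildNode> block in the
    // prolog, internal DTD and <FirstNode> root element described in step 2.
    public static String build(String childBlock, int copies) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='utf-8'?>\n");
        sb.append("<!--DTD for vocab -->\n");
        sb.append("<!DOCTYPE FirstNode [\n");
        sb.append("<!ELEMENT FirstNode (ChildNode)*>\n");
        sb.append("<!ELEMENT ChildNode (#PCDATA)>\n");
        sb.append("]>\n\n");
        sb.append("<FirstNode>\n");
        for (int i = 0; i < copies; i++) {
            sb.append(childBlock).append('\n');
        }
        sb.append("</FirstNode>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // childnode.txt (hypothetical name) holds one <ChildNode>...</ChildNode> block.
        byte[] raw = Files.readAllBytes(Paths.get("childnode.txt"));
        String block = new String(raw, StandardCharsets.UTF_8).trim();
        String xml = build(block, 196);
        Files.write(Paths.get("testfile.xml"), xml.getBytes(StandardCharsets.UTF_8));
    }
}
```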

You can change lines 30 and 31 of the source from:
        test.DOMRead();
        //test.SAXRead();
to:
        //test.DOMRead();
        test.SAXRead();

to test the SAX case. In both cases, an exception is thrown.


EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: No error.
Actual: An exception is thrown at run time.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.
        at TErrorHandler.fatalError(XMLError.java:198)
        at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3342)
        at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3333)
        at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2667)
        at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2569)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1980)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:634)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:333)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:76)
        at XMLError.DOMRead(XMLError.java:101)
        at XMLError.main(XMLError.java:30)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.*;
import org.w3c.dom.*;
import java.io.*;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.*;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.w3c.dom.*;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;


public class XMLError {

    private String fname = null;

    public XMLError(String fname) {
        this.fname = fname;
    }
    
    public static void main(String [] argv){
        XMLError test = new XMLError("testfile.xml");
        test.DOMRead();
        //test.SAXRead();
    }

    public void SAXRead(){
        System.out.println("Reading " + fname + "...");
        String data = readFile(fname);
        if(data == null){
            System.out.println("There is no such file as " + fname);
            return;
        }
        try{
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            //org.xml.sax.helpers.DefaultHandler

            parser.parse(new ByteArrayInputStream(data.getBytes()), new DefaultHandler(){
                private CharArrayWriter contents = new CharArrayWriter();
                private int count;

                public void characters(char[] ch, int start, int length){
                    contents.write( ch, start, length );
                }
                public void endDocument(){
                    System.out.println("Finish:  " + count);
                }
                public void endElement(String uri, String localName, String qName) {
                    if ( qName.equals( "ChildNode" ) ) {
                        count++;
                        String str = contents.toString();
                        System.out.println("Importing... " + count + " : " + str);
                    }
                }
                public void startDocument(){
                    //contents.reset();
                    count = 0;

                }
                public void startElement(String uri, String localName, String qName, Attributes attributes){
                    contents.reset();
                    //System.out.println("The name: " + localName + ", qName: " + qName);
                }

            });
        }catch(Exception ee){
            ee.printStackTrace();
        }
    }
    
    public void DOMRead(){
        System.out.println("Reading " + fname + "...");
        String data = readFile(fname);
        if(data == null){
            System.out.println("There is no such file as " + fname);
            return;
        }
        int count = 0;
        try {
            TErrorHandler error = new TErrorHandler();
            DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);

            //factory.setNamespaceAware(true);
            //factory.setExpandEntityReferences(false);

            System.out.println("Parsing xml data...");
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(error);
            Document document = builder.parse(new ByteArrayInputStream(data.getBytes()));
            Node node;
            node = document.getFirstChild();
            if(node == null){
                return;
            }
            System.out.println("Start importing data: ");
            while(node != null){
                if(node.getNodeType() == Node.ELEMENT_NODE){
                    if("FirstNode".equalsIgnoreCase(node.getNodeName())) break;
                }
                node = node.getNextSibling();
            }
            node = node.getFirstChild();
            String str = null;
           
            boolean done = false;
            while((node != null) && (!done)){
                str = getValue(node);
                if(str == null) break;
                node = node.getNextSibling();
                count++;
                if((count % 10) == 0){
                    System.out.print(".");
                }
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        
        System.out.println("\n\nDone: " + count);
    }
    static public String getValue(Node node){
        if(node == null) return null;
        Node node2 = node.getFirstChild();
        if(node2 == null){
            return "";
        }
        if(node2.getNodeType() != Node.TEXT_NODE) return null;
        return node2.getNodeValue();
    }

    public static String readFile(String fname){
        if((fname == null) || (fname.trim().length() <= 0)){
            return null;
        }
        BufferedReader in = null;
        String str;
        StringBuffer buf = new StringBuffer();
        try{
            in = new BufferedReader(new FileReader(fname));
            while(in.ready()){
                str = in.readLine();
                if(str == null) break;
                buf.append(str + "\n");
            }
            in.close();
        }catch(IOException e){
            //e.printStackTrace();
            return null;
        }
        return buf.toString();
    }
}

class TErrorHandler implements ErrorHandler {
    int errNo = 0;
    String errMessage = "";
    public void resetError(){
        errNo = 0;
        errMessage = "";
    }
    public void setError(String mesg){
        errNo = 1;
        if(mesg == null) return;
        errMessage = errMessage + "\n" + mesg;
    }
    TErrorHandler() {
    }
    private String getParseExceptionInfo(SAXParseException spe) {
        String systemId = spe.getSystemId();
        if (systemId == null) {
            systemId = "null";
        }
        String info = "URI=" + systemId +  " Line=" + spe.getLineNumber() +
        ": " + spe.getMessage();
        return info;
    }
    public void warning(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        setError("Warning: " + getParseExceptionInfo(sAXParseException));
    }
    public void error(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        String message = "Error: " + getParseExceptionInfo(sAXParseException);
        throw new SAXException(message);
    }
    public void fatalError(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        String message = "Fatal Error: " + getParseExceptionInfo(sAXParseException);
        throw new SAXException(message);
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :

None
(Review ID: 183616) 
======================================================================
###@###.### 2004-07-13

Comments
WORK AROUND Set the system property "entityExpansionLimit". The "entityExpansionLimit" property can also be set in JRE_HOME/lib/jaxp.properties.
29-08-2004

PUBLIC COMMENTS This is not a bug, but a feature added to avoid Denial of Service attacks. The user can set the system property "entityExpansionLimit" to a value other than the default of 64000. The user can also add this property to <JRE_HOME>/lib/jaxp.properties (for example, /home/x/jdk-1_4_2/jre/lib/jaxp.properties). ###@###.### 2003-04-16

Hello, To add to the information Ramesh has already given: this check was added to make the Java platform more secure. 64000 is considered a large number of entity expansions for any real-life application to have in a single XML document. However, if an application does need a higher limit, it can set the system property 'entityExpansionLimit'. This system property can be used as follows:

    java -DentityExpansionLimit=100000 <command>

You can also add it to the jaxp.properties file. You can even set the limit to a number lower than 64000 if you think this limit is too large and can affect the performance of your application. It seems the Release Notes do not give the right details, which should be fixed. I will work with the documentation team to get it fixed. ###@###.### 2004-07-13
13-07-2004

EVALUATION This is not a bug, but a feature that we introduced to avoid denial of service attacks. The user can now set the "entityExpansionLimit" system property to change the default limit, which is 64000. The user can also add this property to <JRE_HOME>/lib/jaxp.properties (for example, /home/x/jdk-1_4_2/jre/lib/jaxp.properties) along with other factory information. We need to inform the user who filed this bug, and we also need to make this more visible in the documentation. ###@###.### 2003-04-16
16-04-2003
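
For applications that cannot change their command line, a possible alternative is to set the same system property from code before the parser factory is created. The following is a minimal sketch under that assumption. The comments above only confirm the -D form and the jaxp.properties file, so when a particular parser implementation actually reads the property is an assumption here. The class name, the sample document, and the value 100000 are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityLimitDemo {

    // Builds a small document that references the internal entity "e" n times.
    public static String sampleXml(int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='utf-8'?>\n");
        sb.append("<!DOCTYPE FirstNode [\n");
        sb.append("<!ELEMENT FirstNode (ChildNode)*>\n");
        sb.append("<!ELEMENT ChildNode (#PCDATA)>\n");
        sb.append("<!ENTITY e \"x\">\n");
        sb.append("]>\n");
        sb.append("<FirstNode>");
        for (int i = 0; i < n; i++) {
            sb.append("<ChildNode>&e;</ChildNode>");
        }
        sb.append("</FirstNode>");
        return sb.toString();
    }

    // Parses the document with DOM and returns the number of ChildNode elements.
    public static int countChildren(String xml) throws Exception {
        // Assumption: the parser picks this up if it is set before the factory
        // is created. Equivalent in intent to: java -DentityExpansionLimit=100000
        System.setProperty("entityExpansionLimit", "100000");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("ChildNode").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ChildNode elements parsed: " + countChildren(sampleXml(10)));
    }
}
```

For the jaxp.properties route, the entry would presumably be a single entityExpansionLimit=100000 line, assuming the file uses the usual java.util.Properties key=value syntax.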