Name: gm110360
Date: 04/07/2003

FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]
(Note: the error also occurs on Windows 98 Second Edition)

A DESCRIPTION OF THE PROBLEM :
When parsing a large file with many entities using SAX or DOM, an exception is thrown:

org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the source. Please email me for an example test file (testfile.xml).

In case you don't want to email me for the file, here is how to create one:

1) Create a testfile.xml in the same directory where you run the code.

2) Paste the following:

<?xml version='1.0' encoding='utf-8'?>
<!--DTD for vocab -->
<!DOCTYPE FirstNode [
<!ELEMENT FirstNode (ChildNode)*>
<!ELEMENT ChildNode (#PCDATA)>
]>
<FirstNode>
<ChildNode>
<html><body><a name="1"></a> <p><b>concinnity</b></p> <blockquote>concinnity was Word of the Day on <a href="http://www.dictionary.com/wordoftheday/archive/2001/08/18.html">August 18, 2001</a>.</blockquote><br> <table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&db=wotd" title="Click for more information about this dictionary">Source</a>: <cite>Dictionary.com Word of the Day</cite></td></tr></table> <a name="2"></a> <TABLE><TR><TD><A NAME="C0548200"><B>con·cin·ni·ty</B></A> <A TITLE="Click for guide to symbols." 
onClick="ahdpop();return false;" HREF="/help/ahd4/pronkey.html" CLASS="linksrc"><b>Pronunciation Key</b></A> (k<IMG ALT="" SRC="pronkey_files/schwa.gif" height="15" width="6" ALIGN="ABSBOTTOM">n-s<IMG ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN="ABSBOTTOM">n<IMG ALT="" SRC="pronkey_files/prime.gif" height="22" width="4" ALIGN="ABSBOTTOM"><IMG ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN="ABSBOTTOM">t<IMG ALT="" SRC="pronkey_files/emacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">)<BR> <I>n.</I> <I>pl.</I> <B>con·cin·ni·ties </B><OL><LI> Harmony in the arrangement or interarrangement of parts with respect to a whole.</LI> <LI> Studied elegance and facility in style of expression: “He has what one character calls ‘the gifts of concinnity and concision,’ that deft swipe with a phrase that can be so devastating in children” (Elizabeth Ward). </LI> <LI>An instance of harmonious arrangement or studied elegance and facility.</LI> </OL><BR> <HR ALIGN="left" WIDTH="25%">[From Latin<TT> concinnit<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">s</TT>, from<TT> concinn<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">re</TT>, <I>to put in order</I>, from<TT> concinnus</TT>, <I>deftly joined</I>.]</TD> </TR></TABLE> <a name="3"></a> <b>concinnity</b><br><br> \Con*cin"ni*ty\, n. [L. concinnitas, fr. concinnus skillfully put together, beautiful. Of uncertain origin.] Internal harmony or fitness; mutual adaptation of parts; elegance; -- used chiefly of style of discourse. [R.] 
<br><br> An exact concinnit ;<table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&db=web1913" title="Click for more information about this dictionary">Source</a>: <cite>Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.</cite></td></tr></table> </body></html>
</ChildNode>
</FirstNode>

3) Copy and paste the <ChildNode>...</ChildNode> content repeatedly, about 196 times, inside <FirstNode>...</FirstNode>.

When you run the code, the error occurs after about 195 ChildNode elements have been read.

To test the SAX error, change lines 30 and 31 of the source from:

test.DOMRead();
//test.SAXRead();

to:

//test.DOMRead();
test.SAXRead();

In both cases, an exception is generated.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: no error.
Actual: an exception is thrown when the code runs.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.
        at TErrorHandler.fatalError(XMLError.java:198)
        at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3342)
        at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3333)
        at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2667)
        at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2569)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1980)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:634)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:333)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:76)
        at XMLError.DOMRead(XMLError.java:101)
        at XMLError.main(XMLError.java:30)
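
The copy-and-paste in step 3 of the reproduction can also be scripted. The following is a minimal sketch and not part of the original report: the class name `TestFileGenerator`, the `build` method, and the shortened `PAYLOAD` are hypothetical stand-ins. It assumes the ChildNode text is entity-escaped markup (`&lt;`, `&gt;`, `&quot;`), which the report does not state but which would be consistent with the parser counting so many entity expansions. It also assumes the DTD declarations are written as `<!ELEMENT ...>`. Because the stand-in payload is much shorter than the dictionary entry, more than 196 copies may be needed to reach the limit; pass a larger count as the first argument. The sketch targets a current JDK rather than 1.4.2.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original report.
class TestFileGenerator {

    // Shortened stand-in for the dictionary entry. The markup is
    // entity-escaped, which is an assumption about the original file.
    static final String PAYLOAD =
        "&lt;html&gt;&lt;body&gt;&lt;p&gt;&lt;b&gt;concinnity&lt;/b&gt;&lt;/p&gt;"
        + "&lt;a href=&quot;/search?q=concinnity&quot;&gt;Source&lt;/a&gt;"
        + "&lt;/body&gt;&lt;/html&gt;";

    // Returns an XML document with the given number of ChildNode elements.
    static String build(int copies) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='utf-8'?>\n");
        sb.append("<!DOCTYPE FirstNode [\n");
        sb.append("<!ELEMENT FirstNode (ChildNode)*>\n");
        sb.append("<!ELEMENT ChildNode (#PCDATA)>\n");
        sb.append("]>\n");
        sb.append("<FirstNode>\n");
        for (int i = 0; i < copies; i++) {
            sb.append("<ChildNode>").append(PAYLOAD).append("</ChildNode>\n");
        }
        sb.append("</FirstNode>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Default to the 196 copies mentioned in step 3.
        int copies = args.length > 0 ? Integer.parseInt(args[0]) : 196;
        Files.write(Paths.get("testfile.xml"),
                    build(copies).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote testfile.xml with " + copies + " ChildNode elements");
    }
}
```

Running it in the directory that holds the test program writes testfile.xml there, which is where the reproduction expects to find it.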
REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.*;
import org.w3c.dom.*;
import java.io.*;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.*;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

import org.w3c.dom.*;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

public class XMLError {

    private String fname = null;

    public XMLError(String fname) {
        this.fname = fname;
    }

    public static void main(String [] argv){
        XMLError test = new XMLError("testfile.xml");
        test.DOMRead();
        //test.SAXRead();
    }

    public void SAXRead(){
        System.out.println("Reading " + fname + "...");
        String data = readFile(fname);
        if(data == null){
            System.out.println("There is no such file as " + fname);
            return;
        }
        try{
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            //org.xml.sax.helpers.DefaultHandler
            parser.parse(new ByteArrayInputStream(data.getBytes()), new DefaultHandler(){
                private CharArrayWriter contents = new CharArrayWriter();
                private int count;

                public void characters(char[] ch, int start, int length){
                    contents.write( ch, start, length );
                }

                public void endDocument(){
                    System.out.println("Finish: " + count);
                }

                public void endElement(String uri, String localName, String qName) {
                    if ( qName.equals( "ChildNode" ) ) {
                        count++;
                        String str = contents.toString();
                        System.out.println("Importing... " + count + " : " + str);
                    }
                }

                public void startDocument(){
                    //contents.reset();
                    count = 0;
                }

                public void startElement(String uri, String localName, String qName, Attributes attributes){
                    contents.reset();
                    //System.out.println("The name: " + localName + ", qName: " + qName);
                }
            });
        }catch(Exception ee){
            ee.printStackTrace();
        }
    }

    public void DOMRead(){
        System.out.println("Reading " + fname + "...");
        String data = readFile(fname);
        if(data == null){
            System.out.println("There is no such file as " + fname);
            return;
        }
        int count = 0;
        try {
            TErrorHandler error = new TErrorHandler();
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            //factory.setNamespaceAware(true);
            //factory.setExpandEntityReferences(false);
            System.out.println("Parsing xml data...");
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(error);
            Document document = builder.parse(new ByteArrayInputStream(data.getBytes()));
            Node node;
            node = document.getFirstChild();
            if(node == null){
                return;
            }
            System.out.println("Start importing data: ");
            while(node != null){
                if(node.getNodeType() == Node.ELEMENT_NODE){
                    if("FirstNode".equalsIgnoreCase(node.getNodeName()))
                        break;
                }
                node = node.getNextSibling();
            }
            node = node.getFirstChild();
            String str = null;
            boolean done = false;
            while((node != null) && (!done)){
                str = getValue(node);
                if(str == null)
                    break;
                node = node.getNextSibling();
                count++;
                if((count % 10) == 0){
                    System.out.print(".");
                }
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        System.out.println("\n\nDone: " + count);
    }

    static public String getValue(Node node){
        if(node == null)
            return null;
        Node node2 = node.getFirstChild();
        if(node2 == null){
            return "";
        }
        if(node2.getNodeType() != Node.TEXT_NODE)
            return null;
        return node2.getNodeValue();
    }

    public static String readFile(String fname){
        if((fname == null) || (fname.trim().length() <= 0)){
            return null;
        }
        BufferedReader in = null;
        String str;
        StringBuffer buf = new StringBuffer();
        try{
            in = new BufferedReader(new FileReader(fname));
            while(in.ready()){
                str = in.readLine();
                if(str == null)
                    break;
                buf.append(str + "\n");
            }
            in.close();
        }catch(IOException e){
            //e.printStackTrace();
            return null;
        }
        return buf.toString();
    }
}

class TErrorHandler implements ErrorHandler {
    int errNo = 0;
    String errMessage = "";

    public void resetError(){
        errNo = 0;
        errMessage = "";
    }

    public void setError(String mesg){
        errNo = 1;
        if(mesg == null)
            return;
        errMessage = errMessage + "\n" + mesg;
    }

    TErrorHandler() {
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        String systemId = spe.getSystemId();
        if (systemId == null) {
            systemId = "null";
        }
        String info = "URI=" + systemId + " Line=" + spe.getLineNumber() + ": " + spe.getMessage();
        return info;
    }

    public void warning(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        setError("Warning: " + getParseExceptionInfo(sAXParseException));
    }

    public void error(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        String message = "Error: " + getParseExceptionInfo(sAXParseException);
        throw new SAXException(message);
    }

    public void fatalError(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
        String message = "Fatal Error: " + getParseExceptionInfo(sAXParseException);
        throw new SAXException(message);
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None
(Review ID: 183616)
======================================================================

###@###.### 2004-07-13
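
The report lists no workaround. As a hedged note for readers who hit the same message: the J2SE 1.4.2 documentation describes a system property named `entityExpansionLimit` (for example `-DentityExpansionLimit=1000000` on the command line), and later JDKs use `jdk.xml.entityExpansionLimit`. Whether either property is honored by the 1.4.2-beta-b19 build in this report is an assumption that the report does not confirm. The sketch below sets both properties before creating the parser and counts the ChildNode elements in a small document; the class `EntityLimitSketch`, its methods, and the stand-in payload are hypothetical, and the code targets a current JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of the original report.
class EntityLimitSketch {

    // Builds a small document shaped like testfile.xml.
    // The escaped payload is a stand-in, not the original test data.
    static String sampleXml(int copies) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='utf-8'?>\n");
        sb.append("<!DOCTYPE FirstNode [\n");
        sb.append("<!ELEMENT FirstNode (ChildNode)*>\n");
        sb.append("<!ELEMENT ChildNode (#PCDATA)>\n");
        sb.append("]>\n<FirstNode>\n");
        for (int i = 0; i < copies; i++) {
            sb.append("<ChildNode>&lt;b&gt;concinnity&lt;/b&gt;</ChildNode>\n");
        }
        sb.append("</FirstNode>\n");
        return sb.toString();
    }

    // Sets the limit properties, parses the document with SAX, and
    // returns the number of ChildNode elements seen.
    static int countChildNodes(String xml, String limit) {
        // Property name documented for J2SE 1.4.2 (assumed to apply here).
        System.setProperty("entityExpansionLimit", limit);
        // Property name used by later JDKs.
        System.setProperty("jdk.xml.entityExpansionLimit", limit);

        final int[] count = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if ("ChildNode".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
        } catch (Exception e) {
            // Wrapped so callers do not need to handle checked exceptions.
            throw new IllegalStateException("Parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        int parsed = countChildNodes(sampleXml(200), "1000000");
        System.out.println("ChildNode elements parsed: " + parsed);
    }
}
```

Raising the limit also weakens the protection it provides against documents built to exhaust memory through entity expansion, so a larger value is best reserved for input from a trusted source.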