JDK-7156085 : ArrayIndexOutOfBoundsException thrown in UTF8Reader of SAXParser
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.parsers
  • Affected Version: 7u3,9
  • Priority: P5
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_7
  • CPU: x86
  • Submitted: 2012-03-22
  • Updated: 2016-08-18
  • Resolved: 2014-10-30
Fix versions:
  • JDK 7: 7u91 (Fixed)
  • JDK 8: 8u60 (Fixed)
  • JDK 9: 9 b38 (Fixed)
Related Reports
Duplicate :  
Description
The following exception is thrown by the SAXParser when parsing the XML file at
http://download.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2

 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8192
                at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:546)
                at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1750)
                at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1626)
                at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1664)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1707)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2898)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
                at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:488)
                at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
                at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
                at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
                at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
                at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
                at org.xml.sax.helpers.ParserAdapter.parse(ParserAdapter.java:429)
                at com.inet.jorthodictionaries.Parser.<init>(Parser.java:63)
                at com.inet.jorthodictionaries.BookGenerator.start(BookGenerator.java:94)
                at com.inet.jorthodictionaries.BookGenerator.main(BookGenerator.java:72)


This problem occurs with both Java 6 and Java 7.


The code looks like this:


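import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.ParserAdapter;

// Raise the parser's entity expansion limit
System.setProperty("entityExpansionLimit", "100000000");
InputSource input = new InputSource(stream); // stream: the XML input, defined elsewhere in Parser.java
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
// Wrap the SAX1 Parser returned by the deprecated getParser() as a SAX2 XMLReader
ParserAdapter pa = new ParserAdapter(sp.getParser());
pa.setContentHandler(this); // the enclosing class implements ContentHandler
pa.parse(input);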


The complete code can be found in the public repository at:

http://jortho.svn.sourceforge.net/viewvc/jortho/trunk/JOrtho/src/com/inet/jorthodictionaries/Parser.java?revision=241&view=markup
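For a self-contained starting point, the sketch below parses an in-memory document dense with supplementary characters (U+1F600, which takes four bytes in UTF-8 and two UTF-16 chars), using `SAXParser.parse` with a `DefaultHandler` instead of the deprecated `getParser()`/`ParserAdapter` route. The class name `SupplementaryParse`, the document shape and the element counts are illustrative assumptions, not the original reproducer. On a correct parser it should report exactly two chars per emoji; whether it triggers the exception on an affected build depends on how the bytes happen to align with the reader's internal buffer, which this sketch does not control.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class SupplementaryParse {
    /**
     * Hypothetical helper: builds <root><t>...</t>...</root> where each <t> holds
     * perElement copies of U+1F600, parses it with the JDK's default SAX parser,
     * and returns the total number of UTF-16 chars reported via characters().
     */
    static long countChars(int elements, int perElement) throws Exception {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>");
        for (int e = 0; e < elements; e++) {
            xml.append("<t>");
            for (int i = 0; i < perElement; i++) {
                xml.append("\uD83D\uDE00");  // one code point, two chars, four UTF-8 bytes
            }
            xml.append("</t>");              // the reported trace fails while scanning an end tag
        }
        xml.append("</root>");
        byte[] bytes = xml.toString().getBytes(StandardCharsets.UTF_8);

        final long[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                total[0] += length;          // may be called several times per element
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(bytes), handler);
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countChars(5000, 3)); // 5000 * 3 * 2 = 30000 on a correct parser
    }
}
```

Using `SAXParser.parse` directly avoids the SAX1 adapter layer; the decoding path through the JDK's internal Xerces reader is the same either way.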

Comments
Proposed fix: http://cr.openjdk.java.net/~martin/webrevs/openjdk9/xerces-UTF8Reader-supplementary-characters/
28-10-2014

This appears to be a Xerces bug, a duplicate of the unresolved upstream bug XERCESJ-1257, "buffer overflow in UTF8Reader for characters out of BMP" (https://issues.apache.org/jira/browse/XERCESJ-1257). See my comment on that bug.
28-10-2014
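The upstream title names the class of bug: a character outside the BMP is four bytes in UTF-8 but decodes to a surrogate pair, i.e. two `char`s, so a decoder that only checks for one free slot can write the low surrogate one past the end of its `char[]`. The reported index, 8192, is consistent with such a write past the end of an 8192-char buffer. The minimal sketch below illustrates the failure mode and one way to guard against it; the class and method names (`Utf8DecodeSketch`, `decode`, `decodeAll`) are hypothetical, and this is neither the Xerces `UTF8Reader` code nor the proposed fix. It assumes well-formed UTF-8 whose sequences are complete within the input array.

```java
import java.nio.charset.StandardCharsets;

class Utf8DecodeSketch {
    /**
     * Hypothetical, simplified UTF-8 decoder (NOT the Xerces UTF8Reader).
     * Decodes bytes in[inPos..inLen) into out[outPos..out.length), stores the
     * number of bytes used in consumed[0], and returns the number of chars written.
     * Assumes well-formed UTF-8 whose sequences are complete within in[].
     */
    static int decode(byte[] in, int inPos, int inLen, char[] out, int outPos, int[] consumed) {
        int start = outPos;
        int i = inPos;
        while (i < inLen && outPos < out.length) {
            int b0 = in[i] & 0xFF;
            if (b0 < 0x80) {                       // 1 byte: U+0000..U+007F
                out[outPos++] = (char) b0;
                i += 1;
            } else if (b0 < 0xE0) {                // 2 bytes: U+0080..U+07FF
                out[outPos++] = (char) (((b0 & 0x1F) << 6) | (in[i + 1] & 0x3F));
                i += 2;
            } else if (b0 < 0xF0) {                // 3 bytes: rest of the BMP
                out[outPos++] = (char) (((b0 & 0x0F) << 12)
                        | ((in[i + 1] & 0x3F) << 6) | (in[i + 2] & 0x3F));
                i += 3;
            } else {                               // 4 bytes: outside the BMP
                // A supplementary character needs TWO chars (a surrogate pair).
                // The loop condition only guarantees one free slot, so without this
                // check the low surrogate could be written at out[out.length].
                if (out.length - outPos < 2) {
                    break;                         // leave the sequence for the next call
                }
                int cp = ((b0 & 0x07) << 18) | ((in[i + 1] & 0x3F) << 12)
                        | ((in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3F);
                out[outPos++] = Character.highSurrogate(cp);
                out[outPos++] = Character.lowSurrogate(cp);
                i += 4;
            }
        }
        consumed[0] = i - inPos;
        return outPos - start;
    }

    /** Decodes everything through a reusable buffer of bufSize chars (must be >= 2). */
    static String decodeAll(byte[] in, int bufSize) {
        if (bufSize < 2) {
            throw new IllegalArgumentException("buffer must hold a surrogate pair");
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[bufSize];
        int[] consumed = new int[1];
        int pos = 0;
        while (pos < in.length) {
            int n = decode(in, pos, in.length, buf, 0, consumed);
            sb.append(buf, 0, n);
            pos += consumed[0];
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600: 5 UTF-8 bytes, 3 UTF-16 chars
        byte[] bytes = "a\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        int[] consumed = new int[1];
        int n = decode(bytes, 0, bytes.length, new char[2], 0, consumed);
        System.out.printf("written=%d consumed=%d%n", n, consumed[0]); // written=1 consumed=1
        System.out.println(decodeAll(bytes, 2).equals("a\uD83D\uDE00"));  // true
    }
}
```

In the demo, the two-char buffer has room for `a` but only one slot left for the emoji, so `decode` stops instead of overrunning and `decodeAll` picks the sequence up on the next call. Requiring `bufSize >= 2` guarantees that every call makes progress.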

EVALUATION Working on a fix for 7u6.
15-06-2012

EVALUATION The XML file is over 2 GB. Asking for more information regarding the XML log file.
12-06-2012