JDK-8080085 : ArrayIndexOutOfBoundsException at ...xerces.internal.impl.io.UTF8Reader.read
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.parsers
  • Affected Version: 7u80,8u45,8u60
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_7
  • CPU: x86_64
  • Submitted: 2015-05-12
  • Updated: 2015-07-31
  • Resolved: 2015-06-11
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b15)
Java HotSpot(TM) Client VM (build 25.45-b02, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]
SunOS xxxxx 5.10 Generic_147440-07 sun4v sparc sun4v
Linux yyyyy 2.6.18-348.el5xen #1 SMP Wed Nov 28 22:04:26 EST 2012 i686 i686 i386 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
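There appears to be a bug in the Apache XERCES UTF8Reader class bundled with Java (com.sun.org.apache.xerces.internal.impl.io.UTF8Reader). It can be triggered when the input file contains 4-byte UTF-8 sequences, that is, supplementary characters above U+FFFF.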
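To see why 4-byte sequences are special, note that each one decodes to a single code point above U+FFFF, which Java represents as two char values (a surrogate pair). A decoder that fills a fixed-size char[] therefore has to reserve two slots for such a character. The minimal sketch below shows the byte and char counts using only standard JDK APIs; the class name SupplementaryCharDemo and the sample code point U+1F600 are arbitrary choices, not taken from the report.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCharDemo {
    // Returns {UTF-8 byte count, UTF-16 char count, code point count} for s
    static int[] sizes(String s) {
        return new int[] {
            s.getBytes(StandardCharsets.UTF_8).length,
            s.length(),
            s.codePointCount(0, s.length())
        };
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane (BMP)
        String s = new String(Character.toChars(0x1F600));
        int[] n = sizes(s);
        System.out.println("UTF-8 bytes:  " + n[0]); // 4
        System.out.println("UTF-16 chars: " + n[1]); // 2 (high + low surrogate)
        System.out.println("code points:  " + n[2]); // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1))); // true
    }
}
```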

See https://issues.apache.org/jira/browse/XERCESJ-1257

Apparently this bug has existed in the Apache XERCES source since 2007 and still has not been fixed.

In 2007 Robert Stojnic posted a patch, but it still contained a problem. Michael Glavassevich then committed a different fix and closed the issue, but that fix was itself faulty: it caused a different exception, "org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence", and the issue was reopened.

In a later comment (09/Jul/07), Robert Stojnic proposed a slight modification to his original patch that fixes the problem correctly, but he never updated the posted UTF8Reader.patch file or attached a new one. As a result, the issue has languished ever since, and the bug has never been properly fixed in the Apache XERCES sources, even though it was first reported in 2007 and it is now 2015.

Others who use the "broken" Apache source code have apparently resorted to patching it themselves before incorporating it into their products. I don't know how to get the Apache XERCES maintainers to fix this bug, which has been around since 2007, so I suggest that the Oracle Java developers do the same and patch the XERCES code included with Oracle's Java, unless you know how to get the Apache XERCES maintainers to fix it so that the fix can then be incorporated.

Below is the revised patch from Robert Stojnic that fixes the problem. It differs slightly from the UTF8Reader.patch attachment on the referenced Apache XERCES issue, but Robert proposed this change in a comment dated 09/Jul/07 13:07. In addition, in a comment dated 18/Apr/09 12:13 on https://issues.apache.org/jira/browse/LUCENE-1591, Michael McCandless indicated that this patch was used to fix the bug in the XERCES sources included in Lucene:
--- src/org/apache/xerces/impl/io/UTF8Reader.java       2006-11-23 00:36:53.000000000 +0100
+++ ../../xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java    2007-06-28 02:02:44.000000000 +0200
@@ -534,6 +534,16 @@
                     invalidByte(4, 4, b2);
                 }

+                // check if output buffer is large enough to hold 2 surrogate chars
+                if(out + 1 >= offset + length ){
+                    fBuffer[0] = (byte)b0;
+                    fBuffer[1] = (byte)b1;
+                    fBuffer[2] = (byte)b2;
+                    fBuffer[3] = (byte)b3;
+                    fOffset = 4;
+                    return out - offset;
+                }
+
                 // decode bytes into surrogate characters
                 int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
                 if (uuuuu > 0x10) {
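The check added by the patch can be illustrated outside Xerces. The sketch below is a hypothetical, heavily simplified decoder (BoundaryAwareDecoder is not a Xerces class) that assumes well-formed UTF-8 held in a byte array and callers that always pass length >= 2. Before writing a surrogate pair it performs the same test as the patch, out + 1 >= offset + length. The real UTF8Reader reads from a stream, so the patch saves the four bytes in fBuffer and sets fOffset = 4; this sketch simply leaves them unread for the next call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified decoder; NOT Xerces code. Assumes well-formed
// UTF-8 and that callers always pass length >= 2.
public class BoundaryAwareDecoder {
    private final byte[] input;
    private int pos; // index of the next unread byte

    BoundaryAwareDecoder(byte[] input) {
        this.input = input;
    }

    // Decodes into ch[offset .. offset+length); returns the number of chars
    // written, or -1 once all input has been consumed.
    int read(char[] ch, int offset, int length) {
        if (pos >= input.length) {
            return -1;
        }
        int out = offset;
        int end = offset + length;
        while (out < end && pos < input.length) {
            int b0 = input[pos] & 0xFF;
            int need = b0 < 0x80 ? 1 : b0 < 0xE0 ? 2 : b0 < 0xF0 ? 3 : 4;
            // Same test as the patch: a 4-byte sequence becomes two chars, so
            // stop if only one slot is left and keep the bytes for the next call.
            if (need == 4 && out + 1 >= end) {
                break;
            }
            int cp = need == 1 ? b0 : b0 & (0xFF >> (need + 1));
            for (int i = 1; i < need; i++) {
                cp = (cp << 6) | (input[pos + i] & 0x3F);
            }
            pos += need;
            out += Character.toChars(cp, ch, out); // writes 1 or 2 chars
        }
        return out - offset;
    }

    public static void main(String[] args) {
        String text = "ab" + new String(Character.toChars(0x1F600));
        BoundaryAwareDecoder d = new BoundaryAwareDecoder(text.getBytes(StandardCharsets.UTF_8));
        char[] buf = new char[3];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = d.read(buf, 0, buf.length)) != -1) {
            System.out.println("read " + n + " chars");
            sb.append(buf, 0, n);
        }
        System.out.println(sb.toString().equals(text)); // true
    }
}
```

With a 3-char buffer, "ab" followed by U+1F600 comes back as two reads of 2 chars each. Without the check, the second surrogate would be written to buf[3], one slot past the end of the array, which is the same shape of failure as the reported ArrayIndexOutOfBoundsException: 8192 against an 8192-element buffer.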

ADDITIONAL REGRESSION INFORMATION: 
The problem does not occur in the following version of Java:
Java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)

The problem appears to have started with Java 1.6.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
This is the java command that causes the ArrayIndexOutOfBoundsException in UTF8Reader:
java -Dxml.catalog.files=catalog.xml -cp xpp.jar:saxon9pe.jar:resolver.jar com.xyenterprise.xpp.xslt.XppTransform -x:org.apache.xml.resolver.tools.ResolvingXMLReader -y:org.apache.xml.resolver.tools.ResolvingXMLReader -r:org.apache.xml.resolver.tools.CatalogResolver -s:divxml.xml -xsl:basic.xsl -o:output.xml

I don't see any way to attach the files needed to reproduce the error. I need some way to get you the catalog.xml, xpp.jar, saxon9pe.jar, resolver.jar, divxml.xml, and basic.xsl files.

Because of the nature of the bug, reproducing it requires the exact input file (divxml.xml). Our example divxml.xml contains a number of 4-byte UTF-8 characters, and any modification to the file, even converting DOS line endings to UNIX line endings, prevents the error from occurring.
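Because the failure depends on where the 4-byte sequences fall relative to the parser's internal buffer, one way to avoid depending on a single exact file is to generate documents with a varying amount of padding and parse each one. The sketch below does this with the standard JAXP SAXParser. It is not the submitter's UTF8ReaderBug.java, and the padding range (8100 to 8200 ASCII characters before 64 copies of U+1F600, placed in an attribute value as in the reporter's scanAttributeValue trace) is a guess based on the 8192 index in the stack trace, not a confirmed trigger. On an affected JDK some documents may fail; on a JDK that includes the JDK-7156085 fix, all of them should parse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SupplementaryBoundarySweep {
    // Builds a small document whose attribute value holds `padding` ASCII
    // characters followed by `count` copies of U+1F600 (4 bytes each in UTF-8).
    // DOS line endings are used, as in the reporter's input file.
    static byte[] document(int padding, int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<root a=\"");
        for (int i = 0; i < padding; i++) {
            sb.append('x');
        }
        for (int i = 0; i < count; i++) {
            sb.appendCodePoint(0x1F600);
        }
        sb.append("\"/>\r\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses one document per padding value in [from, to) and returns how many
    // failed with ArrayIndexOutOfBoundsException; other errors propagate.
    static int sweep(int from, int to) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        int failures = 0;
        for (int padding = from; padding < to; padding++) {
            try {
                // A fresh parser per document, so a failed parse cannot affect the next one
                factory.newSAXParser().parse(
                        new ByteArrayInputStream(document(padding, 64)), new DefaultHandler());
            } catch (ArrayIndexOutOfBoundsException e) {
                failures++;
                System.out.println("padding " + padding + ": " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        int failures = sweep(8100, 8200);
        System.out.println(failures + " of 100 documents failed");
    }
}
```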

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A correct output.xml is produced without an exception.
ACTUAL -
The ArrayIndexOutOfBoundsException shown below is thrown from UTF8Reader.read.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Warning: at xsl:stylesheet on line 1 column 80 of basic.xsl:
  Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
java.lang.ArrayIndexOutOfBoundsException: 8192
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.xml.sax.helpers.XMLFilterImpl.parse(Unknown Source)
        at org.apache.xml.resolver.tools.ResolvingXMLFilter.parse(ResolvingXMLFilter.java:141)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:397)
        at net.sf.saxon.event.Sender.send(Sender.java:156)
        at net.sf.saxon.Controller.transform(Controller.java:1689)
        at net.sf.saxon.Transform.processFile(Transform.java:1157)
        at net.sf.saxon.Transform.doTransform(Transform.java:752)
        at com.xyenterprise.xpp.xslt.XppTransform.main(XppTransform.java:58)
Fatal error during transformation: java.lang.ArrayIndexOutOfBoundsException: 8192

REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------
I need some way to get you the catalog.xml, xpp.jar, saxon9pe.jar, resolver.jar, divxml.xml, and basic.xsl files for the failing java command (detailed in the Steps to Reproduce).

How do I get those files to you?
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
One workaround we've found is to convert the 4-byte UTF-8 characters in the input file into numeric character references. The buffer-boundary problem in UTF8Reader then no longer occurs.
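The workaround can be automated. The sketch below (the class and method names are hypothetical) rewrites every supplementary character as a hexadecimal numeric character reference such as &#x1F600;, so the parser's UTF8Reader only ever sees 1- to 3-byte sequences. It assumes such characters occur only in character data and attribute values: character references are not recognized inside comments, processing instructions, or CDATA sections, and are not allowed in element or attribute names.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class SupplementaryEscaper {
    // Replaces each supplementary character (code point above U+FFFF) with a
    // hexadecimal numeric character reference, e.g. U+1F600 -> "&#x1F600;".
    // Assumes such characters appear only in character data or attribute values.
    static String escapeSupplementary(String xml) {
        StringBuilder sb = new StringBuilder(xml.length());
        xml.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                  .append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    // Usage: java SupplementaryEscaper input.xml output.xml (UTF-8 in and out)
    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), escapeSupplementary(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```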


Comments
Thanks Pardeep. It's very nice to hear from customers.
31-07-2015

Received the following confirmation from the submitter:
============================================
On 7/7/2015 12:28 AM, ........ wrote:
> Hi ......,
>
> Thank you very much for the detailed information (via the JDK-7156085 link).
>
> I will pass on to our customer support the information about the future fixed-in versions for JDK 7 (7u91 b01 and higher) and JDK 8 (8u60 b20 and higher), which I assume will at some point be available as GA (General Availability) releases rather than only as EA or Early Adopter releases, so that they can inform our customers who run into this problem.
>
> If there's anything on your end still open for this Incident Report, please feel free to close it as resolved.
>
> From: ...........
> Sent: Thursday, July 02, 2015 12:24 AM
> To: ..........
> Cc: .........
> Subject: Re: ......... : ArrayIndexOutOfBoundsException at ...xerces.internal.impl.io.UTF8Reader.read
>
> Hi .............,
>
> As per the latest update, the issue is resolved with a backport of https://bugs.openjdk.java.net/browse/JDK-7156085 to JDK 8.
> You can confirm this with JDK 8u60 ea b20 and onwards.
> JDK 8u60 ea:
> https://jdk8.java.net/download.html
>
> JDK 9 ea (b70):
> https://jdk9.java.net/download/
> Note: an ea version of JDK 9 is currently available.
>
> Hope this helps.
>
> Regards,
> ..........
===================================================================
31-07-2015
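The comments give 7u91 b01 and 8u60 b20 as the first builds with the fix. A quick way to check which side of that line a runtime falls on is to parse the java.version system property. The sketch below (the class name is hypothetical) assumes the pre-JDK 9 "1.x.0_NN" format, cannot see build numbers (so early-access 8u60 builds before b20 are not distinguished), and assumes JDK 9 and later contain the fix, as the 9 ea b66 test result suggests.

```java
public class Jdk7156085Check {
    // True if the java.version string is at or above the first releases this
    // report lists as containing the JDK-7156085 fix (7u91, 8u60). Assumes the
    // pre-JDK 9 "1.<major>.0_<update>" format; build numbers are not visible.
    static boolean atOrAboveFixedRelease(String javaVersion) {
        if (!javaVersion.startsWith("1.")) {
            return true; // "9", "9-ea", "11.0.2", ...: assumed to include the fix
        }
        String[] parts = javaVersion.split("[._-]"); // "1.8.0_45" -> [1, 8, 0, 45]
        int major = Integer.parseInt(parts[1]);
        int update = parts.length > 3 ? Integer.parseInt(parts[3]) : 0;
        if (major == 7) {
            return update >= 91;
        }
        if (major == 8) {
            return update >= 60;
        }
        // JDK 6 and earlier are outside the backports discussed here
        // (note that 1.5.0_12 was reported as unaffected).
        return major > 8;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + " at or above fixed release: " + atOrAboveFixedRelease(v));
    }
}
```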

The issue was resolved by backporting JDK-7156085 to JDK 8.
11-06-2015

As suggested by Joe, this issue will be resolved by backporting JDK-7156085 to JDK 8 and JDK 7.
03-06-2015

Aleksej, please evaluate whether or not you'd want to backport the patch for JDK-7156085. Thanks.
03-06-2015

1. Run the attached test case shared by the submitter:
   - UTF8ReaderBug.java: a small code sample that reproduces the issue.
   - demobug.jar.txt: the demobug.jar file with a .txt extension added. After removing the .txt extension, it can be used directly with the "java -cp demobug.jar UTF8ReaderBug" command to reproduce the issue.

2. Steps to reproduce:
   a) These are the steps I used to create the attached jar file from the attached Java file:
        javac -d . UTF8ReaderBug.java
        jar cvf demobug.jar UTF8ReaderBug.class
   b) This is the command that reproduces the exception:
        java -cp demobug.jar UTF8ReaderBug

3. Checked on Windows 7 and Oracle Linux 6.5 with the following results:
   7u80: Fail
   8: Fail
   8u45: Fail
   8u60 ea b17: Fail
   9 ea b66: OK

4. Output (with JDK 8u45):
   > java -cp demobug.jar UTF8ReaderBug
   Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8192
           at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:546)
           at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1743)
           at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1413)
           at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
           at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
           at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
           at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
           at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
           at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
           at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
           at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
           at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333)
           at UTF8ReaderBug.sendToParser(UTF8ReaderBug.java:30)
           at UTF8ReaderBug.main(UTF8ReaderBug.java:22)

5. Conclusion: The issue is reproducible with JDK 7u80, 8u45 and 8u60 ea b17. However, it appears to be fixed in JDK 9 ea (confirmed with b66) by JDK-7156085.
03-06-2015