JDK-8140747 : Data corruption when parsing XML using StAX/Xerces
  • Type: Bug
  • Component: xml
  • Sub-Component: jaxp
  • Affected Version: 7u71
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86
  • Submitted: 2015-10-01
  • Updated: 2016-04-05
  • Resolved: 2016-04-05
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux hwd 3.16.3 #10 SMP PREEMPT Sun Sep 28 00:13:58 PDT 2014 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
When parsing XML using the StAX API, data can be corrupted after each invocation of "read(byte[], int, int)" on the underlying InputStream, depending on how many bytes are actually read.
The Xerces implementation seems to overwrite its own internal buffer, leading to corrupted/inconsistent data. The bug is silent: no exception is thrown.

The following JRE versions are currently affected:
- 7u71 
- 7u72 
- 8u20 
- 8u25

7u67 and 8u11 are not affected.

REGRESSION.  Last worked in version 7u67

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the provided repro case.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
It should print "rugs"
ACTUAL -
It prints "bugs" when using the affected JREs

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

/*
 * Correct output (7u67,8u11)
 * rugs
 * 
 * Incorrect output (7u71,7u72,8u20,8u25)
 * bugs
 */
public class XmlReaderBug {

    private static final int BYTES_PER_READ = 6;

    private static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    public static void main(String[] args) throws Exception {
        final InputStream xmlStream = new ByteArrayInputStream(XML.getBytes(Charset.forName("UTF-8")));
        final InputStream throttledXmlStream = new ThrottledInputStream(xmlStream, BYTES_PER_READ);

        final XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
        final XMLStreamReader xmlStreamReader = xmlFactory.createXMLStreamReader(throttledXmlStream);
        xmlStreamReader.next();

        // bugs or rugs?
        System.out.println(xmlStreamReader.getAttributeValue(null, "likes"));
    }

    // An InputStream implementation that limits the number of bytes read by read(byte[], int, int)
    private static class ThrottledInputStream extends FilterInputStream {
        private final int bytesPerRead;

        public ThrottledInputStream(InputStream stream, int bytesPerRead) {
            super(stream);
            this.bytesPerRead = bytesPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return 0;
            }

            // Limit bytes read
            int bytesToRead = Math.min(bytesPerRead, len);

            // Ensure deterministic behavior (similar to org.apache.commons.io.IOUtils.read)
            // Useless for this test case, but convenient for consistently reproducing
            // the bug with other stream implementations
            int totalBytesRead = 0;
            int bytesRead = 0;
            do {
                bytesRead = Math.max(0, in.read(b, off + totalBytesRead, bytesToRead));
                bytesToRead -= bytesRead;
                totalBytesRead += bytesRead;
            } while (bytesRead > 0);

            // No more bytes
            if (totalBytesRead == 0) {
                return -1;
            }

            return totalBytesRead;
        }
    }
}

---------- END SOURCE ----------
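
The description says the corruption depends on how many bytes each read() call returns. The following is a minimal sketch, not part of the original submission, that parses the same document with a range of per-read limits and lists the limits for which the "likes" attribute does not come back as "rugs". The names ChunkSizeSweep, ShortReadStream, parseLikes and mismatches are hypothetical and chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

class ChunkSizeSweep {

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    // Hypothetical helper: returns at most maxBytes per read(byte[], int, int) call.
    static class ShortReadStream extends FilterInputStream {
        private final int maxBytes;

        ShortReadStream(InputStream in, int maxBytes) {
            super(in);
            this.maxBytes = maxBytes;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, maxBytes));
        }
    }

    // Parses XML through a stream limited to maxBytes per read and
    // returns the value of the "likes" attribute of the root element.
    static String parseLikes(int maxBytes) throws Exception {
        InputStream in = new ShortReadStream(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), maxBytes);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (!reader.isStartElement()) {
                reader.next();
            }
            return reader.getAttributeValue(null, "likes");
        } finally {
            reader.close();
        }
    }

    // Collects every per-read limit in 1..maxChunk that yields a value other than "rugs".
    static List<Integer> mismatches(int maxChunk) throws Exception {
        List<Integer> bad = new ArrayList<>();
        for (int size = 1; size <= maxChunk; size++) {
            if (!"rugs".equals(parseLikes(size))) {
                bad.add(size);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Read sizes with a wrong value: " + mismatches(128));
    }
}
```

An empty list on a given JRE only means that none of the tested read sizes triggered the problem for this particular document; it does not prove that the JRE is unaffected.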

CUSTOMER SUBMITTED WORKAROUND :
- Do not use the affected versions of the JRE.
- Using InputStreams that do not return too few bytes at a time seems to make the issue vanish. However, I am not sure whether this is really the case, or whether it just makes the issue less likely to occur.
Since the bug is silent, and given that most InputStreams make no guarantee about how many bytes are actually read by each invocation of read(), I would recommend only staying on an earlier version of the JDK.
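
The second workaround could be sketched as a wrapper that keeps reading until the caller's buffer is full or the stream ends, so the parser never sees a short read before end of stream. This is an illustrative assumption about how to apply the workaround, not a verified fix: as noted above, it may only make the problem less likely. The class name FillingInputStream is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: read(byte[], int, int) returns fewer than len bytes
// only when the underlying stream has reached end of stream.
class FillingInputStream extends FilterInputStream {

    FillingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        // Report end of stream only if nothing at all was read.
        return total == 0 ? -1 : total;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[25]);
        InputStream filled = new FillingInputStream(source);
        byte[] buffer = new byte[10];
        int n;
        while ((n = filled.read(buffer, 0, buffer.length)) != -1) {
            System.out.println("read " + n + " bytes");
        }
    }
}
```

In the repro case this would wrap the stream passed to createXMLStreamReader. The trade-off is that each read blocks until the requested length is available or the stream ends, which may be unsuitable for interactive or network streams that deliver data slowly.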




Comments
The attached test case was executed on JDK 7u80, 8, 8u60, 8u66, 8u72 and 9 EA b85. It was not reproducible on any of them. Moving this to Could not reproduce.
29-10-2015