FULL PRODUCT VERSION :
A DESCRIPTION OF THE PROBLEM :
The XMLReader implementation loads the CDATA section completely into memory before sending it back to the application event handler. When operating on a very large CDATA section, the JVM will hit an OutOfMemoryError.
The third-party Xerces implementation, by contrast, reads the CDATA section and passes it to the handler in chunks.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create an XML document with a CDATA section containing, for instance, 500 MB of BASE64 content. Run the included sample program with -Xmx256m as the maximum heap size. The program will fail with an OutOfMemoryError because the CDATA section is loaded entirely into memory (in a custom version of StringBuilder).
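A test document of this shape can be produced with a small generator. The following is only a sketch: the class name CdataFileGenerator, the root element name, and the use of 76-character lines filled with 'A' (a valid BASE64 character) are assumptions made for illustration, not part of the original report.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: writes an XML file whose single CDATA section holds
// at least payloadChars characters of BASE64-looking text.
class CdataFileGenerator {

    static void generate(Path target, long payloadChars) throws IOException {
        // Assumption: 76-character lines, as commonly used for MIME BASE64.
        char[] line = new char[76];
        Arrays.fill(line, 'A');

        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root><![CDATA[");
            long written = 0;
            // Stream the payload line by line so the generator itself needs very little heap.
            while (written < payloadChars) {
                out.write(line);
                out.write('\n');
                written += line.length + 1;
            }
            out.write("]]></root>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Default to roughly 500 MB; a different size can be passed as the first argument.
        long size = args.length > 0 ? Long.parseLong(args[0]) : 500L * 1024 * 1024;
        generate(Paths.get("test.xml"), size);
    }
}
```

Running the generator once creates test.xml in the working directory, which the sample program below then reads.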
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The parser calls back with multiple chunks of the CDATA content, e.g.
characters: start=0 length=76
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
characters: start=0 length=77
...
ACTUAL -
The parser calls back with the complete CDATA content, e.g.
characters: start=0 length=24138365
ERROR MESSAGES/STACK TRACES THAT OCCUR :
OutOfMemoryError
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

class Test {
    public static void main(String[] args) throws Exception {
        // Uses the default XMLReader implementation
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(new ConsoleHandler());
        try (InputStream is = new FileInputStream("test.xml")) {
            reader.parse(new InputSource(is));
        }
    }

    // Prints the offset and length of every characters() callback
    static class ConsoleHandler extends DefaultHandler {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            System.out.printf("characters: start=%d length=%d%n", start, length);
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The only workaround is to use the third-party Xerces implementation.
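One way to apply this workaround is to instantiate the Xerces SAX parser by class name instead of relying on the default lookup. The sketch below assumes the Apache Xerces jar is on the classpath and that its parser class is org.apache.xerces.parsers.SAXParser; the helper class XercesReaderSelector is hypothetical. If Xerces is not present, it falls back to the default parser, which still exhibits the problem described above.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper: prefers the third-party Xerces XMLReader when available.
class XercesReaderSelector {

    // Assumption: class name of the SAX parser shipped in the Apache Xerces jar.
    static final String XERCES_SAX_PARSER = "org.apache.xerces.parsers.SAXParser";

    static XMLReader createReader() throws Exception {
        try {
            // Load Xerces reflectively so this class compiles without the jar.
            Class<?> parserClass = Class.forName(XERCES_SAX_PARSER);
            return (XMLReader) parserClass.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // Xerces is not on the classpath: fall back to the default parser.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }
}
```

In the sample program, the call to XMLReaderFactory.createXMLReader() would then be replaced with XercesReaderSelector.createReader().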