JDK-6770436 : Entity callback order differs between Java1.5 and Java1.6
  • Type: Bug
  • Component: xml
  • Sub-Component: org.xml.sax
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2008-11-12
  • Updated: 2015-06-04
  • Resolved: 2014-11-04
JDK 9: 9 b39, Fixed
Description
FULL PRODUCT VERSION :
#java -version
java version "1.6.0_10-rc"
Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux danbev-laptop 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
With Java 1.5, parsing the string "<element> &amp; some more text</element>" generates the following callback order:
startDocument
startElement 'element'
characters ' '
startEntity 'amp'
characters '&'
endEntity 'amp'
characters ' some more text'
endElement 'element'
endDocument

When the same application is run with Java 1.6, the order is:
startDocument
startElement 'element'
characters ' '
startEntity 'amp'
endEntity 'amp'
characters '&'
characters ' some more text'
endElement 'element'
endDocument

Notice that with Java 1.6 the characters '&' callback is no longer enclosed by the startEntity/endEntity pair: endEntity 'amp' is reported before the entity's replacement text.
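
The difference between the two orders can be checked mechanically. The following is a minimal sketch, not part of the original report: the class name EntityOrderCheck and the helper entityTextEnclosed are hypothetical, and the event strings are assumed to use the same format as the logs above. It reports whether any characters event falls between startEntity and endEntity for a given entity name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the original report.
class EntityOrderCheck {

    /**
     * Returns true if at least one "characters" event is recorded between
     * "startEntity '<name>'" and "endEntity '<name>'" in the event log.
     */
    static boolean entityTextEnclosed(List<String> events, String name) {
        int start = events.indexOf("startEntity '" + name + "'");
        int end = events.indexOf("endEntity '" + name + "'");
        if (start < 0 || end < start) {
            return false; // entity boundaries missing or out of order
        }
        for (String event : events.subList(start + 1, end)) {
            if (event.startsWith("characters ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Event logs copied from the two callback orders listed in this report.
        List<String> java5 = Arrays.asList(
                "startDocument", "startElement 'element'", "characters ' '",
                "startEntity 'amp'", "characters '&'", "endEntity 'amp'",
                "characters ' some more text'", "endElement 'element'", "endDocument");
        List<String> java6 = Arrays.asList(
                "startDocument", "startElement 'element'", "characters ' '",
                "startEntity 'amp'", "endEntity 'amp'", "characters '&'",
                "characters ' some more text'", "endElement 'element'", "endDocument");

        System.out.println("Java 1.5 order enclosed: " + entityTextEnclosed(java5, "amp")); // true
        System.out.println("Java 1.6 order enclosed: " + entityTextEnclosed(java6, "amp")); // false
    }
}
```

A handler that relies on startEntity/endEntity to tell entity replacement text apart from literal text would see the '&' as ordinary character data under the Java 1.6 order.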

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Please run the test class pasted in the "Source code for an executable test case:" field.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected callback order:
startDocument
startElement 'element'
characters ' '
startEntity 'amp'
characters '&'
endEntity 'amp'
characters ' some more text'
endElement 'element'
endDocument
ACTUAL -
Actual callback order:
startDocument
startElement 'element'
characters ' '
startEntity 'amp'
endEntity 'amp'
characters '&'
characters ' some more text'
endElement 'element'
endDocument

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import static org.junit.Assert.assertEquals;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * Simple test to demonstrate differences between Java versions with regard to SAX
 * callback ordering.
 * <p/>
 *
 * @author <a href="mailto:###@###.###">Daniel Bevenius</a>
 *
 */
public class XmlParserTest
{
	@Test
	public void entityCallbackOrderJava() throws SAXException, IOException
	{
		final String input = "<element> &amp; some more text</element>";
        
		final MockContentHandler handler = new MockContentHandler();
		final XMLReader xmlReader = XMLReaderFactory.createXMLReader();
		
		xmlReader.setContentHandler(handler);
		xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
		
		xmlReader.parse(new InputSource(new StringReader(input)));
		
		final List<String> events = handler.getEvents();
		//assertJava5CallbackOrder(events);
		assertJava6CallbackOrder(events);
	}
	
	private void assertJava6CallbackOrder(final List<String> events)
	{
		assertEquals("startDocument", events.get(0));
		assertEquals("startElement 'element'", events.get(1));
		assertEquals("characters ' '", events.get(2));
		assertEquals("startEntity 'amp'", events.get(3));
		assertEquals("endEntity 'amp'", events.get(4));
		assertEquals("characters '&'", events.get(5));
		assertEquals("characters ' some more text'", events.get(6));
		assertEquals("endElement 'element'", events.get(7));
		assertEquals("endDocument", events.get(8));
	}
	
	private void assertJava5CallbackOrder(final List<String> events)
	{
		assertEquals("startDocument", events.get(0));
		assertEquals("startElement 'element'", events.get(1));
		assertEquals("characters ' '", events.get(2));
		assertEquals("startEntity 'amp'", events.get(3));
		assertEquals("characters '&'", events.get(4));
		assertEquals("endEntity 'amp'", events.get(5));
		assertEquals("characters ' some more text'", events.get(6));
		assertEquals("endElement 'element'", events.get(7));
		assertEquals("endDocument", events.get(8));
	}
	
	private class MockContentHandler extends DefaultHandler2
	{
		private List<String> events;
		
		public List<String> getEvents()
		{
			return events;
		}

		@Override
		public void startDocument() throws SAXException
		{
			events = new ArrayList<String>();
			events.add("startDocument");
		}
		
		@Override
		public void characters( char[] ch, int start, int length ) throws SAXException
		{
			events.add("characters '" + new String(ch, start, length) + "'");
		}
		
		@Override
		public void startElement( String uri, String localName, String name, Attributes atts ) throws SAXException
		{
			events.add("startElement '" + name + "'");
		}
		
		@Override
		public void endElement( String uri, String localName, String name ) throws SAXException
		{
			events.add("endElement '" + name +"'");
		}

		@Override
		public void endDocument() throws SAXException
		{
			events.add("endDocument");
		}
		
		@Override
		public void startEntity( String name ) throws SAXException
		{
			events.add("startEntity '" + name + "'");
		}
		
		@Override
		public void endEntity( String name ) throws SAXException
		{
			events.add("endEntity '" + name + "'");
		}
	}
}

---------- END SOURCE ----------
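
The test case above needs JUnit on the classpath and obtains its reader from XMLReaderFactory, which is deprecated since Java 9. As a rough sketch for checking the order on another JDK without JUnit, the variant below records the same events and prints them. The class name EntityOrderDump and the method record are hypothetical, and obtaining the XMLReader through SAXParserFactory is an assumption of this sketch rather than what the reporter used. The printed order depends on the JDK in use, and a parser may split character data into more than one characters callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical JUnit-free variant of the reporter's test case.
class EntityOrderDump {

    /** Parses the given XML and returns the SAX callbacks in the order received. */
    static List<String> record(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement '" + qName + "'");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement '" + qName + "'");
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("characters '" + new String(ch, start, length) + "'");
            }
            @Override public void startEntity(String name) { events.add("startEntity '" + name + "'"); }
            @Override public void endEntity(String name) { events.add("endEntity '" + name + "'"); }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Same lexical-handler property as in the test case above.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : record("<element> &amp; some more text</element>")) {
            System.out.println(event);
        }
    }
}
```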

Release Regression From : 5.0
The above release value was the last known release where this bug was not reproducible. Since then there has been a regression.

Comments
EVALUATION Accepted, by popular vote. This issue seems to exist in the original Apache update (revision 1.2); however, that code is not the same as the code in Xerces. The origin of the issue could not be traced.
03-09-2009