JDK-6690015 : XML Parse attributes with &gt; in attribute value causes wrong order
  • Type: Bug
  • Component: xml
  • Sub-Component: org.xml.sax
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2008-04-17
  • Updated: 2012-04-25
  • Resolved: 2009-05-20
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
:~$ java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows XP service pack 2
Linux <hostname> 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The problem depends on at least two factors:

1. The number of attributes in the parsed element
2. The presence of predefined entities in attribute values, e.g. &gt;

A search of the bug database found a similar (but not identical) bug, 6567432, but that was declared fixed in Java 6 update 3, and I am using Java 6 update 5.
===================================================

Problem:

When an XML element is parsed, and that element has:
    1. enough attributes (my tests used 16 attributes)
    2. attributes whose values contain predefined entities, e.g. &gt;
the retrieval of attributes results in:
    1. mixed-up attribute name/attribute value pairs
    2. attribute values sometimes merged with attribute names, giving generally confused output
    3. absolutely NO exception or error. Wrong output is the only symptom.
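
The two conditions above can also be exercised without an external file. The following is only a sketch: the attribute names (a0, a1, ...), the filler text, and the class name EntityAttributeCheck are made up for illustration, and it has not been confirmed that this synthetic input triggers the defect on an affected build (the XML file provided below is the known reproducer). It looks each attribute up by name and reports any whose value differs from what was written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EntityAttributeCheck
{
    /** Builds name -> expected parsed value for 'count' attributes (names are hypothetical). */
    public static Map<String, String> expectedValues(int count)
    {
        Map<String, String> expected = new LinkedHashMap<String, String>();
        for (int i = 0; i < count; i++)
        {
            String name = "a" + i;
            expected.put(name, "some filler text to lengthen the value (>= 1.5 mm) " + name);
        }
        return expected;
    }

    /** Writes the map as one <text .../> element, escaping '>' as the &gt; entity. */
    public static String toXml(Map<String, String> attrs)
    {
        StringBuilder sb = new StringBuilder("<block><text");
        for (Map.Entry<String, String> e : attrs.entrySet())
        {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(e.getValue().replace(">", "&gt;")).append('"');
        }
        return sb.append("/></block>").toString();
    }

    /** Parses the XML and returns the names of attributes whose value is not the expected one. */
    public static List<String> mismatches(String xml, Map<String, String> expected)
    {
        Document doc;
        try
        {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse generated XML", e);
        }
        Element text = (Element) doc.getDocumentElement().getElementsByTagName("text").item(0);
        List<String> bad = new ArrayList<String>();
        for (Map.Entry<String, String> e : expected.entrySet())
        {
            if (!e.getValue().equals(text.getAttribute(e.getKey())))
            {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] argv)
    {
        Map<String, String> expected = expectedValues(16);
        List<String> bad = mismatches(toXml(expected), expected);
        System.out.println(bad.isEmpty() ? "all attributes match" : "mismatched attributes: " + bad);
    }
}
```

Comparing by name rather than by index keeps the check independent of the order in which the attribute map returns its entries.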

This bug does NOT occur in java 1.4.2


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Save the provided XML as a file and change the filename in the example to point to it. Then compile and run the provided test application against that file with Java 1.4 and with Java 6, and compare the results.
Java 1.4 produces correct output;
Java 6 produces garbage.

package astraia.test;

import java.io.FileInputStream;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
 
public class Example
{
    public static void main(String[] argv)
    {
        try
        {
            // Change this path to wherever the XML file below was saved.
            FileInputStream fis = new FileInputStream("/home/sean/Desktop/chris/lessNoInternat.xml");

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(fis));
            Element root = doc.getDocumentElement();
            NodeList textnodes = root.getElementsByTagName("text");
            int len = textnodes.getLength();
            int index = 0;
            int attindex = 0;
            int attrlen = 0;
            NamedNodeMap attrs = null;

            // Visit every <text> element and print each attribute name with its value.
            while (index < len)
            {
                Element te = (Element) textnodes.item(index);
                attrs = te.getAttributes();
                attrlen = attrs.getLength();
                attindex = 0;
                Node node = null;

                while (attindex < attrlen)
                {
                    node = attrs.item(attindex);
                    System.out.println("attr: " + node.getNodeName() + " is shown holding value: " + node.getNodeValue());
                    attindex++;
                }
                index++;
                System.out.println("-------------");
            }
            fis.close();
        }
        catch (Exception e)
        {
            System.out.println("we've had an exception, type " + e);
        }
    }
}

xml file:

<?xml version="1.0" encoding="UTF-8"?>
<block>
<lang>
<text dna="8233" ro="hello, and i'll type some normal characters in (&gt;=1.5 mm) ro" it="here to make sure international characters don't play a part(&gt;=1.5mm) it" tr="make sure international characters don't play a part (&gt;=1.5 mm) tr" pt_br="make sure international characters don't play a part (&gt;=1,5 mm) pt_br" de="make sure international characters don't play a part (&gt;=1,5 mm) de" el="make sure international characters don't play a part (&gt;= 1.5 mm) el" zh_cn="make sure international characters don't play a part��&gt;= 1.5 mm�� zh_cn" pt="make sure international characters don't play a part (&gt;=1,5 mm) pt" bg="make sure international characters don't play a part (&gt;= 1.5 mm) bg" fr="make sure international characters don't play a part (&gt;= 1,5 mm) fr" en="make sure international characters don't play a part (&gt;= 1.5 mm) en" ru="make sure international characters don't play a part (&gt;=1.5 ����) ru" es="make sure international characters don't play a part (&gt;=1.5 mm) es" ja="make sure international characters don't play a part��&gt;=1.5mm�� ja" nl="make sure international characters don't play a part (&gt;= 1,5 mm) nl" />
</lang>
</block>

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -

The expected result is that, when I iterate through the attributes and print their names and values, they match what I see in the XML file.
Below is a run of the application using Java 1.4.
Each line shows, on the left, the attribute currently being examined,
followed by the value it is reported to hold:

attr: <attribute-name> is shown holding value: <attribute-value>


attr: dna is shown holding value: 8233
attr: ro is shown holding value: hello, and i'll type some normal characters in (>=1.5 mm) ro
attr: it is shown holding value: here to make sure international characters don't play a part(>=1.5mm) it
attr: tr is shown holding value: make sure international characters don't play a part (>=1.5 mm) tr
attr: pt_br is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_br
attr: de is shown holding value: make sure international characters don't play a part (>=1,5 mm) de
attr: el is shown holding value: make sure international characters don't play a part (>= 1.5 mm) el
attr: zh_cn is shown holding value: make sure international characters don't play a part��>= 1.5 mm�� zh_cn
attr: pt is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt
attr: bg is shown holding value: make sure international characters don't play a part (>= 1.5 mm) bg
attr: fr is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: en is shown holding value: make sure international characters don't play a part (>= 1.5 mm) en
attr: ru is shown holding value: make sure international characters don't play a part (>=1.5 ����) ru
attr: es is shown holding value: make sure international characters don't play a part (>=1.5 mm) es
attr: ja is shown holding value: make sure international characters don't play a part��>=1.5mm�� ja
attr: nl is shown holding value: make sure international characters don't play a part (>= 1,5 mm) nl
-------------

ACTUAL -
The actual results, from running this example program under Java 6 update 5,
show attribute names and values mixed up and sometimes garbled together. For example, the value reported for attribute 'en' no longer matches the original content; instead it is the value of another attribute with characters from an attribute name appended at the end.

Each line uses the same format as in the expected output:

attr: <attribute-name> is shown holding value: <attribute-value>


attr: bg is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: de is shown holding value: make sure international characters don't play a part (>=1,5 mm) de
attr: dna is shown holding value: 8233
attr: el is shown holding value: make sure international characters don't play a part (>= 1.5 mm) el
attr: en is shown holding value: make sure international characters don't play a part (>=1.5 ����) run
attr: es is shown holding value: make sure international characters don't play a part��>=1.5mm�� jaes
attr: fr is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: it is shown holding value: here to make sure international characters don't play a part(>=1.5mm) it
attr: ja is shown holding value: make sure international characters don't play a part��>=1.5mm�� ja
attr: nl is shown holding value: make sure international characters don't play a part (>= 1,5 mm) nl
attr: pt is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt
attr: pt_br is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_br
attr: ro is shown holding value: hello, and i'll type some normal characters in (>=1.5 mm) ro
attr: ru is shown holding value: make sure international characters don't play a part (>=1.5 ����) ru
attr: tr is shown holding value: make sure international characters don't play a part (>=1.5 mm) tr
attr: zh_cn is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_cn
-------------
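
In the test file, every attribute value except that of dna ends with a space followed by the attribute's own name, so corrupted pairs can be detected mechanically instead of by eye. The sketch below relies on that convention of this particular test data; the class and method names are hypothetical. Applied to the Java 6 output above, such a check would flag bg, en, es and zh_cn.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class SuffixCheck
{
    /** True when the value ends with " <name>", the convention used by the test file. */
    public static boolean endsWithOwnName(String name, String value)
    {
        return value.endsWith(" " + name);
    }

    /** Returns the names of attributes on the element whose value breaks the convention. */
    public static List<String> suspectAttributes(Element element)
    {
        List<String> suspects = new ArrayList<String>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++)
        {
            Node attr = attrs.item(i);
            String name = attr.getNodeName();
            // "dna" holds a plain number in the test file, so it is exempt.
            if ("dna".equals(name))
            {
                continue;
            }
            if (!endsWithOwnName(name, attr.getNodeValue()))
            {
                suspects.add(name);
            }
        }
        return suspects;
    }
}
```

In the reporter's program, suspectAttributes(te) could be called on each text element inside the outer loop, and any non-empty result printed.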


ERROR MESSAGES/STACK TRACES THAT OCCUR :
No error message or exception

REPRODUCIBILITY :
This bug can be reproduced always.

CUSTOMER SUBMITTED WORKAROUND :
no workaround known

Release Regression From : 5.0
The above release value was the last known release where this 
bug was not reproducible. Since then there has been a regression.

Comments
EVALUATION

Thanks for the comments and votes on this issue. The suggested code change is the same as the one made in the patch for 6518733. Here's the change:
https://jaxp-sources.dev.java.net/source/browse/jaxp-sources/xml-xerces/java/src/com/sun/org/apache/xerces/internal/impl/XMLScanner.java?r1=1.7&r2=1.8

I have verified, using the submitted test and XML file, that the issue has been fixed. Unfortunately, the patch for 6518733 did not get into JDK 6 until update 14. I apologize for the inconvenience. After JavaOne, we plan to improve the process and bring JDK 7 and 6 in sync with JAXP, to resolve a problem that has affected users quite often: JAXP fixes not being integrated into the JDK.

Please also note that you may download the latest JAXP build from java.net and either use the endorsed mechanism or place the jaxp-ri jar file on the bootclasspath to override the JAXP implementation in the JDK.
20-05-2009
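
When overriding the JAXP implementation as described in the evaluation, it may help to confirm which implementation the JVM actually picked up. The following is a minimal sketch using only standard JDK calls; the class name JaxpImplementationInfo is made up. The factory class name alone may not distinguish the bundled parser from an overriding jar, since both can live in the same com.sun.org.apache.xerces.internal package, so the sketch also prints where the class was loaded from. What is printed for the location varies by JDK version and by how the jar was installed.

```java
import java.security.CodeSource;

import javax.xml.parsers.DocumentBuilderFactory;

public class JaxpImplementationInfo
{
    /** Fully qualified name of the DocumentBuilderFactory implementation in use. */
    public static String factoryClassName()
    {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Location the implementation class was loaded from, if the JVM reports one. */
    public static String factoryLocation()
    {
        CodeSource source = DocumentBuilderFactory.newInstance()
                .getClass().getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null)
        {
            // Classes loaded by the bootstrap loader often report no code source.
            return "(no code source reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] argv)
    {
        System.out.println("factory class: " + factoryClassName());
        System.out.println("loaded from:   " + factoryLocation());
    }
}
```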