JDK-8215543 : New line and indentation is added to the content of CDATA sections within XML
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.transform
  • Affected Version: 9,10,11
  • Priority: P3
  • Status: Resolved
  • Resolution: Duplicate
  • OS: generic
  • CPU: x86_64
  • Submitted: 2018-12-17
  • Updated: 2019-10-31
  • Resolved: 2019-10-31
Description
A DESCRIPTION OF THE PROBLEM :
It looks like there is a bug where a newline is added within CDATA sections when an XSLT is used.

I am not sure whether this is the correct place to raise this bug. I think the issue is in the com.sun.org.apache packages, but I can't see where they are maintained or whether they are modified for inclusion in the JDK.

When writing out the value 'ABCDEFGHIJKLMNOPQRST985', a newline is added within the value. This can be seen in:
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST
            985]]>
        </p>
yet it should look like:
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST985]]>
        </p>

The problem is easy to reproduce, see below.

I suspect the problem has something to do with:
com.sun.org.apache.xml.internal.util.FastStringBuffer#sendSAXcharacters

The issue happens when the data for a single value is spread over more than one of the chunk arrays in m_array.
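
To make the suspected mechanism concrete, here is a minimal sketch of a chunked buffer that hands one value to its consumer in several pieces when the value straddles a chunk boundary. It is a simplified, hypothetical model, not the JDK's FastStringBuffer: the class ChunkedBuffer, its fixed chunk size, and the deliver helper are assumptions made for illustration. With 32-character chunks and 12 characters of earlier text, the value 'ABCDEFGHIJKLMNOPQRST985' arrives as two pieces, split at the same point as in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical, simplified model of an array-of-arrays character buffer.
 * NOT the JDK's FastStringBuffer; it only shows how a value that crosses a
 * chunk boundary is handed to the consumer in more than one call.
 */
public class ChunkedBufferDemo {

    /** Hypothetical buffer that stores text in fixed-size chunks. */
    static final class ChunkedBuffer {
        private final int chunkSize;
        private final List<char[]> chunks = new ArrayList<>();
        private int length; // total number of chars appended so far

        ChunkedBuffer(int chunkSize) {
            this.chunkSize = chunkSize;
        }

        void append(String s) {
            for (char c : s.toCharArray()) {
                int chunk = length / chunkSize;
                if (chunk == chunks.size()) {
                    chunks.add(new char[chunkSize]); // start a new chunk
                }
                chunks.get(chunk)[length % chunkSize] = c;
                length++;
            }
        }

        /** Sends [start, start + len) to the sink, one call per chunk touched. */
        void sendCharacters(int start, int len, Consumer<String> sink) {
            int pos = start;
            int end = start + len;
            while (pos < end) {
                int chunk = pos / chunkSize;
                int offset = pos % chunkSize;
                int n = Math.min(chunkSize - offset, end - pos);
                sink.accept(new String(chunks.get(chunk), offset, n));
                pos += n;
            }
        }
    }

    /** Returns the pieces in which 'value' is delivered after 'padding'. */
    static List<String> deliver(int chunkSize, String padding, String value) {
        ChunkedBuffer buf = new ChunkedBuffer(chunkSize);
        buf.append(padding); // earlier document text fills part of the first chunk
        buf.append(value);
        List<String> pieces = new ArrayList<>();
        buf.sendCharacters(padding.length(), value.length(), pieces::add);
        return pieces;
    }

    public static void main(String[] args) {
        // 32-char chunks; 12 chars of padding leave room for only 20 more.
        System.out.println(deliver(32, "x".repeat(12), "ABCDEFGHIJKLMNOPQRST985"));
        // prints [ABCDEFGHIJKLMNOPQRST, 985]
    }
}
```

A real serializer receiving those two pieces as two separate characters() calls must treat them as one text run; the SAX contract allows contiguous text to be split this way.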

Within com.sun.org.apache.xml.internal.serializer.ToStream#cdata, the newline and indent are added in the middle of writing the value of a CDATA section. This happens at these lines within the method:
            if (shouldIndent())   // <-- this returns true in the middle of the CDATA section
                indent();
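
The effect of that check can be sketched with a toy serializer in which each piece of a value arrives in a separate cdata() call. This is a hypothetical, heavily simplified model, not ToStream: the class CdataIndentDemo, the checkIndentPerCall flag, and the inTextRun state are assumptions. When the indent check runs on every call, the second piece is preceded by a newline and indent inside the CDATA section; when it runs only at the start of a value, the value stays intact.

```java
import java.util.List;

/**
 * Hypothetical, heavily simplified model of an indenting serializer.
 * NOT ToStream; cdata(), shouldIndent() and indent() only echo the names
 * used in the report.
 */
public class CdataIndentDemo {

    private final StringBuilder out = new StringBuilder();
    private final boolean checkIndentPerCall; // true models the reported behaviour
    private boolean inTextRun;                // true once the first piece of a value is written

    CdataIndentDemo(boolean checkIndentPerCall) {
        this.checkIndentPerCall = checkIndentPerCall;
    }

    private boolean shouldIndent() {
        // Assumption: indentation belongs only before a new value,
        // unless the per-call (buggy) check is being modelled.
        return checkIndentPerCall || !inTextRun;
    }

    private void indent() {
        out.append('\n').append("            "); // newline plus 12 spaces
    }

    /** Receives one piece of a CDATA value (one characters() chunk). */
    void cdata(String piece) {
        if (shouldIndent()) {
            indent();
        }
        if (!inTextRun) {
            out.append("<![CDATA[");
            inTextRun = true;
        }
        out.append(piece);
    }

    /** Closes the current CDATA section. */
    void endCdata() {
        out.append("]]>");
        inTextRun = false;
    }

    static String render(boolean checkIndentPerCall, List<String> pieces) {
        CdataIndentDemo s = new CdataIndentDemo(checkIndentPerCall);
        for (String piece : pieces) {
            s.cdata(piece);
        }
        s.endCdata();
        return s.out.toString().trim(); // drop the leading indent for readability
    }

    public static void main(String[] args) {
        List<String> pieces = List.of("ABCDEFGHIJKLMNOPQRST", "985");
        System.out.println(render(true, pieces));  // newline and indent inside the CDATA section
        System.out.println(render(false, pieces)); // <![CDATA[ABCDEFGHIJKLMNOPQRST985]]>
    }
}
```

With checkIndentPerCall set to true the result has the same broken shape as the output reported above.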

This looks fixed in JDK 12; however, that is not released yet.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create an XML document with many values (I think at least 1 KB of data is needed). Then write an XSLT that writes out all of those values within CDATA sections.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Values within CDATA sections should not have occasional new lines and indents added
ACTUAL -
Values within CDATA sections have occasional new lines and indents added to the values.
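
A quick way to detect the actual behavior in any transform output, rather than checking three specific values, is to look for CDATA sections whose content contains a line break. The helper below is hypothetical and not part of the original test; it assumes, as in this reproducer, that no legitimate value contains a newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds CDATA sections whose content contains a line break. */
public class CdataNewlineCheck {

    // DOTALL lets '.' match line breaks; the reluctant quantifier stops at the first "]]>".
    private static final Pattern CDATA =
        Pattern.compile("<!\\[CDATA\\[(.*?)\\]\\]>", Pattern.DOTALL);

    static List<String> cdataWithLineBreaks(String xml) {
        List<String> bad = new ArrayList<>();
        Matcher m = CDATA.matcher(xml);
        while (m.find()) {
            String content = m.group(1);
            if (content.indexOf('\n') >= 0 || content.indexOf('\r') >= 0) {
                bad.add(content); // keep the offending content for the failure message
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String actual = "<p>\n    <![CDATA[ABCDEFGHIJKLMNOPQRST\n    985]]>\n</p>";
        System.out.println(cdataWithLineBreaks(actual).size()); // 1
    }
}
```

In the JUnit test above, the result of cdataWithLineBreaks(res) could be asserted to be empty in addition to the existing contains() checks.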

---------- BEGIN SOURCE ----------
import static java.nio.charset.StandardCharsets.UTF_8;

import org.junit.Assert;
import org.junit.Test;

import java.io.ByteArrayOutputStream;
import java.io.StringReader;

import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Demonstrates an issue where a newline and indent are added inside CDATA sections
 * 
 * The issue can be seen in the 2nd cdata section here:
 * 
 * <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 * <root>
 *     <inner>
 *         <p>
 *             <![CDATA[ABCDEFGHIJKLMNOPQRST984]]>
 *         </p>
 *         <p>
 *             <![CDATA[ABCDEFGHIJKLMNOPQRST
 *             985]]>
 *         </p>
 *         <p>
 *             <![CDATA[ABCDEFGHIJKLMNOPQRST986]]>
 *         </p>
 *     </inner>
 * </root>
 * 
 *
 */
public class XMLIndentTest {

    private static final String VALUE_PREFIX = "ABCDEFGHIJKLMNOPQRST";
    
    @Test
    public void test() throws Exception {
        String xml = makeXml();
        Templates templates 
            = TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(XSLT)));
        
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        
        templates.newTransformer().transform(new StreamSource(new StringReader(xml)), 
            new StreamResult(bos));
        
        String res = new String(bos.toByteArray(), UTF_8);
        System.out.println(res);
        Assert.assertTrue(res.contains("<![CDATA[ABCDEFGHIJKLMNOPQRST984]]>"));
        Assert.assertTrue(res.contains("<![CDATA[ABCDEFGHIJKLMNOPQRST986]]>"));
        Assert.assertTrue(res.contains("<![CDATA[ABCDEFGHIJKLMNOPQRST985]]>"));
    }
    
    private String makeXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
        sb.append("<root>\n");
        sb.append("<inner>\n");
        for(int i = 0; i < 984; i++) {
            sb.append("<cd1><v>")
            .append(VALUE_PREFIX)
            .append(i)
            .append("</v></cd1>\n");
        }
        
        for(int i = 984; i < 987; i++) {
            sb.append("<cd><v>")
            .append(VALUE_PREFIX)
            .append(i)
            .append("</v></cd>\n");
        }
        sb.append("</inner>\n");
        sb.append("</root>\n");
        return sb.toString();
    }
    
    private static final String XSLT = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + 
        "<xsl:stylesheet version=\"2.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n" + 
        "<xsl:output\n" + 
        "    media-type=\"text/xml\"\n" + 
        "    encoding=\"UTF-8\"\n" + 
        "    method=\"xml\"\n" + 
        "    indent=\"yes\"\n" + 
        "    cdata-section-elements=\"p\"\n" + 
        "    standalone=\"no\" />\n" + 
        "\n" + 
        "<xsl:template match=\"@*|node()\" />\n" + 
        "\n" + 
        "<xsl:template match=\"/\">\n" + 
        "    <root>\n" + 
        "        <inner>\n" + 
        "          <xsl:for-each select=\"root/inner/cd\">\n" + 
        "            <p>" + 
        "                <xsl:value-of select=\"v\" />\n" + 
        "            </p>" + 
        "          </xsl:for-each>\n" + 
        "        </inner>\n" + 
        "    </root>\n" + 
        "</xsl:template>\n" + 
        "</xsl:stylesheet>";
}

---------- END SOURCE ----------

FREQUENCY : always



Comments
Hi Aleksej, I'm assigning this to you to evaluate whether you'd want to backport the JDK 12 fix.
03-01-2019

To reproduce the issue, run the attached test case.

  • JDK 8u191 - Pass
  • JDK 9.0.1 - Fail
  • JDK 11 GA - Fail
  • JDK 11.0.1 - Fail
  • JDK 12-ea+21 - Pass

Output on failed versions:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
    <inner>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST984]]>
        </p>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST
            985]]>
        </p>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST986]]>
        </p>
    </inner>
</root>

Output in passed versions:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
    <inner>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST984]]>
        </p>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST985]]>
        </p>
        <p>
            <![CDATA[ABCDEFGHIJKLMNOPQRST986]]>
        </p>
    </inner>
</root>
18-12-2018