JDK-6259810 : HTML Serializer puts no space between public and system identifiers in DOCTYPE
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.transform
  • Affected Version: jwsdp-1.4
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2005-04-22
  • Updated: 2012-04-25
  • Resolved: 2005-12-10

JDK 6: 6 b59 (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.3.1_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_06-b01)
Java HotSpot(TM) Client VM (build 1.3.1_06-b01, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]
SunOS kiboko 5.8 Generic_108528-19 sun4u sparc SUNW,Ultra-80

A DESCRIPTION OF THE PROBLEM :
The HTML serializer doesn't leave a space between the public and system
identifiers in the DOCTYPE declaration. This is a known bug in Xalan 2.6 and
should be fixed in CVS as of this writing. See:
<http://issues.apache.org/jira/secure/ViewIssue.jspa?key=XALANJ-1910> (Bugzilla
ID 30142)

HTML transforms using JWSDP 1.5 produce invalid HTML, which some browsers reject.
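
For illustration, the sketch below shows how a DOCTYPE declaration could be assembled so that the two identifiers stay separated. It is not the Xalan serializer code; the class `DoctypeSketch` and its `doctype` method are hypothetical names introduced only for this example. The XML grammar for an external ID requires whitespace between the public and system literals, and that separator is what the buggy output lacks.

```java
public class DoctypeSketch {
    // Hypothetical helper (not part of Xalan or the JDK): builds a DOCTYPE
    // declaration from a root element name and optional identifiers.
    public static String doctype(String rootName, String publicId, String systemId) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootName);
        if (publicId != null) {
            sb.append(" PUBLIC \"").append(publicId).append('"');
            if (systemId != null) {
                // The leading space here is the separator missing in the bug.
                sb.append(" \"").append(systemId).append('"');
            }
        } else if (systemId != null) {
            sb.append(" SYSTEM \"").append(systemId).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(doctype(
                "HTML",
                "//W3C//DTD HTML 4.01 Transitional//EN",
                "http://www.w3.org/TR/html4/loose.dtd"));
    }
}
```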

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run executable Java test case attached.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
<!DOCTYPE HTML PUBLIC "//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<body>hello</body>
</html>
ACTUAL -
<!DOCTYPE HTML PUBLIC "//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">
<html>
<body>hello</body>
</html>
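
The difference between the two outputs above can be checked mechanically. The following is a minimal sketch, assuming a simple regular expression is enough to tell the cases apart; `DoctypeCheck` and `hasSeparatedIdentifiers` are hypothetical names, not part of any library.

```java
import java.util.regex.Pattern;

public class DoctypeCheck {
    // Matches a DOCTYPE whose quoted public identifier is followed by at
    // least one whitespace character and then a quoted system identifier.
    private static final Pattern SEPARATED = Pattern.compile(
            "<!DOCTYPE\\s+\\S+\\s+PUBLIC\\s+\"[^\"]*\"\\s+\"[^\"]*\"\\s*>",
            Pattern.CASE_INSENSITIVE);

    // Returns true if the serialized output contains a DOCTYPE with
    // whitespace between the public and system identifiers.
    public static boolean hasSeparatedIdentifiers(String output) {
        return SEPARATED.matcher(output).find();
    }

    public static void main(String[] args) {
        String expected = "<!DOCTYPE HTML PUBLIC \"//W3C//DTD HTML 4.01 Transitional//EN\" "
                + "\"http://www.w3.org/TR/html4/loose.dtd\">";
        String actual = "<!DOCTYPE HTML PUBLIC \"//W3C//DTD HTML 4.01 Transitional//EN\""
                + "\"http://www.w3.org/TR/html4/loose.dtd\">";
        System.out.println(hasSeparatedIdentifiers(expected)); // true
        System.out.println(hasSeparatedIdentifiers(actual));   // false
    }
}
```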

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;

import javax.xml.transform.TransformerFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import java.io.StringReader;
import java.io.StringWriter;

public class SerializeBug {
    public static void main(String[] args) throws Exception {
        String input = "<html><body>hello</body></html>";

        SAXTransformerFactory transformerFactory = (SAXTransformerFactory)TransformerFactory.newInstance();
        TransformerHandler transformerHandler = transformerFactory.newTransformerHandler();

        transformerHandler.getTransformer().setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "//W3C//DTD HTML 4.01 Transitional//EN");
        transformerHandler.getTransformer().setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "http://www.w3.org/TR/html4/loose.dtd");
        transformerHandler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");

        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        transformerHandler.setResult(result);

        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser parser = parserFactory.newSAXParser();
        XMLReader xmlReader = parser.getXMLReader();

        xmlReader.setContentHandler(transformerHandler);

        InputSource is = new InputSource(new StringReader(input));
        // Parse with the reader that has the content handler attached.
        xmlReader.parse(is);

        System.out.println(writer.toString());
    }
}
---------- END SOURCE ----------
###@###.### 2005-04-22 00:00:55 GMT
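
A shorter, self-checking variant of the test case above is sketched here. It is an assumption that an identity `Transformer` fed from a `StreamSource` goes through the same HTML serializer path as the `TransformerHandler` in the original test; the class name `SerializeCheck` and its `serialize` method are hypothetical. The check only looks for the two adjacent quote characters seen in the faulty output.

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class SerializeCheck {
    // Serializes the input with the "html" output method and the same
    // DOCTYPE identifiers as the original test case.
    public static String serialize(String input) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "//W3C//DTD HTML 4.01 Transitional//EN");
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "http://www.w3.org/TR/html4/loose.dtd");
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter writer = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String out = serialize("<html><body>hello</body></html>");
        System.out.println(out);
        // Adjacent quotes mean the identifiers were written without a separator.
        if (out.contains("EN\"\"http")) {
            throw new IllegalStateException("DOCTYPE identifiers are not separated");
        }
    }
}
```

On a runtime that contains the fix, `main` should finish without throwing; on an affected runtime it should fail with the exception above.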

Comments
EVALUATION
Fixed as part of Apache sync:

$ java -version
java version "1.6.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-rc-b61)
Java HotSpot(TM) Client VM (build 1.6.0-rc-b61, mixed mode, sharing)

$ java SerializeBug
<!DOCTYPE html PUBLIC "//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>hello</body></html>
2005-12-10