Bug ID: JDK-8028616 Htmleditorkit parser doesn't handle leading slash (/)

Type: Bug
Component: client-libs
Sub-Component: javax.swing
Affected Version: 6u41

Priority: P2
Status: Resolved
Resolution: Fixed

Submitted: 2013-11-19
Updated: 2014-10-15
Resolved: 2014-01-16

JDK 6	JDK 7	JDK 8	JDK 9
6u81Fixed	7u72Fixed	8u20Fixed	9 b06Fixed

Issue with Java version 6u41

Using HTMLEditorKit to parse pre-existing HTML. If we parse this
document, it works as expected:
<body>/ at start inside body is okay</body>

But if we don't have the <body> tag, then we get only a handleEndOfLineString
callback, and do not get the actual text. This is the example document:
/ at start is bad

Testcase:
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.Entity;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Bug in Parser when the document starts with a slash.
 */
public class ParserTest {
    public static void main(String[] args) throws IOException {

        doTest( "/ at start is bad" );
        doTest( "<body>/ at start inside body is okay</body>" );
    }

    private static void doTest(String text) throws IOException {
        System.out.println( "doTest: " + text );
        ParserCB cb = new ParserCB();
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument)
htmlKit.createDefaultDocument();

        htmlDoc.getParser().parse(new StringReader(text), cb, true);
        System.out.println();
    }

    private static class ParserCB extends HTMLEditorKit.ParserCallback {
        @Override
        public void handleComment(char[] data, int pos) {
            System.out.println("handleComment: " + new String(data) );
        }

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int
pos) {
            System.out.println("handleStartTag: " + t );
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            System.out.println("handleEndTag: " + t );
        }

        @Override
        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int
pos) {
            System.out.println("handleSimpleTag: " + t );
        }

        @Override
        public void handleError(String errorMsg, int pos) {
            System.out.println("handleError: " + errorMsg );
        }

        @Override
        public void handleEndOfLineString(String eol) {
            System.out.println("handleEndOfLineString: " + eol );
        }

        @Override
        public void handleText(char[] data, int pos) {
            System.out.println("handleText: " + new String(data));
        }
    }
}

Output from the test case:
doTest: / at start is bad
handleEndOfLineString:


doTest: <body>/ at start inside body is okay</body>
handleStartTag: html
handleStartTag: head
handleEndTag: head
handleStartTag: body
handleText: / at start inside body is okay
handleEndTag: body
handleEndTag: html
handleEndOfLineString:

Suggested fix: if a text starts with slash (/) and is not wrapped by tags, the Parser should invoke legalElementContext() to handle this case properly.

10-01-2014