JDK-4290248 : HTMLEditorKit ignores HTML
  • Type: Bug
  • Component: client-libs
  • Sub-Component: javax.swing
  • Affected Version: 1.3.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 1999-11-11
  • Updated: 2000-01-13
  • Resolved: 2000-01-13
Related Reports
Duplicate :  
Description

Name: skT88420			Date: 11/10/99


java version "1.3beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3beta-O)
Java(TM) HotSpot Client VM (build 1.3beta-O, mixed mode)

When HTML input is parsed by the HTMLEditorKit, <FORM> and </FORM> tags are
ignored.
I discovered the bug when I wanted to use an iterator
HTMLDocument.Iterator formIterator=doc.getIterator(HTML.Tag.INPUT)
That iterator never returned a tag.
The test example demonstrates the bug by simply writing the document to a file.
However, it does not directly demonstrate the iterator problem, for which
I have not opened a separate bug.
The behaviour is the same in JDK1.2.2


Input HTML file input.htm:
<html>
  <head>
      <title>Test</title>
  </head>
  <body>
    <form action="http://www.javasoft.com" method=get>
    <input type="hidden" name="test" value="1">
    </form>
  </body>
</html>

Output HTML file output.htm:
<html>
  <head>
    <title>Test    </title>
    
  </head>
  <body>
    <input value="1" type="hidden" name="test">
    

    <p>
      
    </p>
  </body>
</html>
Java source file:

import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
 
class HtmlDoc {

    static FileWriter writer;
    static FileReader reader;
    static String inputFile = "input.htm";
    static String resultFile = "output.htm";
    public static void main(String[] args) {


        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
       
        try {
            // Create a reader on the HTML content.
            reader = new FileReader(inputFile);
            writer = new FileWriter(new File(resultFile));
            kit.read(reader, doc, 0);
            kit.write(writer, doc, 0, 10000);
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0); // exit status 0: the run completed

    }//End main()
 
}//End class HtmlDoc
(Review ID: 97558) 
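
The round trip in the test case above can also be run entirely in memory, which makes it easier to check whether a <form> tag survives. The following is a minimal sketch, not part of the original report: the class name HtmlRoundTrip and the method roundTrip are hypothetical, and it writes doc.getLength() characters instead of the fixed 10000 used above. Whether "<form" appears in the result is assumed to depend on the JDK version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class for illustration; not part of the original report.
class HtmlRoundTrip {

    // Parses the given HTML into an HTMLDocument and writes the whole
    // document back out as a string.
    static String roundTrip(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        StringWriter out = new StringWriter();
        try (StringReader in = new StringReader(html)) {
            kit.read(in, doc, 0);
            // Write the real document length rather than a guessed constant.
            kit.write(out, doc, 0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("HTML round trip failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Test</title></head><body>"
                + "<form action=\"http://www.javasoft.com\" method=get>"
                + "<input type=\"hidden\" name=\"test\" value=\"1\">"
                + "</form></body></html>";
        String result = roundTrip(html);
        System.out.println(result);
        // Reports whether the writer emitted a form tag on this JDK.
        System.out.println("form tag written: "
                + result.toLowerCase().contains("<form"));
    }
}
```

The helper wraps the checked exceptions so that it can be called from a one-line check; the INPUT tag is expected in the output in any case, while the FORM tag is the part under investigation.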
======================================================================

Name: skT88420			Date: 11/17/99


java version "1.3beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3beta-O)
Java(TM) HotSpot Client VM (build 1.3beta-O, mixed mode)


See below code:
/*
HTMLDocument.Iterator ignores tags, e.g. <INPUT>. I think <IMG> is not working
either.
It would be handy to have an easy way to iterate through the input tags
of e.g. the 2nd form on a page only. Similar to JavaScript
document.forms[1].elements[index] and document.forms[1].elements.length. I came
across this bug when I was looking for such functionality.
*/
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
 
class HtmlTest {

    public static void main(String[] args) {


        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc =
            (javax.swing.text.html.HTMLDocument)kit.createDefaultDocument();
        // The Document class does not yet handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            // Iterate through specific HTML tags of the HTML document.
            HTMLDocument.Iterator tagIterator=doc.getIterator(HTML.Tag.INPUT);
            int tagCount=0;
            while (tagIterator.isValid()){
                tagCount++;
                tagIterator.next();
            }
            System.out.println("tagCount="+tagCount);
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println("Please press any key to exit");
        try {System.in.read();}catch(IOException e){}
        System.exit(0); // exit status 0: the run completed


    }//End main()
 
    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri) throws IOException {
        if (uri.startsWith("http:")) {
            // Retrieve from Internet.
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        } else {
            // Retrieve from file.
            return new FileReader(uri);
        }
    }

}//End class HtmlTest


input file input.htm:
<html>
  <head>
      <title>Test</title>
  </head>
  <body>
    <form action="http://www.javasoft.com" method=get>
    <input type="hidden" name="test" value="1">
    </form>
  </body>
</html>

execute: java HtmlTest input.htm
(Review ID: 97821)
======================================================================

Comments
WORK AROUND

Name: skT88420			Date: 11/10/99

None
======================================================================

Name: skT88420			Date: 11/17/99

None. I am still looking for that easy solution :)
(Review ID: 97821)
======================================================================
11-06-2004

EVALUATION

FORM tags are not ignored; they are modeled slightly differently from most
tags in that they do not get their own Element. Instead of a new Element for
a FORM tag, the value is put in the AttributeSet of the leaf Elements that are
children of the FORM tag. If you modify the test program to look like:

    HTMLDocument.Iterator tagIterator=doc.getIterator(HTML.Tag.INPUT);
    ElementIterator ei = new ElementIterator(doc);
    Element element;
    while ((element = ei.next()) != null) {
        System.out.println("name: " + element.getName());
        AttributeSet attrs = element.getAttributes();
        Enumeration attrNames = attrs.getAttributeNames();
        while (attrNames.hasMoreElements()) {
            Object key = attrNames.nextElement();
            System.out.println("\t" + key + ": " + attrs.getAttribute(key));
        }
    }

You get the following output:

name: html
	name: html
name: head
	name: head
name: p-implied
	name: p-implied
name: title
	name: title
name: title
	name: title
	endtag: true
name: content
	name: content
name: body
	name: body
name: p-implied
	name: p-implied
name: input
	name: test
	value: 1
	type: hidden
	form: method=get action=http://www.javasoft.com
	name: input
name: content
	name: content
name: p
	name: p
name: content
	name: content

As you can see, the form attribute is attached to the Element representing
INPUT. We realize this is rather different from how you expect the FORM to be
modeled and are currently investigating this. As to why we don't properly
write out the <form> when outputting, refer to bug 4200439; I am closing this
report as a duplicate of that bug.

scott.violet@eng 2000-01-13
13-01-2000
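
The element walk described in the evaluation can be narrowed to INPUT elements only, which is roughly what the submitter wanted from the iterator. The following is a minimal sketch under stated assumptions, not the fix for this bug: the class name InputScan and the method findInputs are hypothetical. The lookup of a FORM value under the key HTML.Tag.FORM follows the modeling described in the evaluation and is assumed to return null on JDK versions that model FORM differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class for illustration; not part of the original report.
class InputScan {

    // Parses the HTML and returns the attribute sets of all elements
    // whose name attribute is the INPUT tag.
    static List<AttributeSet> findInputs(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (StringReader in = new StringReader(html)) {
            kit.read(in, doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }

        List<AttributeSet> inputs = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        Element element;
        // Walk every element of the document, as in the evaluation.
        while ((element = it.next()) != null) {
            AttributeSet attrs = element.getAttributes();
            Object name = attrs.getAttribute(StyleConstants.NameAttribute);
            if (HTML.Tag.INPUT.equals(name)) {
                inputs.add(attrs);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Test</title></head><body>"
                + "<form action=\"http://www.javasoft.com\" method=get>"
                + "<input type=\"hidden\" name=\"test\" value=\"1\">"
                + "</form></body></html>";
        for (AttributeSet attrs : findInputs(html)) {
            System.out.println("input name: "
                    + attrs.getAttribute(HTML.Attribute.NAME));
            System.out.println("input value: "
                    + attrs.getAttribute(HTML.Attribute.VALUE));
            // Assumption: may be null if this JDK gives FORM its own Element.
            System.out.println("enclosing form: "
                    + attrs.getAttribute(HTML.Tag.FORM));
        }
    }
}
```

Matching on StyleConstants.NameAttribute avoids HTMLDocument.Iterator altogether, so the sketch does not depend on how getIterator treats INPUT.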