JDK-4200439 : Bad generation of HTML files in Swing 1.1
  • Type: Bug
  • Component: client-libs
  • Sub-Component: javax.swing
  • Affected Version: 1.1.7,1.2.0,1.2.2,1.3.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,solaris_2.6,windows_95,windows_nt
  • CPU: generic,x86,sparc
  • Submitted: 1999-01-04
  • Updated: 2000-03-09
  • Resolved: 2000-03-09
Fixed in: 1.4.0 beta
Description


Name: dbT83986			Date: 01/04/99

It seems there are lots of bugs in Swing 1.1 for HTML writing!

I built a search-engine application using your wonderful classes in the javax.swing.text.html package
(available at http://www.eteks.com/):

First, I created a subclass of HTMLDocument.HTMLReader to search the text of HTML files. This works just fine.

Then, to generate a nice HTML output file, I used a model HTML file into which I inserted the result of the search.
To achieve this, I created a second subclass of HTMLDocument.HTMLReader. This class adds some HTML code through the
handleText, handleStartTag and handleEndTag methods of the HTMLDocument.HTMLReader class. This part works fine too.

But the generated HTML file has many problems:
- The <BASE ...> tag is dropped
- &...; character entities are not used
- The <FORM ...> tag is dropped
- <A HREF...><IMG ...></A><A HREF...><IMG ...></A> becomes <A HREF...><IMG ...><IMG ...></A>
- Tag attribute values are not quoted when they are strings with spaces (ALT=eTeks site should be ALT="eTeks site")
- Long lines are cut at a fixed length, whether or not it is legal to break at that point
- META tag attributes end up in TITLE attributes!

Here's a simple HTML file to try all this :
Let's call it exemple.html

<HTML>
<HEAD>
   <BASE HREF="http://www.eteks.com/">
   <META name="copyright" content="(C) 1997-1999 eTeks">
   <TITLE>FindIt ! R&eacute;sultat de la recherche</TITLE>
</HEAD>
<BODY>
<FORM action="http://www.eteks.com/cgi-bin/findit" method="GET">

<P><A HREF="main.html">
   <IMG SRC="coursjava/images/home.gif"
        ALT="eTeks site" WIDTH=30 HEIGHT=30 BORDER=0
        ALIGN=bottom>
   </A><A HREF="mailto:###@###.###">
   <IMG SRC="coursjava/images/e-mail.gif" WIDTH=30 HEIGHT=30
        BORDER=0 ALIGN=bottom></A>
</P>

<P>R&eacute;sultat de la recherche</P>

<P>Autre recherche : <INPUT TYPE="text" NAME="search" VALUE=""
SIZE=30><INPUT TYPE="submit" NAME="submit" VALUE="FindIt !">
</P>
</FORM>
</BODY>
</HTML>


A simple Java application creating an output file which should be the same as the original (or at least look the same in a browser):

 /*
 * InputOutput.java  1.0 28/12/98 Emmanuel PUYBARET
 * Copyright (c) 1998 eTeks. All Rights Reserved.
 */

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

public class InputOutput extends HTMLDocument
{
  public static void main (String [] args)
                 throws IOException, BadLocationException
  {
    Reader inModel = new BufferedReader (
                         new FileReader (args [0]));
    Writer outModel = new BufferedWriter (
                          new FileWriter (args [1]));

    new InputOutput ().readWrite (inModel, outModel);
    inModel.close ();
    outModel.close ();
    System.exit (0);
  }

  public void readWrite (Reader in,
                         Writer out)
                 throws IOException, BadLocationException
  {
    HTMLEditorKit html = new HTMLEditorKit ();
    html.read (in, this, 0);
    new HTMLWriter (out, this).write ();
  }
}

Just use it this way : java InputOutput exemple.html exemple2.html
Look at exemple2.html and you'll find all these bugs...
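The same read-then-write round trip can also be tried in memory, without input and output files. The sketch below is only an illustration: the class name RoundTripSketch and the shortened sample markup are assumptions, and what the three checks print depends on the Swing version in use, so no particular result is claimed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class RoundTripSketch {
    /** Parses html into a fresh HTMLDocument and writes it back out as a string. */
    public static String roundTrip(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        StringWriter out = new StringWriter();
        try {
            kit.read(new StringReader(html), doc, 0);
            kit.write(out, doc, 0, doc.getLength());
        } catch (Exception e) {
            // IOException or BadLocationException; wrapped to keep callers simple
            throw new IllegalStateException("round trip failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened stand-in for exemple.html (hypothetical sample)
        String in = "<HTML><HEAD><BASE HREF=\"http://www.eteks.com/\"><TITLE>t</TITLE></HEAD>"
            + "<BODY><FORM action=\"findit\" method=\"GET\">"
            + "<P><IMG SRC=\"home.gif\" ALT=\"eTeks site\"></P></FORM></BODY></HTML>";
        String out = roundTrip(in).toLowerCase();
        System.out.println("base kept:  " + out.contains("<base"));
        System.out.println("form kept:  " + out.contains("<form"));
        System.out.println("alt quoted: " + out.contains("alt=\"eteks site\""));
    }
}
```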

I managed to correct most of the problems, but it would be nice if everything could work as expected!

Thank you anyway for these classes, which were difficult to understand at first, but in the end you provided something easily extendable.

Happy new year
(Review ID: 48947)
======================================================================

Name: skT88420			Date: 08/16/99


I have a small test application which contains an html document.
Also, there is a tabbed pane to switch between html view and 
source view.

Here is the source code of my test app:

package MyEditor;

import javax.swing.*;
import java.awt.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
import javax.swing.event.*;

public class MyEditor extends JPanel {
    BorderLayout layout = new BorderLayout();
    BorderLayout layoutEditor = new BorderLayout();

    JTextPane Editor = new javax.swing.JTextPane();
    JTextPane SourceEditor = new javax.swing.JTextPane();

    HTMLEditorKit editorKit = new HTMLEditorKit();
    StyledEditorKit defaultEditor = new StyledEditorKit();

    JViewport viewport;
    JViewport sourceviewport;
    JScrollPane spSourceEditor = new JScrollPane();
    JScrollPane spEditor = new JScrollPane();

    JTabbedPane tpEditor = new javax.swing.JTabbedPane();

    HTMLDocument document = new HTMLDocument();

    public MyEditor() {
        spEditor.getViewport().setLayout(layoutEditor);
        Editor.setText(" ");
        Editor.setEditorKit(editorKit);

        SourceEditor.setText(" ");
        SourceEditor.setEditorKit(defaultEditor);

        this.setLayout(layout);

        viewport = spEditor.getViewport();
        spEditor.setHorizontalScrollBarPolicy(JScrollPane.HORIZONTAL_SCROLLBAR_ALWAYS);
        viewport.add(Editor);

        tpEditor.addTab("Editor", spEditor);
        // Copy the text across whenever the user switches tabs
        tpEditor.addChangeListener(new ChangeListener() {
            public void stateChanged(ChangeEvent e) {
                if (tpEditor.getSelectedComponent() == spEditor) {
                    Editor.setText(SourceEditor.getText());
                } else {
                    SourceEditor.setText(Editor.getText());
                }
            }
        });
        sourceviewport = spSourceEditor.getViewport();
        sourceviewport.add(SourceEditor);
        tpEditor.addTab("Source", spSourceEditor);

        tpEditor.setTabPlacement(SwingConstants.BOTTOM);

        this.add(tpEditor, BorderLayout.CENTER);

        document.setPreservesUnknownTags(true);
        Editor.setDocument(document);
    }

    public static void main(String[] args) {
        MyEditor editor1 = new MyEditor();
        JDialog dialog = new JDialog(new Frame());
        dialog.getContentPane().add(editor1);
        dialog.setSize(640, 480);
        dialog.setResizable(true);
        dialog.setModal(true);
        dialog.setVisible(true);
        System.exit(0);
    }
}



If the application is launched and switched to source view, it 
shows the following:

<html>
  <body>
    <p resolver=NamedStyle:default {name=default,nrefs=1}>
      
    </p>
  </body>
</html>

Then I entered the following HTML code between <p resolver.......> and </p>:

<FORM METHOD=POST ACTION="target.html">
Input: <INPUT TYPE=text>
</FORM>

Then I switched back to the editor view. The text field is
displayed. But when I switch back to source view, the source looks
like this:

<html>
  <head>
  </head>
  

  <body>
    <p name=default,nrefs=1} resolver=NamedStyle:default>
      Input: <input type=text>
    </p>
  </body>
</html>


The form tag is lost entirely.
(Review ID: 93960)
======================================================================

Name: skT88420			Date: 10/27/99


java version "1.2.2"
Classic VM (build JDK-1.2.2-W, native threads, symcjit)

  To reproduce:
1. Compile the file HtmlPage.java. This program will
   a) read the file "in.html",
   b) parse it using HTMLEditorKit into an HTMLDocument
   c) write "out.html" using HTMLEditorKit
   d) look for all <A> and <IMG> tags via HTMLDocument.getIterator()

2. Execute "java HtmlPage"


HtmlPage.java:

import java.net.*;
import java.io.*;

import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
      
/** Represents an HTML page on some server.
  *
  * Is responsible for:
  *    downloading the contents to a specified directory, resolving
  * relative path differences with a specified root path.
  *    parsing its contents for <A> and <IMG> tags, populating the
  * associated member Vectors
  *    initiating the download of any image files
  *    recursing all anchors if specified*/
public class HtmlPage {

/** It is assumed that any HtmlPage may have relative references within it.
  * It is important to have access to the parent page to help resolve it for
  * downloading purposes.
  */
   private HtmlPage parentPage;
   private java.net.URL url;
   public boolean recurse;
   public boolean confirm;
   
   private java.util.Vector hRef;
   private java.util.Vector imgSrc;
    
   private File remoteFile;
   private File localFile;
   public boolean localFileRelative;


/** This is for bug reporting purposes
 */
   public static void main(String args[]) {

      try {
         HtmlPage x = new HtmlPage(null, "http://xyz.com/in.html");
         x.process();
      }
      catch (Exception e) {
         e.printStackTrace();
      }
      
      System.exit(0);
   }

   public HtmlPage(HtmlPage parent, String urlString, boolean recurse,
                   boolean relative)
   throws MalformedURLException {

      initHtmlPage(parent, urlString, recurse, relative);
   }
   
   public HtmlPage(HtmlPage parent, String urlString)
   throws MalformedURLException {
      initHtmlPage(parent, urlString, true, true);
   }

   protected void initHtmlPage(HtmlPage parent, String urlString,
                               boolean recurse, boolean relative)
   throws MalformedURLException {
   
      parentPage = parent;

      try {
         url = new URL(urlString);
      }
      catch (MalformedURLException e) {
         
         if (parent != null) {
            // Assume the URL to be relative to parent
            File parentFile = new File(parent.getURL().getFile());
            
            urlString = parent.getURL().getProtocol() + "://" +
                  parent.getURL().getHost() +
                  parentFile.getParent() + urlString;
                  
            url = new URL(urlString);
         }
         else {
            // No parent... re-throw the exception
            throw e;
         }
      } // end catch
             
      this.recurse = recurse;   // assign the field, not the parameter to itself
      localFileRelative = relative;
      confirm = false;

      setRemoteFile();
      setLocalFile();

      hRef = new java.util.Vector();
      imgSrc = new java.util.Vector();
      
   }
   
/** Sets the remote file path/name from the url. This is essentially the
  * url with the protocol and host specifiers removed.
  */
   protected void setRemoteFile() {
      remoteFile = new File(url.getFile());
   }
   
   protected void setLocalFile() {
      String remotePath = url.getFile();
      
      if (localFileRelative && remotePath.charAt(0) == '/')
         localFile = new File(remotePath.substring(1));
      else
         localFile = new File(remotePath);
                     
   }

/** Processes the URL to the local file.
  */
   public void process(){
      process(confirm);
   }
      
/** Processes the URL to the local file.
  * The confirm argument overrides the confirm property.
  *
  * The process is as follows:
  *    1. Download the contents of the URL to the local file.
  *    2. Parse the local file to find any links (<A> tags) and images
  *       (<IMG> tags)
  *    3. For each of the above found, create an HtmlPage or Image
  *       respectively and add to respective vector.
  *    4. Download each image
  *    5. Recursively process child HtmlPages
  *
  * Confirmation occurs on steps 1, 4, and 5
  */
   public void process(boolean needConfirm){
   
      if (needConfirm && !getConfirm("Download " + url))
         return;
     
 //     download();
      parse();
   }

/*
   protected void download() {
      
      URLReader reader = new URLReader(url);
      
      try {
         reader.copyTo(localFile);
      }
      catch (IOException e) {
         System.err.println("I/O exception occurred attempting to copy " + url
               + " to " + localFile);
         System.err.println(e.getMessage());
         e.printStackTrace();
      }
            
   }
*/
   protected void parse() {
   
      // Create reader for downloaded HTML file.
      BufferedReader in;
      try {
         in = new BufferedReader(
               new InputStreamReader(new FileInputStream(localFile)));
      }
      catch (IOException e) {
         System.err.println("I/O Exception while creating BufferedReader for "
               + localFile);
         e.printStackTrace();
         return;
      }
      
      HTMLDocument htmlDoc = new HTMLDocument();
      HTMLEditorKit htmlEdit = new HTMLEditorKit();

      // some debugging stuff
      System.out.println(
            "getPreservesUnknownTags=" + htmlDoc.getPreservesUnknownTags());

      // Parse the file into the htmlDoc
      try {
         htmlEdit.read(in, htmlDoc, 0);

         File file = new File("out.html");
         BufferedWriter out = new BufferedWriter(
               new OutputStreamWriter(new FileOutputStream(file)));
         htmlEdit.write(out, htmlDoc, 0, htmlDoc.getLength());
         out.close();

         //htmlDoc.dump(System.out);
         
      }
      catch (Exception e) {
         System.err.println("Exception while parsing " + localFile);
         e.printStackTrace();
         return;
      }

      // Iterate over the <A> and <IMG> tags, placing them into the Vectors
      HTMLDocument.Iterator tags = htmlDoc.getIterator(HTML.Tag.IMG);
      System.out.println("IMG="+HTML.Tag.IMG);
      
      // find out if returned iterator is positioned at the first element
      if (!tags.isValid()) {
         System.out.println("htmlDoc.getIterator().isValid() is false.");
         tags.next();
      }
          
      while (tags.isValid()) {
         System.out.println("Found an <IMG> tag.");
         imgSrc.add(tags.getAttributes());
         tags.next();
      }
      
      tags = htmlDoc.getIterator(HTML.Tag.A);
      
      // find out if returned iterator is positioned at the first element
      if (!tags.isValid()) {
         System.out.println("htmlDoc.getIterator().isValid() is false.");
         tags.next();
      }
          
      while (tags.isValid()) {
         System.out.print(tags.getTag() + " ");
         System.out.println(tags.getAttributes());
         hRef.add(tags.getAttributes());
         tags.next();
      }
      
   }
   
   protected boolean getConfirm(String prompt) {
   
      return true;
   }
   
   public URL getURL() {
      return url;
   }

}


in.html:
<html>
  <head>
  <title>Test</title>
  </head>
  <body>
    <a href="../index.html"><img src="../images/side_nav/home.gif" height="17"
border="0" width="144" vspace="0" alt="Home" hspace="0"></a>
    <IMG SRC="../images/side_nav/description_up.gif" WIDTH=152 HEIGHT=17
BORDER=0 ALT="Description" HSPACE=0 VSPACE=0>
  </body>
</html>


Execution:
C:\MyDocs\java\Src\Bug>java HtmlPage
getPreservesUnknownTags=true
IMG=img
htmlDoc.getIterator().isValid() is false.
a href=../index.html
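The iterator check from the run above can be reproduced without any files. The following is a minimal sketch, not the reporter's program: the class TagCountSketch and its methods are hypothetical names, and it simply counts how many times HTMLDocument.getIterator() reports a given non-block tag. Note that getIterator() returns null for block tags, which the sketch treats as zero matches.

```java
import java.io.StringReader;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class TagCountSketch {
    /** Counts the leaf elements that getIterator() reports for the given tag. */
    public static int countTags(String html, HTML.Tag tag) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        HTMLDocument.Iterator it = doc.getIterator(tag);
        int count = 0;
        // The iterator starts positioned on the first match, if there is one
        while (it != null && it.isValid()) {
            count++;
            it.next();
        }
        return count;
    }

    /** Convenience wrapper for the <IMG> case discussed in this report. */
    public static int countImages(String html) {
        return countTags(html, HTML.Tag.IMG);
    }

    public static void main(String[] args) {
        String html = "<html><body><p><a href=\"../index.html\"><img src=\"home.gif\"></a>"
            + " text <img src=\"description_up.gif\"></p></body></html>";
        System.out.println("IMG tags found: " + countImages(html));
        System.out.println("A tags found:   " + countTags(html, HTML.Tag.A));
    }
}
```

On the release described in this report the IMG count would presumably come out as zero, matching the "isValid() is false" line in the output above; on a release containing the fix both images should be reported.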


out.html:
<html>
  <head>
  <title>Test  </title>
  </head>
  

  <body>
    <a href="../index.html"><img src="../images/side_nav/home.gif" width="144"
vspace="0" hspace="0" height="17" border="0" alt="Home">
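    </a><img src="../images/side_nav/description_up.gif" width="152" vspace="0"
hspace="0" height="17" border="0" alt="Description">
    
  </body>
</html>
(Review ID: 97127)
======================================================================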

Comments
CONVERTED DATA

BugTraq+ Release Management Values
COMMIT TO FIX: merlin-beta
FIXED IN: merlin-beta
INTEGRATED IN: merlin-beta
14-06-2004

WORK AROUND

Name: dbT83986			Date: 01/04/99

Here's the base of my final program:

/*
 * BetterInputOutput.java  1.0 28/12/98 Emmanuel PUYBARET
 * Copyright (c) 1998 eTeks. All Rights Reserved.
 */

import java.io.*;
import java.util.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class BetterInputOutput extends HTMLDocument
{
  public static void main (String [] args)
                 throws IOException, BadLocationException
  {
    Reader inModel = new BufferedReader (
                         new FileReader (args [0]));
    Writer outModel = new BufferedWriter (
                          new FileWriter (args [1]));

    new BetterInputOutput ().readWrite (inModel, outModel);
    inModel.close ();
    outModel.close ();
    System.exit (0);
  }

  public void readWrite (Reader in,
                         Writer out)
                 throws IOException, BadLocationException
  {
    HTMLEditorKit html = new HTMLEditorKit ();
    html.read (in, this, 0);
    new HTMLWriterWithLongLines (out, this).write ();
  }

  public HTMLEditorKit.ParserCallback getReader (int pos)
  {
    return new BetterInputOutputReader ();
  }

  class BetterInputOutputReader extends HTMLDocument.HTMLReader
  {
    public BetterInputOutputReader ()
    {
      super (0);
      registerTag (HTML.Tag.BASE, new HiddenAction ());
      registerTag (HTML.Tag.FORM, new Form2Action ());
    }

    public void handleText (char [] data, int pos)
    {
      super.handleText (toHTMLCharacter (data), pos);
    }

    public class Form2Action extends CharacterAction
    {
      public void start (HTML.Tag t, MutableAttributeSet attr)
      {
        blockOpen (t, attr);
        super.start (t, attr);
      }

      public void end (HTML.Tag t)
      {
        super.end (t);
        blockClose (t);
      }
    }

    protected void blockOpen (HTML.Tag t,
                              MutableAttributeSet attr)
    {
      super.blockOpen (t, attr);
    }

    protected void blockClose (HTML.Tag t)
    {
      super.blockClose (t);
    }
  }

  private String toHTMLCharacter (String text)
  {
    return new String (toHTMLCharacter (text.toCharArray ()));
  }

  private char [] toHTMLCharacter (char [] text)
  {
    try
    {
      // Workaround for the writing error in Swing 1.1
      StringBuffer result = new StringBuffer (text.length * 4/3);
      DTD dtd = DTD.getDTD ("html32");
      Entity e;
      char c;

      for (int i = 0; i < text.length; i++)
        if ((e = dtd.getEntity (c = text [i])) != null)
        {
          result.append ('&');
          result.append (e.getName ());
          result.append (';');
        }
        else
          result.append (c);
      return result.toString ().toCharArray ();
    }
    catch (IOException e)
    {
      return text;
    }
  }

  class HTMLWriterWithLongLines extends HTMLWriter
  {
    public HTMLWriterWithLongLines (Writer w, HTMLDocument doc)
    {
      super (w, doc);
      setLineLength (1000);
    }

    protected void writeAttributes(AttributeSet attr)
                 throws IOException
    {
      Enumeration names = attr.getAttributeNames();
      while (names.hasMoreElements())
      {
        Object name = names.nextElement();
        if (   name instanceof HTML.Tag
            || name == StyleConstants.NameAttribute
            || name == HTML.Attribute.ENDTAG
            || name == StyleConstants.ModelAttribute)
          continue;
        String att = attr.getAttribute(name).toString ();
        if (   att.length () == 0
            || att.indexOf (' ') >= 0)
          write(" " + name + "=\"" + toHTMLCharacter (att) + "\"");
        else
          write(" " + name + "=" + toHTMLCharacter (att));
      }
    }
  }
}
======================================================================

Name: skT88420			Date: 08/16/99

None.
(Review ID: 93960)
======================================================================

Name: skT88420			Date: 10/27/99

None.
(Review ID: 97127)
======================================================================
11-06-2004

EVALUATION

The only thing that needs some work here is writing out of the form tags; everything else has been fixed for 1.1.1 (1.2.1).
scott.violet 1999-03-23

Fixing writing out of forms is currently problematic due to the way we model forms. Something like:
  <form> <table>
and
  <table><form>
get modeled the same.
scott.violet@eng 1999-09-16

The HTML support provided in the J2SE has never supported writing out form elements correctly. This was mainly due to how we internally modeled them. To better match the DOM, we have changed how forms are internally modeled. Previously, any attributes of a form would be stored in the attribute set of all the children character elements. With Merlin, an element will be created to represent the form, better matching the HTML file itself. This allows for better modeling of the form, as well as consistent writing of the form.

This will affect developers who relied on forms being loosely handled. As an example, we would previously treat the following incorrect HTML:
  <table> <form> </table> </form>
as:
  <form> <table> </table> </form>
With Merlin we will instead treat it as:
  <table> <form> </form> </table>
If developers wish to better support invalid HTML, they can create their own parser implementation.
scott.violet@eng 2000-03-06
06-03-2000
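
The modeling change described in the evaluation can be observed by walking the element tree of a parsed document and looking for an element whose name is HTML.Tag.FORM. This is a minimal sketch under that reading of the evaluation; the class FormElementSketch and its methods are hypothetical names, and the result depends on whether the release in use models forms as their own element.

```java
import java.io.StringReader;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class FormElementSketch {
    /** Depth-first search for an element whose name attribute is the given tag. */
    public static boolean containsTag(Element e, HTML.Tag tag) {
        if (e.getAttributes().getAttribute(StyleConstants.NameAttribute) == tag) {
            return true;
        }
        for (int i = 0; i < e.getElementCount(); i++) {
            if (containsTag(e.getElement(i), tag)) {
                return true;
            }
        }
        return false;
    }

    /** Parses the markup and reports whether the model holds a FORM element. */
    public static boolean hasFormElement(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return containsTag(doc.getDefaultRootElement(), HTML.Tag.FORM);
    }

    public static void main(String[] args) {
        String html = "<html><body><form method=\"POST\" action=\"target.html\">"
            + "<p>Input: <input type=\"text\"></p></form></body></html>";
        System.out.println("form element in model: " + hasFormElement(html));
    }
}
```

With the older modeling, where form attributes were stored on the child character elements, no such element would exist and the writer would have nothing to emit a <form> tag from.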