JDK-6632959 : swing html parser doesn't know &euro; or &rsaquo;
  • Type: Bug
  • Component: client-libs
  • Sub-Component: javax.swing
  • Affected Version: 7
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,linux,solaris_10
  • CPU: generic,x86,sparc
  • Submitted: 2007-11-21
  • Updated: 2011-03-07
  • Resolved: 2011-03-07
JDK 7: 7 b97 (Fixed)
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b22)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b08, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux lithium 2.6.22-14-generic #1 SMP Sun Oct 14 21:45:15 GMT 2007 x86_64 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
the HTML of mails from amazon regularly contains &rsaquo;, which Swing's DTD doesn't define. here's a snippet (weird line breaking courtesy of the original, link mangled because i don't know what's encoded in the actual link and it's irrelevant anyway for the purposes of this bug):

<tr align='left'><td valign='top'><strong><font color="#cc6600">&rsaquo;</font></strong>&nbsp;</td>
  <td width='100%'

><font size='2' face='Verdana, Arial, Helvetica, sans-serif'>
 <a href='http://www.sun.com/'>Cannery Row (Steinbeck "Essentials")</a></font>
  </td>
</tr

>

it would be nice to have all of HTML4's character entity references, even if that's as much HTML4 support as we get in Java 7:

http://www.w3.org/TR/html4/sgml/entities.html
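
A rough way to see which of those entities a given JDK is missing is to ask the loaded DTD directly. This is only a sketch: the class name EntityCoverage and the SAMPLE list are made up for illustration (the list is a small hand-picked subset, not the full HTML 4 set), and it assumes that constructing a ParserDelegator is what makes Swing load its "html32" DTD.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK or of the original report.
final class EntityCoverage {
    // A small hand-picked sample of HTML 4 entity names, not the full list.
    static final String[] SAMPLE = {
        "nbsp", "amp", "euro", "rsaquo", "lsaquo", "rsquo", "hellip"
    };

    /** Returns the names from SAMPLE that the loaded "html32" DTD does not define. */
    static List<String> missing() {
        // Assumption: constructing a ParserDelegator makes Swing load its DTD.
        new ParserDelegator();
        try {
            DTD dtd = DTD.getDTD("html32");
            List<String> out = new ArrayList<>();
            for (String name : SAMPLE) {
                if (dtd.getEntity(name) == null) {
                    out.add(name);
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("missing entities: " + missing());
    }
}
```

On an affected build the list should include rsaquo and euro; on a build with the fix it may well be empty.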



STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
stick the attached source in "test.java" and then:

javac test.java && java test

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
something that looks like "> hello". (it's not actually ">". it's '\u203a'. but it looks similar.)
ACTUAL -
something that looks like "&rsaquo; hello".

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
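import javax.swing.*;

public class test extends JFrame {
    private final JTextPane textPane = new JTextPane();

    public test() {
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setContentPane(textPane);
        textPane.setContentType("text/html");
        // &rsaquo; should render as U+203A; with the bug, the raw entity text is shown.
        textPane.setText("<html><head></head><body>&rsaquo; hello</body></html>");
        pack();
        setVisible(true);
    }

    public static void main(String[] args) {
        // Build the UI on the event dispatch thread.
        SwingUtilities.invokeLater(test::new);
    }
}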

---------- END SOURCE ----------
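
The same behaviour can probably be checked without opening a window by driving the parser directly. The sketch below is not the attached test; the class name HeadlessEntityTest and the helper textOf are hypothetical. It feeds the HTML to ParserDelegator and collects whatever the parser reports through handleText, then prints the code points so a terminal's font or encoding can't hide the difference.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK or of the original report.
final class HeadlessEntityTest {
    /** Parses html with Swing's parser and returns the concatenated text content. */
    static String textOf(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = textOf("<html><head></head><body>&rsaquo; hello</body></html>");
        // Print code points so the result is unambiguous on any console.
        text.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

If the entity is known, the output should start with U+203A; if it is not, the literal characters of "&rsaquo;" (U+0026 and so on) should show up instead.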

CUSTOMER SUBMITTED WORKAROUND :
code like this, with a line for each HTML4 entity you need:

            DTD html32 = DTD.getDTD("html32");
            html32.defEntity("rsaquo", DTDConstants.CDATA | DTDConstants.GENERAL, '\u203a');
            html32.defEntity("lsaquo", DTDConstants.CDATA | DTDConstants.GENERAL, '\u2039');

the only trick is that you *must* be sure Swing has set up the "html32" DTD first; it doesn't work if you're the one who creates the "html32" DTD. i'm not sure of the best way to do that.
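
One possible way to get the ordering right, offered as a sketch and not as a confirmed fix: construct a ParserDelegator before touching the DTD, on the assumption that its constructor is what makes Swing load and populate "html32". The class name EntityWorkaround and the method install are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DTDConstants;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK or of the original report.
final class EntityWorkaround {
    /** Adds the missing entities to Swing's "html32" DTD and returns that DTD. */
    static DTD install() {
        // Assumption: this forces Swing to load its own "html32" DTD first.
        new ParserDelegator();
        try {
            DTD html32 = DTD.getDTD("html32");
            int type = DTDConstants.CDATA | DTDConstants.GENERAL;
            // One line per HTML 4 entity that is needed.
            html32.defEntity("rsaquo", type, '\u203a');
            html32.defEntity("lsaquo", type, '\u2039');
            return html32;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DTD dtd = install();
        System.out.println("rsaquo defined: " + (dtd.getEntity("rsaquo") != null));
    }
}
```

Calling install() once at startup, before any JTextPane or JEditorPane parses HTML, would be the intended usage.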

Comments
EVALUATION Regression test closed/javax/swing/text/html/parser/Parser/ParserTest/ParserTest.java fails because of this bug (see 6849274).
26-05-2010

EVALUATION Swing's DTD should be generated at build time from the SGML files in make/tools/dtdbuilder/dtds, and there is even a rule for that in make/javax/swing/html32dtd/Makefile. However, before this Makefile is processed, a precompiled DTD is copied from src/share/classes/javax/swing/text/html/parser/html32.bdtd into the build, so the generation step is later skipped because its target already exists. That precompiled DTD is an anachronism. It would be better to generate the DTD, since changes can then be made easily by editing the underlying SGML files. Moreover, the precompiled DTD is incomplete: it doesn't define several character references such as &euro;.
26-05-2010