JDK-6632959 : swing html parser doesn't know &euro; or &rsaquo;

Details
Type:
Bug
Submit Date:
2007-11-21
Status:
Closed
Updated Date:
2011-03-07
Project Name:
JDK
Resolved Date:
2011-03-07
Component:
client-libs
OS:
linux,generic,solaris_10
Sub-Component:
javax.swing
CPU:
x86,sparc,generic
Priority:
P4
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:

Related Reports
Duplicate:
Duplicate:
Duplicate:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b22)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b08, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux lithium 2.6.22-14-generic #1 SMP Sun Oct 14 21:45:15 GMT 2007 x86_64 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
the HTML of mails from amazon regularly contains &rsaquo;, which Swing's DTD doesn't define. here's a snippet (the weird line breaking is courtesy of the original; the link is mangled because i don't know what's encoded in the actual link, and it's irrelevant anyway for the purposes of this bug):

<tr align='left'><td valign='top'><strong><font color="#cc6600">&rsaquo;</font></strong>&nbsp;</td>
  <td width='100%'

><font size='2' face='Verdana, Arial, Helvetica, sans-serif'>
 <a href='http://www.sun.com/'>Cannery Row (Steinbeck "Essentials")</a></font>
  </td>
</tr

>

it would be nice to have all of HTML4's character entity references, even if that's as much HTML4 support as we get in Java 7:

http://www.w3.org/TR/html4/sgml/entities.html



STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
stick the attached source in "test.java" and then:

javac test.java && java test

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
something that looks like "> hello". (it's not actually ">". it's '\u203a'. but it looks similar.)
ACTUAL -
something that looks like "&rsaquo; hello".

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
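import javax.swing.*;

public class test extends JFrame {
    private final JTextPane textPane = new JTextPane();

    public test() {
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setContentPane(textPane);
        textPane.setContentType("text/html");
        // &rsaquo; should render as U+203A; with this bug it shows up literally.
        textPane.setText("<html><head></head><body>&rsaquo; hello</body></html>");
        pack();
        setVisible(true);
    }

    public static void main(String[] args) {
        // Create the frame on the event dispatch thread.
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                new test();
            }
        });
    }
}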

---------- END SOURCE ----------
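
The same behavior can probably be observed without opening a window. The sketch below assumes that JTextPane's HTML kit parses through ParserDelegator (its default parser, as far as I know) and simply collects the text the parser reports. The class name EntityTextProbe and the method parsedText are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK or of the attached test case.
final class EntityTextProbe {
    private EntityTextProbe() {}

    // Parses the HTML with Swing's parser and returns all text it reports.
    static String parsedText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        // ignoreCharSet = true: the input has no charset declaration to honor.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = parsedText("<html><head></head><body>&rsaquo; hello</body></html>");
        System.out.println("parsed text: " + text);
        // true means the entity was resolved to U+203A; false means it was not.
        System.out.println("contains U+203A: " + (text.indexOf('\u203a') >= 0));
    }
}
```

If the second line prints false, the parser did not resolve the entity, which matches the ACTUAL behavior described above.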

CUSTOMER SUBMITTED WORKAROUND :
code like this, with a line for each HTML4 entity you need:

            DTD html32 = DTD.getDTD("html32");
            html32.defEntity("rsaquo", DTDConstants.CDATA | DTDConstants.GENERAL, '\u203a');
            html32.defEntity("lsaquo", DTDConstants.CDATA | DTDConstants.GENERAL, '\u2039');

the only trick is that you *must* be sure Swing has already set up the "html32" DTD before this code runs; it doesn't work if your own code ends up creating the "html32" DTD. i'm not sure of the best way to ensure that.
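
One possible way to get the ordering right, offered as a sketch rather than a confirmed recipe: construct a ParserDelegator first, then look up "html32". This relies on the assumption that ParserDelegator's constructor loads and registers Swing's default DTD, which is how the JDK sources I know of behave but is not a documented guarantee. The class name EntityPatch and the method install are hypothetical.

```java
import java.io.IOException;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DTDConstants;
import javax.swing.text.html.parser.Entity;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK.
final class EntityPatch {
    private EntityPatch() {}

    // Adds the two entities from the workaround to Swing's "html32" DTD.
    static DTD install() throws IOException {
        // Assumption: constructing a ParserDelegator makes Swing load and
        // register its default "html32" DTD, so getDTD returns that instance.
        new ParserDelegator();
        DTD html32 = DTD.getDTD("html32");

        int type = DTDConstants.CDATA | DTDConstants.GENERAL;
        html32.defEntity("rsaquo", type, '\u203a');
        html32.defEntity("lsaquo", type, '\u2039');
        return html32;
    }

    public static void main(String[] args) throws IOException {
        Entity rsaquo = install().getEntity("rsaquo");
        System.out.println("rsaquo defined: " + (rsaquo != null));
    }
}
```

Calling EntityPatch.install() once, before any HTML is loaded into a text component, should be enough under that assumption.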

                                    

Comments
EVALUATION

Swing's DTD should be generated at build time from SGML files found in make/tools/dtdbuilder/dtds. And we even have a rule for that, see make/javax/swing/html32dtd/Makefile. However, before this Makefile is processed, a precompiled DTD is copied from src/share/classes/javax/swing/text/html/parser/html32.bdtd into the build. The generation step is later skipped, because the target already exists at that time.

That precompiled DTD is an anachronism. We'd better generate the DTD at build time, since that makes changes easy: one only has to edit the underlying SGML files. Moreover, the precompiled DTD is incomplete -- it doesn't define several character references, such as &euro;.
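
To see which character references a given JDK's DTD lacks, something like the following probe might help. It is only a sketch: the class name MissingEntities and the method missing are invented here, and it assumes that constructing a ParserDelegator causes the default "html32" DTD to be loaded before DTD.getDTD is called.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK.
final class MissingEntities {
    private MissingEntities() {}

    // Returns those of the given entity names that the "html32" DTD does not define.
    static List<String> missing(String... names) throws IOException {
        // Assumption: this makes Swing load its default DTD first.
        new ParserDelegator();
        DTD html32 = DTD.getDTD("html32");

        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (html32.getEntity(name) == null) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("missing: " + missing("euro", "rsaquo", "lsaquo", "nbsp"));
    }
}
```

The names passed in main are just a sample; the full list of HTML4 entity names would have to come from the HTML4 specification.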
                                     
2010-05-26
EVALUATION

Regression test
closed/javax/swing/text/html/parser/Parser/ParserTest/ParserTest.java
fails because of this bug (see 6849274).
                                     
2010-05-26


