JDK-4753019 : stddoclet: "No character encoding detected" warning from HTML validator
  • Type: Bug
  • Component: tools
  • Sub-Component: javadoc(tool)
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Duplicate
  • OS: other
  • CPU: generic
  • Submitted: 2002-09-25
  • Updated: 2014-05-05
  • Resolved: 2014-02-13
Related Reports
Duplicate : JDK-4756688
Description
When running the W3C validator (http://validator.w3.org/)
on 1.4.x javadoc output, the following warning is produced:

  Warning: No Character Encoding detected! To assure correct validation,  
  processing, and display, it is important that the character encoding is 
  properly labeled.

  The document character set for XML and HTML 4.0 is Unicode (aka ISO 10646). 
  This means that HTML browsers and XML processors should behave as if they 
  used Unicode internally. But it doesn't mean that documents have to 
  be transmitted in Unicode. As long as client and server agree on the 
  encoding, they can use any encoding that can be converted to Unicode.

  It is very important that the character encoding of any XML or (X)HTML 
  document is clearly labeled. This can be done in the following ways:

   - For HTML, use the <meta> tag. Example: 
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 

   With this information, clients can easily map these encodings to Unicode. 
   In practice, a few encodings will be preferred, most likely: ISO-8859-1
   (Latin-1), US-ASCII , UTF-8 , UTF-16 , the other encodings in the
   ISO-8859 series, iso-2022-jp , euc-kr , and so on.

   Source:   http://www.w3.org/International/O-charset.html

However, the I18N team has previously warned against specifying charsets in
meta tags, so this discrepancy needs to be resolved before a fix is made.


Comments
This is a duplicate of 4756688
13-02-2014

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: dragon
28-08-2004

EVALUATION

Name: nl37777    Date: 09/30/2002

Pages should always identify their character encoding (in this context called
"charset") and, if you follow the specifications, MUST identify their
character encoding if they are encoded in anything other than ISO 8859-1. In
reality, you can get away with not specifying the character encoding (the
user then has to select the right one), but things break if you specify the
wrong character encoding (browsers won't let the user select the right one in
that case).

So, the right thing to do is to add a tag

  <META http-equiv="Content-Type" content="text/html; charset=...">

with the appropriate charset name to generated pages. The names should be the
preferred MIME names as given in the IANA registry,
http://www.iana.org/assignments/character-sets. Note that javadoc's
-docencoding flag (like the J2RE) accepts a large variety of names that are
not preferred MIME names or even valid IANA names.

The java.nio.charset.Charset API may help here, at least for character
encodings that are currently supported by the java.nio APIs:
Charset.forName(encodingName).name() returns the canonical name for an
encoding, and the canonical name is the preferred MIME name for encodings
that have one. Charset.isRegistered() can be used to verify whether the name
is valid.

###@###.###

See related RFE: 4756688: Combine -docencoding and -charset options
###@###.### 2003-09-23

Name: nl37777    Date: 12/24/2003

I recommend fixing this by first fixing 4756688, then setting
"-docencoding iso-8859-1" for English pages and "-docencoding euc-jp" for
Japanese pages. If 4756688 can't be fixed soon enough, then setting both
-docencoding and -charset with these values also works.

======================================================================
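The canonicalization step the evaluation describes can be sketched as follows. This is a minimal illustration, not javadoc's actual implementation; the class and method names here are hypothetical. It uses java.nio.charset.Charset.forName() to map an alias accepted by -docencoding (such as "UTF8" or "ISO8859_1") to its canonical name, which is the preferred MIME name where one exists, and builds the meta tag to emit:

```java
import java.nio.charset.Charset;

public class CharsetMetaTag {

    // Map a -docencoding style name (possibly a Java-internal alias) to
    // its canonical name and build the <meta> tag for generated pages.
    // Charset.forName() throws if the name is not a supported charset.
    static String metaTagFor(String encodingName) {
        Charset cs = Charset.forName(encodingName);
        String canonical = cs.name();   // e.g. "UTF8" -> "UTF-8"
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + canonical + "\">";
    }

    public static void main(String[] args) {
        // Java-internal aliases resolve to preferred MIME names:
        System.out.println(metaTagFor("UTF8"));       // charset=UTF-8
        System.out.println(metaTagFor("ISO8859_1"));  // charset=ISO-8859-1

        // isRegistered() reports whether the charset appears in the
        // IANA registry, as the evaluation suggests checking:
        System.out.println(Charset.forName("UTF-8").isRegistered());
    }
}
```

Note that forName() only helps for encodings backed by java.nio providers; names accepted elsewhere in the J2RE but unknown to java.nio would still need separate handling.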
28-08-2004