Name: bb33257 Date: 08/14/98
The Java API for character set conversion is deficient in several ways
that seriously compromise its usefulness.
No Converter Names. There is no way to get a human-readable name for a converter, so if you have
no idea what Cp964 is, for example, you are stuck. For an example of how such names are
used, see Netscape's Encoding menu.
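Without such an API, an application has to carry its own table of descriptions. A minimal sketch of that workaround follows. The class ConverterNames and the method describe are hypothetical names, and the descriptions are the ones from the table in this report. A name with no entry (such as Cp964) is returned unchanged, which is exactly the problem described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical application-side table; not part of the Java API.
class ConverterNames {
    private static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("Unicode", "Unicode (UTF-16BE), prefaced by FEFF signature");
        DESCRIPTIONS.put("UnicodeBig", "Unicode (UTF-16BE), prefaced by FEFF signature");
        DESCRIPTIONS.put("UnicodeLittle", "Little Endian Unicode (UTF-16LE), prefaced by FFFE signature");
        DESCRIPTIONS.put("UnicodeBigUnmarked", "Unicode (UTF-16BE)");
        DESCRIPTIONS.put("UnicodeLittleUnmarked", "Little Endian Unicode (UTF-16LE)");
        DESCRIPTIONS.put("JIS", "JIS with ISO 2022 announcers");
    }

    // Returns the description for a converter name, or the raw name if none is known.
    static String describe(String javaName) {
        String description = DESCRIPTIONS.get(javaName);
        return description != null ? description : javaName;
    }

    public static void main(String[] args) {
        System.out.println(describe("UnicodeBigUnmarked"));
        System.out.println(describe("Cp964")); // no entry, so the raw name comes back
    }
}
```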
Moreover, many of the names are very misleading, such as the ones listed in the table below: nothing
in a name like UnicodeBig says that it writes a signature. If you are converting a series of strings to
be concatenated later, for example, you want a signature only on the very first of them (if at all), so
you need to know to start with UnicodeBig, but continue with UnicodeBigUnmarked (or UnicodeLittle
and UnicodeLittleUnmarked, respectively).
Java Name               Description
Unicode, UnicodeBig     Unicode (UTF-16BE), prefaced by FEFF signature
UnicodeLittle           Little Endian Unicode (UTF-16LE), prefaced by FFFE signature
UnicodeBigUnmarked      Unicode (UTF-16BE)
UnicodeLittleUnmarked   Little Endian Unicode (UTF-16LE)
JIS                     JIS with ISO 2022 announcers
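The concatenation case can be sketched as follows. This is only an illustration, assuming a later JDK in which these historical names are still accepted as charset aliases by Charset.forName. The class SignatureDemo and the method encodeAll are hypothetical names. The first string is encoded with UnicodeBig (with signature) and every later one with UnicodeBigUnmarked.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper: encodes several strings into one big-endian byte stream
// that carries a signature only at the very start.
class SignatureDemo {
    static byte[] encodeAll(String[] parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            // Assumption: both names resolve as charset aliases on this JDK.
            String name = (i == 0) ? "UnicodeBig" : "UnicodeBigUnmarked";
            byte[] encoded = parts[i].getBytes(Charset.forName(name));
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeAll(new String[] {"A", "B"});
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Using the marked name for every string instead would embed a signature in the middle of the concatenated data.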
Illegal Codes. There is no way to control what happens with illegal byte sequences. They are
usually just skipped, with no warning.
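For illustration, the kind of control asked for here might look like the sketch below. It relies on the java.nio.charset API (CharsetDecoder and CodingErrorAction), which was added in later JDK releases and was not available when this report was filed. The class StrictDecodeDemo and the method decodeStrict are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decodes strictly instead of silently dropping bad input.
class StrictDecodeDemo {
    // Returns the decoded text, or null if the input contains an illegal byte sequence.
    static String decodeStrict(byte[] input, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(input)).toString();
        } catch (CharacterCodingException e) {
            return null; // the caller decides what to do with illegal input
        }
    }

    public static void main(String[] args) {
        byte[] good = {0x41, 0x42};
        byte[] bad = {0x41, (byte) 0xFF, 0x42}; // 0xFF never occurs in valid UTF-8
        System.out.println(decodeStrict(good, "UTF-8"));
        System.out.println(decodeStrict(bad, "UTF-8"));
    }
}
```

A caller could equally choose CodingErrorAction.REPLACE or CodingErrorAction.IGNORE; the point is that the choice is explicit rather than hidden inside the converter.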
======================================================================