Bug ID: JDK-6173388 JavaMail 1.3.2 - Email message with japanese characters shown as garbage

Type: Bug
Component: core-libs
Sub-Component: java.nio.charsets
Affected Version: tz1.3.2,1.4.2,1.4.2_05,6

Priority: P4
Status: Resolved
Resolution: Fixed
OS: other,linux,solaris_8
CPU: x86,sparc

Submitted: 2004-09-30
Updated: 2010-08-06
Resolved: 2005-09-30

JDK 6
6 b55 Fixed

Customer Problem Description:

I am already using 1.3.2. Attached is a test case.

1. Send an email with a POP3 mail server with special characters from the "out" file in the UTF-8 session, but use the default MIME encoding.
2. Run the following command:
java -classpath ".;activation.jar;mail.jar" EmailJapanese emailServerName emailUserAccount emailUserPassword >out 2>&1
###@###.### 9/30/04 19:07 GMT

EVALUATION

As the solution, the standards-based ISO-2022-JP implementation remains unchanged, which means it supports only ASCII, JIS_X_0201 and JIS_X_0208 mapping-based encoding/decoding. Three new Microsoft iso2022-jp variants, namely x-windows-50220, x-windows-50221 and x-windows-iso2022jp, are added to the JDK/JRE charset repository to support the non-JIS0208 characters. These three Microsoft variants behave slightly differently from the standards-based implementation. See below for implementation details.

Users who prefer the behavior of the MS iso-2022-jp variants are recommended to use these names explicitly instead of "ISO-2022-JP" (and its aliases). However, for those who must have the "ISO-2022-JP" charset (and its aliases) behave the way the MS variants do (to avoid changing existing source code that already uses "ISO-2022-JP"), the system property "sun.nio.cs.map" can be defined on the command line when the JVM starts, to switch the ISO-2022-JP charset to one of Microsoft's variants. For example:

    java -Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP <YOURCLASSNAME>

Implementation notes:

(1) MS50220 and MS50221 are assumed to work the same way as the 7-bit implementation of Microsoft CP50220 and CP50221, using CP5022X-specific JIS0208 and JIS0212 mapping tables (generated via Microsoft's MultiByteToWideChar/WideCharToMultiByte APIs). The only difference between these 2 classes is that MS50220 does not support the single-byte halfwidth kana (Uff61-Uff9f) shift-in mechanism when encoding; instead, these halfwidth kana characters are converted to their fullwidth JIS0208 counterparts.

The differences between the standard JIS_X_0208 and JIS_X_0212 mappings and the CP50220/50221-specific ones are:

0208 mapping:
  1) 0x213d <-> U2015 (compared to U2014)
  2) One-way mappings for the 5 characters below
       u2225 (ms) -> 0x2142 <-> u2016 (jis)
       uff0d (ms) -> 0x215d <-> u2212 (jis)
       uffe0 (ms) -> 0x2171 <-> u00a2 (jis)
       uffe1 (ms) -> 0x2172 <-> u00a3 (jis)
       uffe2 (ms) -> 0x224c <-> u00ac (jis)
       //should consider 0xff5e -> 0x2141 <-> U301c?
  3) NEC Row13 0x2d21-0x2d79
  4) 85-94 ku <-> UE000,UE3AB (includes NEC selected IBM kanji in 89-92ku)
  5) UFF61-UFF9f -> Fullwidth 0208 KANA

0212 mapping:
  1) 0x2237 <-> UFF5E (Fullwidth Tilde)
  2) 0x2271 <-> U2116 (Numero Sign)
  3) 85-94 ku <-> UE3AC - UE757

(2) MSISO2022JP uses a JIS0208 mapping generated from MS932DB.b2c and MS932DB.c2b by converting the SJIS codepoints back to their JIS0208 counterparts, with the following exceptions:
  (a) Codepoints whose resulting JIS0208 codepoint is beyond 0x7e00 are dropped (this includes the IBM Extended Kanji/Non-kanji from 0x9321 to 0x972c).
  (b) The Unicode codepoints that the IBM Extended Kanji/Non-kanji are mapped to (in MS932) are mapped back to the NEC selected IBM Kanji/Non-kanji area at 0x7921-0x7c7e.

Compared to the JIS_X_0208 mapping, this MS932-based mapping has:
  (a) different mappings for 7 JIS codepoints
       0x213d <-> U2015
       0x2141 <-> UFF5E
       0x2142 <-> U2225
       0x215d <-> Uff0d
       0x2171 <-> Uffe0
       0x2172 <-> Uffe1
       0x224c <-> Uffe2
  (b) added one-way c2b mappings for
       U00b8 -> 0x2124
       U00b7 -> 0x2126
       U00af -> 0x2131
       U00ab -> 0x2263
       U00bb -> 0x2264
       U3094 -> 0x2574
       U00b5 -> 0x264c
  (c) NEC Row 13
  (d) NEC selected IBM extended Kanji/Non-kanji

These codepoints are mapped to the same Unicode codepoints as MS932 does, while MS50220/50221 maps them to the Unicode private area.

# There is also an interesting difference when compared to the MS5022X 0208 mapping for JIS codepoint "0x2D60": MS932 maps it to U301d but MS5022X maps it to U301e. Obviously MS5022X is wrong, but...

16-09-2005
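The mapping difference described in the evaluation above can be observed directly by decoding the same JIS byte sequence with the standard charset and with the Microsoft variants. The following is a minimal sketch, not part of the original fix: the class name Iso2022JpVariantDemo and its helper methods are made up for illustration, and it assumes a JDK (6 b55 or later) in which the extended charsets are installed. Going by the notes above, JIS codepoint 0x2141 would be expected to decode to U+301C under "ISO-2022-JP" and to U+FF5E under "x-windows-iso2022jp".

```java
import java.nio.charset.Charset;

// Hypothetical demo class; only java.nio.charset.Charset is a real JDK API here.
public class Iso2022JpVariantDemo {
    // ESC $ B selects JIS X 0208, 0x21 0x41 is JIS codepoint 0x2141,
    // ESC ( B switches back to ASCII.
    static final byte[] JIS_2141 = {
        0x1b, '$', 'B', 0x21, 0x41, 0x1b, '(', 'B'
    };

    // Decode the fixed byte sequence with the named charset.
    static String decode(String charsetName) {
        return new String(JIS_2141, Charset.forName(charsetName));
    }

    // Render each UTF-16 code unit as U+XXXX for easy comparison.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] names = {
            "ISO-2022-JP", "x-windows-iso2022jp",
            "x-windows-50220", "x-windows-50221"
        };
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this runtime");
                continue;
            }
            System.out.println(name + ": 0x2141 -> " + hex(decode(name)));
        }
    }
}
```

Running the same class with -Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP should, if the property is honored by the runtime in use, make the "ISO-2022-JP" line report the Microsoft mapping as well.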

EVALUATION

In tmp.jis, the only unmappable "jis" character is "0x2d6a", which should be a user-defined codepoint in ms932.
###@###.### 2005-05-03 07:21:27 GMT

The root cause of the problem is that the Windows platforms use Code Page 932, which is incompatible with the JIS standard (JIS X 0208:1997) in two ways:
1) It added more characters to the reserved space of JIS X 0208.
2) Windows uses its own JIS X 0208 to Unicode mappings for several characters.
Therefore, any JIS standard-compliant code converter encounters incompatibility problems. There is no way to convert Unicode text converted from Windows Code Page 932 using the MS932 converter to an ISO-2022-JP stream. However, some customers see it as a bug if anything is inconsistent with what Windows does, and some see it as a standard violation (bug) if the Windows mappings are used in JIS X 0208 converters. See also 4426415.
###@###.### 10/12/04 02:09 GMT

Copied Masayoshi's comment to "Evaluation". Will consider addressing this issue in the next major release should more requests come in. "Downgrade" the priority.
###@###.### 2005-07-21 05:04:28 GMT

30-09-2004
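To see which characters of a given message trigger the incompatibility described above, the text can be probed with a CharsetEncoder before it is sent. This is a rough diagnostic sketch only; the class UnmappableFinder and the sample string are assumptions made for illustration and do not come from the attached test case. The sample contains U+2460 (an NEC Row 13 character), U+FF5E (the Windows mapping for JIS 0x2141) and U+3042 (an ordinary hiragana).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it lists code points the named charset cannot encode.
public class UnmappableFinder {
    static List<String> unmappable(String text, String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            // canEncode reports whether this encoder has a mapping for s
            if (!enc.canEncode(s)) {
                result.add(String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "\u2460\uFF5E\u3042";
        for (String name : new String[] {"ISO-2022-JP", "x-windows-iso2022jp"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " cannot encode: " + unmappable(sample, name));
            }
        }
    }
}
```

Characters reported for "ISO-2022-JP" but not for the Microsoft variant are the ones that would show up as garbage in a message sent with the standard charset.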

WORK AROUND Use the "Windows-31J" charset to transfer Unicode text converted from Windows Code Page 932. ###@###.### 10/12/04 02:09 GMT

30-09-2004
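The workaround above can be checked with a simple encode/decode round trip. The sketch below is illustrative only: the class Windows31JRoundTrip and the sample characters (U+2460, U+FF5E, U+2225, all of which have Windows-specific mappings) are assumptions, not taken from the customer's test case. It relies on String.getBytes(Charset) silently substituting a replacement for unmappable characters, so a failed round trip indicates data loss.

```java
import java.nio.charset.Charset;

// Hypothetical check: does text survive encoding and decoding in a charset?
public class Windows31JRoundTrip {
    static boolean roundTrips(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // Unmappable characters are replaced here rather than reported.
        byte[] bytes = text.getBytes(cs);
        return text.equals(new String(bytes, cs));
    }

    public static void main(String[] args) {
        String text = "\u2460\uFF5E\u2225";
        for (String name : new String[] {"Windows-31J", "ISO-2022-JP"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " round trip intact: " + roundTrips(text, name));
            }
        }
    }
}
```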

Duplicate :	JDK-4976235 - iso-2022-jp: provide support for NEC-Row-13 characters
Duplicate :	JDK-6679005 - HORIZONTAL BAR (U+2015) conversion on ISO-2022-JP is wrong.
Relates :	JDK-4191177 - Unicode/JIS Code conversion problems on Windows NT
Relates :	JDK-6310716 - decodeText() doesn't convert from iso-2022-jp to Unicode for some Japanese chars
Relates :	JDK-4426415 - (cs) Charset naming scheme is insufficient