Bug ID: JDK-4947038 Japanese characters not converting correctly from Codepage 930 to Codepage 943

Type: Bug
Component: core-libs
Sub-Component: java.nio.charsets
Affected Version: 1.3.1_09

Priority: P4
Status: Closed
Resolution: Fixed
OS: solaris_8
CPU: sparc

Submitted: 2003-10-31
Updated: 2004-04-22
Resolved: 2004-02-06

Other	Other
1.3.1_12 12 Fixed	1.4.2_05 Fixed

Name: dk106046			Date: 10/31/2003

Operating System(s) :
Sun Solaris 2.8

Full JDK version(s) (from java -version) :
java version "1.3.1_09"                                               
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_09-b03) 
Java HotSpot(TM) Client VM (build 1.3.1_09-b03, mixed mode)           

Detailed description of the problem:
Lines of EBCDIC text are converted to Kanji, but some characters are converted incorrectly, for example the hyphen character. The problem does not occur with Java 1.2.2_17.

- Exact steps to reproduce:

 1 Detach the Java files and FTP them to Solaris in binary mode.
 2 Compile with the appropriate JDK.
 3 Run: java CallConverter > jdk131-09.html
 4 FTP jdk131-09.html back to Windows in binary mode.
 5 Open jdk131-09.html in IE 5.50 or above.
 6 Go to View->Encoding and select Japanese (Shift-JIS) to view the
   correct character set.

 The output contains a circle, which is the unwanted character. It is circled in red in the Word document (picjdk131-04.doc, available on request). The expected output is shown in the HTML document (outputFromJDK1.1.8.html, available on request).
                
- Source code that demonstrates the problem:

=============== CallConverter.java ==========================================

public class CallConverter {
 
    public static void main(String args []){
        //This is what came back from the mainframe
        String input =   "0E43CE438A43A8404044C445BC45B6459A45864040426045804567455240404586458545530F"; 
        //Hexify the input
        String line = CharacterConverter.getInstance().hexifyString(input);
        //Convert from CodePage930 to CodePage943
        if (line != null && line.length() != 0) {
            System.out.println(CharacterConverter.getInstance().charCodeConvert(line,"Cp930","Cp943"));
        }

    }
}

=============== CharacterConverter.java =======================================

import java.io.UnsupportedEncodingException;
public class CharacterConverter {
        private static String defaultCode = "ISO8859-1";
        private static CharacterConverter instance = new CharacterConverter();

private CharacterConverter()
{
        super();
}
public static CharacterConverter getInstance()
{
        return instance;
}
public String hexifyString(String stringToHexify)
{
        String errMsg = null;
        String tempHex = "";

        // Parse input string to strip out unnecessary 00's and FF's
        boolean shiftout = true;
        int hexIdx = 0;
        int len = stringToHexify.length();

        if ((len % 2) != 0)
        {
                System.out.println("len%2 s");
                return null;
        }

        while (hexIdx < len)
        {
                // Delete 00's and FF's
                if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0') || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
                {
                        hexIdx += 2;
                }
                else if (!(stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'E'))
                {
                        // We have a valid single-byte pair of characters
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                        hexIdx += 2;
                }
                else
                {
                        // we've found a shift-in (0x0E, start of a double-byte run)
                        // copy the "0E"
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                        hexIdx += 2;

                        // look for 00 and FF every fourth position until we find shift-out
                        shiftout = false;
                        while (!shiftout && hexIdx < len)
                        {
                                if (stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'F')
                                {
                                        shiftout = true;
                                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                                        hexIdx += 2;
                                }
                                else if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0') || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
                                {
                                        // don't copy any four byte sequence beginning with 00's or FF's
                                        hexIdx += 4;
                                }
                                else
                                {
                                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 4);
                                        hexIdx += 4;
                                }
                        }
                }
        }
        String hexedString = tempHex;
        if (hexedString != null && !hexedString.equals(""))
        {
                // hexify the string.

                len = hexedString.length();

                if ((len % 2) != 0)
                {
                        System.out.println("len%2");
                        return null;
                }

                char hexStr[] = new char[len / 2];

                for (int i = 0; i < len; i += 2)
                {
                        hexStr[i / 2] = (char) Integer.parseInt(hexedString.substring(i, i + 2), 16);
                }

                hexedString = new String(hexStr);
        }
        return hexedString;
}
public String charCodeConvert(String hexedString, String defaultEnCode, String fromCode, String toCode)
{
        String convertedString = null;
        try
        {
                String conString = new String(hexedString.getBytes(defaultEnCode), fromCode);
                convertedString = new String(conString.getBytes(toCode));
        }
        catch (UnsupportedEncodingException ue)
        {
//              throw new JavaException("CharacterConverter", ExceptionTypesEnum.ERROR, ue.toString());
                //ue.printStackTrace();
                System.out.println("UnsupportedEncodingException");
        }
        return convertedString;
}
public String charCodeConvert(String hexedString, String fromCode, String toCode)
{
        return charCodeConvert(hexedString, defaultCode, fromCode, toCode );
}

}
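
A more compact check of the same mapping, as a sketch only: it decodes the single double-byte code point 0x4260 from the input above and prints the resulting Unicode value. It assumes a later JDK than the one in this report (it uses java.nio.charset.Charset, not the sun.io converters) and that the runtime ships the Cp930 charset. The class name Cp930HyphenCheck and its helper methods are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original test case.
class Cp930HyphenCheck {

    // Parse an even-length hex string such as "0E42600F" into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Format each UTF-16 code unit of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x0E and 0x0F bracket the double-byte run; 0x4260 is the code point from the report.
        byte[] ebcdic = hexToBytes("0E42600F");
        if (!Charset.isSupported("Cp930")) {
            // Assumption: extended charsets may be absent from a reduced runtime.
            System.out.println("Cp930 is not available in this runtime");
            return;
        }
        String decoded = new String(ebcdic, Charset.forName("Cp930"));
        System.out.println("0x4260 decodes to " + describe(decoded));
    }
}
```

Unlike the test case above, this goes from bytes directly to a String and does not pass the data through an ISO8859-1 String first.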


We suggested the following fix:

In the file ext\i18n\src\share\sun\io\ByteToCharCp930.java, change line 231 from:
   "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\uFF0D\uFF0F\uFFFD\uFFFD" + //   400 -   409
to:
   "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\u2212\uFF0F\uFFFD\uFFFD" + //   400 -   409

The U+FF0D character was identified as the problem from the following mailing list post: http://oss.software.ibm.com/pipermail/icu4c-support/2002-October/000757.html
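
To make the one-character change easier to see, the following sketch compares the two table rows quoted above and reports the entry that differs. It assumes that the trailing comment "400 - 409" gives the table indices covered by the row. The class name Cp930RowDiff is hypothetical, and Character.getName requires Java 7 or later.

```java
// Hypothetical helper, only for comparing the two rows quoted in the suggested fix.
class Cp930RowDiff {

    // Row "400 - 409" of the decoder table before and after the suggested change.
    static final String BEFORE = "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\uFF0D\uFF0F\uFFFD\uFFFD";
    static final String AFTER  = "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\u2212\uFF0F\uFFFD\uFFFD";

    // Index of the first differing char, or -1 if the two rows are identical.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        int i = firstDifference(BEFORE, AFTER);
        // Assumption: the row starts at table index 400, per its trailing comment.
        System.out.printf("table index %d: U+%04X (%s) -> U+%04X (%s)%n",
                400 + i,
                (int) BEFORE.charAt(i), Character.getName(BEFORE.charAt(i)),
                (int) AFTER.charAt(i), Character.getName(AFTER.charAt(i)));
        // prints: table index 406: U+FF0D (FULLWIDTH HYPHEN-MINUS) -> U+2212 (MINUS SIGN)
    }
}
```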

======================================================================

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.3.1_12 1.4.2_05 generic tiger-beta2
FIXED IN: 1.3.1_12 1.4.2_05 tiger-beta2
INTEGRATED IN: 1.3.1_12 1.4.2_05 tiger-b38 tiger-beta2
VERIFIED IN: 1.3.1_12 1.4.2_05

14-06-2004

EVALUATION

Bug escalated.
###@###.### 2003-11-20

Suggested fix appears to address the issue. Using Markus' comments in the icu4c support mail thread as a basis for how Cp930 ought to be implemented, the current J2SE implementation provides the wrong roundtrip mapping for Cp930 code point 0x4260 (0x4260 (Cp930) --> U+FF0D --> (Cp930) 0x4260). The fix provided will instate 0x4260 <--> U+2212 as the roundtrip mapping. If sun.io.CharToByteCp930 is left unchanged, U+FF0D --> 0x4260 will be retained as a fallback mapping, which is more in sync with IBM's current ICU mappings. The fix should probably also be forward ported into 1.5.
###@###.### 2003-11-21
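
The difference between a roundtrip mapping and a fallback mapping described in the evaluation can be sketched with two small lookup tables. This is only a model of the intended end state, not the sun.io converter code: the class name Cp930HyphenMapping and its maps are hypothetical, and the encoder entry for U+2212 is an assumption implied by the "0x4260 <--> U+2212" roundtrip wording.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the mappings for Cp930 code point 0x4260 after the fix.
class Cp930HyphenMapping {

    // Decoder side (bytes to chars): 0x4260 now yields U+2212.
    static final Map<Integer, Character> DECODE = new HashMap<>();
    // Encoder side (chars to bytes): both characters encode to 0x4260.
    static final Map<Character, Integer> ENCODE = new HashMap<>();

    static {
        DECODE.put(0x4260, '\u2212');
        ENCODE.put('\u2212', 0x4260); // roundtrip partner of the decoder entry (assumed)
        ENCODE.put('\uFF0D', 0x4260); // kept only as a one-way fallback
    }

    // A character round-trips if encoding and then decoding returns the same character.
    static boolean isRoundTrip(char c) {
        Integer code = ENCODE.get(c);
        return code != null && Character.valueOf(c).equals(DECODE.get(code));
    }

    public static void main(String[] args) {
        System.out.println("U+2212 round trip: " + isRoundTrip('\u2212')); // true
        System.out.println("U+FF0D round trip: " + isRoundTrip('\uFF0D')); // false
    }
}
```

In this model U+FF0D can still be encoded, but decoding 0x4260 gives U+2212, so only U+2212 survives a full Cp930 round trip.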

21-11-2003