JDK-4947038 : Japanese characters not converting correctly from Codepage 930 to Codepage 943

Details
Type: Bug
Submit Date: 2003-10-31
Status: Closed
Updated Date: 2004-04-22
Project Name: JDK
Resolved Date: 2004-02-06
Component: core-libs
OS: solaris_8
Sub-Component: java.nio.charsets
CPU: sparc
Priority: P4
Resolution: Fixed
Affected Versions: 1.3.1_09
Fixed Versions: 1.3.1_12 (12)

Related Reports
Backport:
Backport:

Sub Tasks

Description
Name: dk106046			Date: 10/31/2003

Operating System(s) :
Sun Solaris 2.8

Full JDK version(s) (from java -version) :
java version "1.3.1_09"                                               
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_09-b03) 
Java HotSpot(TM) Client VM (build 1.3.1_09-b03, mixed mode)           

Detailed description of the problem:
EBCDIC lines of text (Cp930) are converted to Kanji (Cp943), but some characters, for example the hyphen character, do not convert correctly. The problem does not occur with Java 1.2.2_17.

- Exact steps to reproduce:

 1 Detach the Java files and FTP them as binary to Solaris.
 2 Compile with the appropriate JDK.
 3 Run as java CallConverter > jdk131-09.html
 4 FTP jdk131-09.html back to Windows as binary.
 5 Open jdk131-09.html in IE 5.50 or above.
 6 Go to View->Encoding and select Japanese (Shift-JIS) to view the
 correct character set.
 The output contains a circle, which is the unwanted character. It is circled in red in the Word document (picjdk131-04.doc, available on request). The expected output is shown in the HTML document (outputFromJDK1.1.8.html, available on request).
                
- Source code that demonstrates the problem:

=============== CallConverter.java ==========================================

public class CallConverter {
 
    public static void main(String args []){
        //This is what came back from the mainframe
        String input =   "0E43CE438A43A8404044C445BC45B6459A45864040426045804567455240404586458545530F"; 
        //Hexify the input
        String line = CharacterConverter.getInstance().hexifyString(input);
        //Convert from CodePage930 to CodePage943
        if (line != null && line.length() != 0) {
            System.out.println(CharacterConverter.getInstance().charCodeConvert(line,"Cp930","Cp943"));
        }

    }
}

=============== CharacterConverter.java =======================================

import java.io.UnsupportedEncodingException;
public class CharacterConverter {
   private static String defaultCode = "ISO8859-1";
        private static CharacterConverter instance = new CharacterConverter();

private CharacterConverter()
{
        super();
}
public static CharacterConverter getInstance()
{
        return instance;
}
public String hexifyString(String stringToHexify)
{
        String errMsg = null;
        String tempHex = "";

        // Parse input string to strip out unnecessary 00's and FF's
        boolean shiftout = true;
        int hexIdx = 0;
        int len = stringToHexify.length();

        if ((len % 2) != 0)
        {
                System.out.println("len%2 s");
                return null;
        }

        while (hexIdx < len)
        {
                // Delete 00's and FF's
                if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0') || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
                {
                        hexIdx += 2;
                }
                else if (!(stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'E'))
                {
                        // We have a valid single-byte pair of characters
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                        hexIdx += 2;
                }
                else
                {
                        // we've found a shift-in
                        // copy the "0E"
                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                        hexIdx += 2;

                        // look for 00 and FF every fourth position until we find shift-out
                        shiftout = false;
                        while (!shiftout && hexIdx < len)
                        {
                                if (stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == 'F')
                                {
                                        shiftout = true;
                                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 2);
                                        hexIdx += 2;
                                }
                                else if ((stringToHexify.charAt(hexIdx) == '0' && stringToHexify.charAt(hexIdx + 1) == '0') || (stringToHexify.charAt(hexIdx) == 'F' && stringToHexify.charAt(hexIdx + 1) == 'F'))
                                {
                                        // don't copy any four byte sequence beginning with 00's or FF's
                                        hexIdx += 4;
                                }
                                else
                                {
                                        tempHex += stringToHexify.substring(hexIdx, hexIdx + 4);
                                        hexIdx += 4;
                                }
                        }
                }
        }
        String hexedString = tempHex;
        if (hexedString != null && !hexedString.equals(""))
        {
                // hexify the string.

                len = hexedString.length();

                if ((len % 2) != 0)
                {
                        System.out.println("len%2");
                        return null;
                }

                char hexStr[] = new char[len / 2];

                for (int i = 0; i < len; i += 2)
                {
                        hexStr[i / 2] = (char) Integer.parseInt(hexedString.substring(i, i + 2), 16);
                }

                hexedString = new String(hexStr);
        }
        return hexedString;
}
public String charCodeConvert(String hexedString, String defaultEnCode, String fromCode, String toCode)
{
        String convertedString = null;
        try
        {
                String conString = new String(hexedString.getBytes(defaultEnCode), fromCode);
                convertedString = new String(conString.getBytes(toCode));
        }
        catch (UnsupportedEncodingException ue)
        {
//              throw new JavaException("CharacterConverter", ExceptionTypesEnum.ERROR, ue.toString());
                //ue.printStackTrace();
                System.out.println("UnsupportedEncodingException");
        }
        return convertedString;
}
public String charCodeConvert(String hexedString, String fromCode, String toCode)
{
        return charCodeConvert(hexedString, defaultCode, fromCode, toCode );
}

}


We suggested the following fix:

In the ext\i18n\src\share\sun\io\ByteToCharCp930.java file,

change line 231 from:
   "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\uFF0D\uFF0F\uFFFD\uFFFD" + //   400 -   409
to:
  "\uFF01\uFFE5\uFF0A\uFF09\uFF1B\uFFE2\u2212\uFF0F\uFFFD\uFFFD" + //   400 -   409

The U+FF0D character was identified as the problem from the following mailing-list post: http://oss.software.ibm.com/pipermail/icu4c-support/2002-October/000757.html
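
To check which character a given runtime produces for the affected code point, a small probe along the following lines may help. It is only a sketch: the class name Cp930Probe and its helper methods are made up for illustration, it needs Java 7 or later for Character.getName, and it assumes the runtime ships the Cp930 charset (the probe prints a notice and exits if it does not). The input is the double-byte sequence 0x4260 from the reproducer, wrapped in the 0E and 0F shift bytes.

```java
import java.nio.charset.Charset;

public class Cp930Probe {

    // Parse a string of hex digit pairs (e.g. "0E42600F") into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Format each code point of s as "U+XXXX (NAME)", separated by ", ".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X (%s)", cp, Character.getName(cp)));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: Cp930 is an optional charset and may be missing.
        if (!Charset.isSupported("Cp930")) {
            System.out.println("Cp930 is not available in this runtime");
            return;
        }
        // 0E and 0F bracket the double-byte run; 4260 is the code point in question.
        byte[] ebcdic = hexToBytes("0E42600F");
        String decoded = new String(ebcdic, Charset.forName("Cp930"));
        System.out.println(describe(decoded));
    }
}
```

A runtime with the old table should report U+FF0D (FULLWIDTH HYPHEN-MINUS) for this input, while one with the suggested change should report U+2212 (MINUS SIGN).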

======================================================================

                                    

Comments
EVALUATION

Bug escalated.
###@###.### 2003-11-20

The suggested fix appears to address the issue. Using Markus' comments in the
icu4c support mail thread as a basis for how Cp930 ought to be implemented,
the current J2SE implementation provides the wrong roundtrip mapping for
Cp930 code point 0x4260  (0x4260 (Cp930) --> U+FF0D --> (Cp930) 0x4260).
The fix will instate 0x4260 <--> U+2212 as the roundtrip mapping. If
sun.io.CharToByteCp930 is left unchanged, U+FF0D --> 0x4260 will be
retained as a fallback mapping, which is more in sync with IBM's
current ICU mappings.
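
The difference between a roundtrip mapping and a fallback mapping can be sketched with a toy model. This is not the sun.io converter code: the class RoundTripModel and its two maps are hypothetical and hold only the single code point discussed here. The decode table has one entry for 0x4260, while the encode table accepts both U+2212 and U+FF0D.

```java
import java.util.HashMap;
import java.util.Map;

public class RoundTripModel {

    // Byte-to-char direction: one target character per code point.
    static final Map<Integer, Character> DECODE = new HashMap<>();
    // Char-to-byte direction: several characters may share a code point.
    static final Map<Character, Integer> ENCODE = new HashMap<>();

    static {
        DECODE.put(0x4260, '\u2212');   // proposed decode mapping
        ENCODE.put('\u2212', 0x4260);   // roundtrip partner of the entry above
        ENCODE.put('\uFF0D', 0x4260);   // fallback: encodes, but never decodes back
    }

    // A character round-trips if encoding and then decoding returns it unchanged.
    static boolean isRoundTrip(char c) {
        Integer code = ENCODE.get(c);
        if (code == null) {
            return false;
        }
        Character back = DECODE.get(code);
        return back != null && back.charValue() == c;
    }

    public static void main(String[] args) {
        System.out.println("U+2212 roundtrip: " + isRoundTrip('\u2212'));
        System.out.println("U+FF0D roundtrip: " + isRoundTrip('\uFF0D'));
    }
}
```

In this model U+FF0D can still be written out as 0x4260, so existing data containing it keeps encoding, but reading 0x4260 back always yields U+2212.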

The fix should probably also be forward ported into 1.5
###@###.### 2003-11-21
                                     
2003-11-21
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.3.1_12
1.4.2_05
generic
tiger-beta2

FIXED IN:
1.3.1_12
1.4.2_05
tiger-beta2

INTEGRATED IN:
1.3.1_12
1.4.2_05
tiger-b38
tiger-beta2

VERIFIED IN:
1.3.1_12
1.4.2_05


                                     
2004-06-14


