JDK-4426470 : String.getBytes() method does not convert some Big5 characters correctly
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.1.7,1.4.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_2.6,solaris_7
  • CPU: generic,sparc
  • Submitted: 2001-03-16
  • Updated: 2006-07-25
  • Resolved: 2006-07-25
Related Reports
Duplicate :  
Duplicate :  
Description
I have attached a Java program that reproduces the bug. The program decodes Big5-encoded byte arrays into Strings and then converts each String back to bytes by calling String.getBytes("Big5").

The bytes returned by the String.getBytes() method should match the
original bytes. The program tries to convert the following Big5 characters:
f9d4
f9d5
f9d6
f9d7
f9d8
f9dd
f9de

but only the first two (i.e., f9d4 and f9d5) are converted back correctly;
the others fail to round-trip.

The customer hit this bug on JDK 1.3.
I can also reproduce the problem on both of the following JDKs:
ladybird JDK 1.3.1-rc1-b19 and merlin JDK 1.4.0-beta-b56.

Here's the test case:

public class f2 {

    // Format one byte as an unsigned hex value, e.g. 0xf9.
    public static String hexStr(byte b) {
        int i = 0xff & b;
        return "0x" + Integer.toString(i, 16);
    }

    public static void main(String[] args) {
        try {
            // Big5 byte pairs to round-trip.
            byte[][] inbuf = {
                { (byte)0xf9, (byte)0xd4 },
                { (byte)0xf9, (byte)0xd5 },
                { (byte)0xf9, (byte)0xd6 },
                { (byte)0xf9, (byte)0xd7 },
                { (byte)0xf9, (byte)0xd8 },
                { (byte)0xf9, (byte)0xdd },
                { (byte)0xf9, (byte)0xde },
            };

            System.out.println("platform encoding = " +
                               System.getProperty("file.encoding"));

            for (int i = 0; i < inbuf.length; i++) {
                System.out.println("Original bytes :" +
                                   hexStr(inbuf[i][0]) + " " + hexStr(inbuf[i][1]));

                // Decode from Big5, then encode back to Big5.
                String s = new String(inbuf[i], "BIG5");
                byte[] buf = s.getBytes("BIG5");
                System.out.print("Converted bytes:");
                for (int b = 0; b < buf.length; b++) {
                    System.out.print(hexStr(buf[b]) + " ");
                }
                System.out.println("");
            }
        } catch (java.io.UnsupportedEncodingException e) {
            System.out.println("ERROR: " + e);
        }
    }
}

Here's the output:

platform encoding = BIG5
Original bytes :0xf9 0xd4
Converted bytes:0xf9 0xd4 
Original bytes :0xf9 0xd5
Converted bytes:0xf9 0xd5 
Original bytes :0xf9 0xd6
Converted bytes:0x3f 
Original bytes :0xf9 0xd7
Converted bytes:0x3f 
Original bytes :0xf9 0xd8
Converted bytes:0x3f 
Original bytes :0xf9 0xdd
Converted bytes:0x3f 
Original bytes :0xf9 0xde
Converted bytes:0x3f
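
In the output above, 0x3f is ASCII '?', which appears to be the replacement byte substituted for a character that has no Big5 mapping. The sketch below is not part of the original report. It shows one way to detect such failures explicitly instead of reading the hex dump: a CharsetEncoder configured with CodingErrorAction.REPORT raises an exception instead of silently substituting '?'. The class name RoundTripCheck and the method name roundTrips are hypothetical, and the result for each code point depends on the Big5 mapping table of the JDK in use.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical helper, not from the bug report.
class RoundTripCheck {

    // Decode the bytes, then re-encode strictly. Returns true only if the
    // re-encoded bytes are identical to the input.
    public static boolean roundTrips(byte[] original, Charset cs) {
        String decoded = new String(original, cs);
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(decoded));
            byte[] encoded = new byte[out.remaining()];
            out.get(encoded);
            return Arrays.equals(original, encoded);
        } catch (CharacterCodingException e) {
            // No mapping back to bytes: getBytes() would have emitted '?'.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Big5 may be absent from some runtimes.
        if (!Charset.isSupported("Big5")) {
            System.out.println("Big5 charset not available in this runtime");
            return;
        }
        Charset big5 = Charset.forName("Big5");
        int[] trailBytes = { 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xdd, 0xde };
        for (int trail : trailBytes) {
            byte[] in = { (byte) 0xf9, (byte) trail };
            System.out.printf("0xf9%02x round-trips: %b%n",
                              trail, roundTrips(in, big5));
        }
    }
}
```

The helper takes the charset as a parameter, so the same check can be pointed at a platform-specific Big5 variant where the runtime provides one.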

Comments
EVALUATION 5 of the 7 requested code points are already in big5_solaris (added at the Solaris L10N team's request; that team had also added these characters to Solaris 9's Big5 mapping table). As noted in "Comments", these are NOT Big5 code points but Big5+, for which we have a specific CR, 4421440. Closing this bug as a duplicate of that CR.
25-07-2006

EVALUATION Bug 4480620 expresses the need for the Java Big5 converter to include 7 additional characters, 0xf9d6 through 0xf9dc inclusive, based on their prevalence in Taiwanese Big5 locales and their presence in the native converter in Solaris 9. A fix for 4480620 would partially address the list of missing characters in this bug report. However, this bug reports a slightly different range of characters: code points 0xf9dd and 0xf9de appear only in this report. Ian.Little@Ireland 7/16/2001.

For 1.4, and for the Solaris zh_TW.BIG5 default encoding only, the extended set of characters 0xf9d6-0xf9dc and the appropriate mappings to and from Unicode will be added to a special subclassed converter. This is tracked in bug 4480620. 0xf9dd and 0xf9de are box/graphical drawing characters, which are judged to be far less important than the additional Hanzi characters 0xf9d6-0xf9dc. Downgrading the priority of this bug to P4 on the assumption that 4480620 (which will be addressed for Java 1.4) covers the main issues raised here. ###@###.### 2001-08-30
30-08-2001
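
To make the overlap between the two reports concrete, the following sketch (not part of the original report) classifies the seven code points from the test case against the 0xf9d6-0xf9dc range that the evaluation attributes to bug 4480620. The class and method names are hypothetical. Only the range bounds and the list of reported code points come from the text above.

```java
// Hypothetical illustration, not from the bug report.
class CoverageCheck {

    // Range added by bug 4480620 according to the evaluation (inclusive).
    static final int ADDED_FIRST = 0xf9d6;
    static final int ADDED_LAST = 0xf9dc;

    // True if the Big5 code falls inside the range tracked by 4480620.
    public static boolean coveredBy4480620(int big5Code) {
        return big5Code >= ADDED_FIRST && big5Code <= ADDED_LAST;
    }

    public static void main(String[] args) {
        // The seven code points exercised by the test case in this report.
        int[] reported = { 0xf9d4, 0xf9d5, 0xf9d6, 0xf9d7, 0xf9d8, 0xf9dd, 0xf9de };
        for (int code : reported) {
            System.out.printf("0x%04x covered by 4480620: %b%n",
                              code, coveredBy4480620(code));
        }
    }
}
```

Under these assumptions, 0xf9d6 through 0xf9d8 fall inside the range, 0xf9dd and 0xf9de fall outside it, and 0xf9d4 and 0xf9d5 are outside it but already round-trip in the reported output.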