Bug ID: JDK-5031167 NLS: CONVERTING BYTE ARRAY TO STRING USING EUC

Type: Bug
Component: core-libs
Sub-Component: java.nio.charsets
Affected Version: 1.4.1

Priority: P3
Status: Closed
Resolution: Duplicate
OS: solaris_8
CPU: sparc

Submitted: 2004-04-13
Updated: 2005-07-21
Resolved: 2005-07-21

Name: wm7046			Date: 04/13/2004


When a byte array that contains a byte sequence invalid for EUC_CN is converted
to a String using the charset "EUC_CN", the result differs depending on whether
new String(bytes, "ISO-8859-1") has been called in between, e.g.:

String s1 = new String(bytes, "EUC_CN");
String s2 = new String(bytes, "ISO-8859-1");
String s3 = new String(bytes, "EUC_CN");

Here s1 differs from s3.
The test case:

public class TestString
{  
  public static void main(String[] args)
  {
    byte[] bytes = {-122,72,-122,-9};
    try {
      String str = null;
      int hash = 0;
      String outstr = null;
      //-------- Before ISO-8859-1
      str = new String(bytes, "EUC_CN");
      hash = str.hashCode();
      outstr = "";
      for (int i = 0; i < str.length(); i++)
        outstr += " " + (long) str.charAt(i);
      System.out.println("String-(EUC_CN)" + outstr);
      System.out.println("hash:" + hash);
      //-------- ISO-8859-1
      str = new String(bytes,"ISO-8859-1");
      hash = str.hashCode();
      outstr = "";
      for (int i = 0; i < str.length(); i++)
        outstr += " " + (long) str.charAt(i);
      System.out.println("String-ISO-8859-1:" + outstr);
      System.out.println("hash:" + hash);
      //-------- After ISO-8859-1
      str = new String(bytes, "EUC_CN");
      hash = str.hashCode();
      outstr = "";
      for (int i = 0; i < str.length(); i++)
        outstr += " " + (long) str.charAt(i);
      System.out.println("String-(EUC_CN)" + outstr);
      System.out.println("hash:" + hash);
    }catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
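A condensed version of the test case that checks the s1/s3 invariant directly is sketched below. It uses the java.nio.charset.Charset overload of the String constructor and String.chars(), both of which postdate the 1.4.1 release named in this report, so it is a way to check the same invariant on a current JDK rather than a faithful reproduction of the original code path. The class and method names are hypothetical, and the sketch assumes "EUC_CN" is accepted as a charset name (on current JDKs it is an alias of GB2312).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EucCnRoundTripCheck {
    // Same bytes as the test case above: 0x86 0x48 0x86 0xF7 (not valid EUC_CN).
    static final byte[] BYTES = {-122, 72, -122, -9};

    // Hypothetical helper: decodes as EUC_CN, then ISO-8859-1, then EUC_CN
    // again, prints the char values, and reports whether s1 equals s3.
    static boolean decodesConsistently(byte[] bytes) {
        // Assumption: "EUC_CN" resolves to a Charset; Charset.forName throws
        // UnsupportedCharsetException otherwise.
        Charset eucCn = Charset.forName("EUC_CN");
        String s1 = new String(bytes, eucCn);
        String s2 = new String(bytes, StandardCharsets.ISO_8859_1);
        String s3 = new String(bytes, eucCn);
        System.out.println("s1: " + Arrays.toString(s1.chars().toArray()));
        System.out.println("s2: " + Arrays.toString(s2.chars().toArray()));
        System.out.println("s3: " + Arrays.toString(s3.chars().toArray()));
        return s1.equals(s3);
    }

    public static void main(String[] args) {
        // Reported as false on 1.4.1 in the zh_CN.EUC locale.
        System.out.println("s1.equals(s3): " + decodesConsistently(BYTES));
    }
}
```

On a JDK that includes the fix, decodesConsistently(BYTES) is expected to return true; the printed char values can be compared with the output of the original test case.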
(Incident Review ID: 244585) 
======================================================================

EVALUATION

This could be reproduced on Solaris 8 when running in the zh_CN.EUC locale, but not in other locales. I have noted that it is fixed in later J2SE releases, and this is most likely a duplicate of 4838512 (same root cause), which is fixed/integrated in 1.4.1_07 and 1.4.2_05 (as well as in 1.5.0_beta). 4838512 fixed an issue whereby the legacy sun.io converters were incorrectly engaged during VM startup when the VM was started in a multibyte locale. The fix ensures that the newer java.nio-based charset implementations are used, which removes some of the noted inconsistencies.
###@###.### 2004-04-14

The problem is no longer reproducible in 1.4.2_05, 5.0 and 6.0. Closing as a duplicate of 4838512, as suggested by i.little.
###@###.### 2005-07-21 08:07:30 GMT
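The evaluation attributes the inconsistency to legacy sun.io converters being used instead of the java.nio charset implementations. The sun.io classes are internal and cannot be exercised portably, but the public java.nio API can at least show that the input is malformed for EUC_CN. The sketch below contrasts a strict CharsetDecoder (CodingErrorAction.REPORT) with the lenient String constructor, which substitutes the charset's default replacement (U+FFFD) for malformed input. The class and method names are hypothetical, and "EUC_CN" is again assumed to be an accepted charset name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class EucCnStrictDecode {
    // Same bytes as the test case: 0x86 is not a valid EUC_CN lead byte.
    static final byte[] BYTES = {-122, 72, -122, -9};

    // Lenient decode: the String constructor replaces malformed input.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, Charset.forName("EUC_CN"));
    }

    // Strict decode: returns null if the decoder reports malformed or
    // unmappable input instead of silently replacing it.
    static String strictDecode(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("EUC_CN").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected input: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        String lenient = lenientDecode(BYTES);
        System.out.println("lenient contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));
        System.out.println("strict result: " + strictDecode(BYTES));
    }
}
```

Code that must not depend on how malformed input is replaced can use the strict decoder and handle the exception explicitly, rather than comparing strings produced by lenient decoding.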

14-04-2004