JDK-6372100 : CharsetDecoder.decode fails for single-byte input for many CJK encodings
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 5.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86
  • Submitted: 2006-01-12
  • Updated: 2010-04-02
  • Resolved: 2006-01-13
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Linux honolulu.ilog.fr 2.4.21-0.13mdk #1 Fri Mar 14 15:08:06 EST 2003 i686 unknown


A DESCRIPTION OF THE PROBLEM :
For many CJK encodings, decoding a single-byte input buffer with
CharsetDecoder.decode(ByteBuffer) yields an empty (0-character) output.
It should yield a 1-character output for bytes in the ASCII range
(0 to 0x7f) and throw a MalformedInputException for bytes from 0x80 to 0xff.
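
The expected contract can be illustrated with a minimal sketch. It uses UTF-8 only as a stand-in, on the assumption that it is available on every Java runtime and has the same single-byte split (ASCII bytes decode, lone bytes from 0x80 upward are malformed). The class name SingleByteContractSketch and the helper decodeSingleByte are hypothetical and not part of this report.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class SingleByteContractSketch {
  // Hypothetical helper: returns the number of chars decoded from the
  // single byte b, or -1 if the decoder reports malformed input.
  public static int decodeSingleByte(Charset charset, int b) {
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte) b });
    try {
      return decoder.decode(in).length();
    } catch (MalformedInputException e) {
      return -1;
    } catch (CharacterCodingException e) {
      // Any other coding error (e.g. unmappable) is outside this sketch.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // UTF-8 is an assumption made for illustration, not one of the
    // charsets listed in this report.
    Charset utf8 = Charset.forName("UTF-8");
    System.out.println(decodeSingleByte(utf8, 0x41)); // ASCII 'A'
    System.out.println(decodeSingleByte(utf8, 0x80)); // lone non-ASCII byte
  }
}
```

With UTF-8 the helper returns 1 for 0x41 and -1 for 0x80. The behavior reported here corresponds to a return value of 0, which should never occur for a 1-byte input.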


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
javac niobug1.java
java niobug1


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No output.

ACTUAL -
Charset Big5_HKSCS: 256 errors
Charset Big5_Solaris: 256 errors
Charset Big5: 256 errors
Charset Cp1381: 256 errors
Charset Cp1383: 256 errors
Charset Cp930: 256 errors
Charset Cp933: 256 errors
Charset Cp935: 256 errors
Charset Cp937: 256 errors
Charset Cp939: 256 errors
Charset Cp942: 256 errors
Charset Cp942C: 256 errors
Charset Cp943: 256 errors
Charset Cp943C: 256 errors
Charset Cp948: 256 errors
Charset Cp949: 256 errors
Charset Cp949C: 256 errors
Charset Cp950: 256 errors
Charset Cp970: 256 errors
Charset EUC_CN: 256 errors
Charset EUC_JP_Solaris: 256 errors
Charset EUC_JP: 256 errors
Charset EUC_KR: 256 errors
Charset GBK: 256 errors
Charset JIS0208: 256 errors
Charset JIS0212: 256 errors
Charset Johab: 256 errors
Charset MS932: 256 errors
Charset MS936: 256 errors
Charset MS949: 256 errors
Charset MS950_HKSCS: 256 errors
Charset MS950: 256 errors
Charset PCK: 256 errors
Charset SJIS: 256 errors


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.nio.*;
import java.nio.charset.*;

public class niobug1 {
  public static void main (String[] args) throws CharacterCodingException {
    String[] encodings = {
      "Big5_HKSCS",
      "Big5_Solaris",
      "Big5",
      "Cp1381",
      "Cp1383",
      "Cp930",
      "Cp933",
      "Cp935",
      "Cp937",
      "Cp939",
      "Cp942",
      "Cp942C",
      "Cp943",
      "Cp943C",
      "Cp948",
      "Cp949",
      "Cp949C",
      "Cp950",
      "Cp970",
      "EUC_CN",
      "EUC_JP_Solaris",
      "EUC_JP",
      "EUC_KR",
      "GBK",
      "JIS0208",
      "JIS0212",
      "Johab",
      "MS932",
      "MS936",
      "MS949",
      "MS950_HKSCS",
      "MS950",
      "PCK",
      "SJIS",
    };
    for (int n = 0; n < encodings.length; n++) {
      String encoding = encodings[n];
      Charset charset = Charset.forName(encoding);
      // Report malformed and unmappable input instead of replacing it.
      CharsetDecoder converter = charset.newDecoder();
      converter = converter.onMalformedInput(CodingErrorAction.REPORT);
      converter = converter.onUnmappableCharacter(CodingErrorAction.REPORT);
      int errors = 0;
      // Try every possible single-byte input.
      for (int b = 0; b < 0x100; b++) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte)b });
        try {
          CharBuffer out = converter.decode(in);
          // An empty result for a 1-byte input is the bug being reported.
          if (out.length() == 0)
            errors++;
        } catch (MalformedInputException e) {
          // Expected for bytes that are not a complete character.
        }
      }
      if (errors > 0)
        System.err.println("Charset "+encoding+": "+errors+" errors");
    }
  }
}

---------- END SOURCE ----------
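
For comparison, the same single-byte input can also be pushed through the lower-level decode(ByteBuffer, CharBuffer, boolean) method, where the caller allocates the output buffer and gets a CoderResult back instead of an exception. The following is only a sketch: the class name ExplicitBufferDecodeSketch, the helper describe, and the buffer size of 4 are illustrative assumptions. It does not claim anything about how the affected charsets behave on this path.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class ExplicitBufferDecodeSketch {
  // Hypothetical helper: decodes the single byte b into a caller-supplied
  // buffer and summarizes the outcome as text.
  public static String describe(Charset charset, int b) {
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte) b });
    CharBuffer out = CharBuffer.allocate(4); // arbitrary, ample for one byte
    // endOfInput = true: no further bytes will follow this one.
    CoderResult result = decoder.decode(in, out, true);
    if (result.isUnderflow())
      result = decoder.flush(out);
    out.flip();
    if (result.isMalformed())
      return "malformed input, length " + result.length();
    if (result.isUnmappable())
      return "unmappable input, length " + result.length();
    return "decoded " + out.length() + " char(s)";
  }

  public static void main(String[] args) {
    // Charset name is taken from the command line; UTF-8 is only a default
    // chosen for this sketch.
    Charset charset = Charset.forName(args.length > 0 ? args[0] : "UTF-8");
    for (int b = 0; b < 0x100; b++)
      System.out.println(Integer.toHexString(b) + ": " + describe(charset, b));
  }
}
```

Running it with one of the charset names from the list above as the argument prints one line per byte value, which makes it easy to see whether any input produces "decoded 0 char(s)".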

Comments
EVALUATION Duplicate of 6196991, which has been fixed in Mustang. It might be worth backporting to 5.0u.
13-01-2006