JDK-6196991 : (cs) Many character decoders fail to convert single-byte (e.g. ASCII) input

Details
Type: Bug
Submit Date: 2004-11-18
Status: Resolved
Updated Date: 2013-08-13
Project Name: JDK
Resolved Date: 2005-03-19
Component: core-libs
OS: linux,windows_xp
Sub-Component: java.nio
CPU: x86
Priority: P2
Resolution: Fixed
Affected Versions: 1.4.2,5.0
Fixed Versions:

Description
FULL PRODUCT VERSION :
java version "1.4.2_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_01-b06)
Java HotSpot(TM) Client VM (build 1.4.2_01-b06, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows XP [Version 5.1.2600]

EXTRA RELEVANT SYSTEM CONFIGURATION :
Using English version of Windows.
Default locale is "en"


A DESCRIPTION OF THE PROBLEM :
The enclosed program round-trips a string through the SJIS charset: it encodes the string to SJIS bytes and then decodes those bytes back to a string.  When "ABC" is processed, it correctly comes back as "ABC".  When a single-character string such as "A" is processed, it comes back as an empty string.



STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the provided Java program.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Printed output should be:

roundTrip(A)=A
roundTrip(ABC)=ABC
ACTUAL -
Actual output is:

roundTrip(A)=
roundTrip(ABC)=ABC

REPRODUCIBILITY :
This bug is always reproducible.

---------- BEGIN SOURCE ----------
package gcr.db.test;

import java.nio.*;
import java.nio.charset.*;

/**
 * Demonstrates apparent bug in SJIS encoding.
 */
public class SjisBug {
  public static void main(String[] args) {
    System.out.println("roundTrip(A)="+roundTrip("A")); // Gets empty string !
    System.out.println("roundTrip(ABC)="+roundTrip("ABC")); // Gets ABC, as expected.
  }
  static String roundTrip(String str) {
    Charset cs=Charset.forName("SJIS");
    return cs.decode(cs.encode(str)).toString();
  }
}



---------- END SOURCE ----------
###@###.### 2004-11-18 04:18:09 GMT

                                    

Comments
WORK AROUND

If the input is only one byte, you can "pad" the input with some extra
data with known decoding (e.g. ASCII), and discard it after decoding.
###@###.### 2005-2-10 17:47:21 GMT
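
The padding workaround above could be sketched roughly as follows. This is only an illustration, not code from the JDK or from the fix: the class name PaddedDecode, the helper decodePadded, and the choice of a space as the pad byte are all hypothetical. It also assumes the charset is ASCII-compatible and that the single input byte is a complete character on its own; if that byte were the lead byte of a multi-byte sequence, the pad would be consumed as part of that sequence and the result would not be meaningful.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class PaddedDecode {
    // Hypothetical pad byte; assumes the charset decodes 0x20 as ' '.
    static final byte PAD = (byte) ' ';

    // Hypothetical helper: decodes input, padding one-byte inputs so that
    // the decoder never sees exactly one remaining byte.
    static String decodePadded(Charset cs, byte[] input) {
        if (input.length != 1) {
            // Inputs of other lengths are decoded directly, without padding.
            return cs.decode(ByteBuffer.wrap(input)).toString();
        }
        byte[] padded = { input[0], PAD };
        String decoded = cs.decode(ByteBuffer.wrap(padded)).toString();
        // Discard the pad only if it decoded as expected.
        if (decoded.endsWith(" ")) {
            return decoded.substring(0, decoded.length() - 1);
        }
        return decoded;
    }

    public static void main(String[] args) {
        // US-ASCII is used here only so the sketch runs on any JDK;
        // substitute the charset that shows the problem (e.g. "SJIS").
        Charset cs = Charset.forName("US-ASCII");
        System.out.println("[" + decodePadded(cs, new byte[] { 'A' }) + "]");
    }
}
```

On a JDK that contains the fix the padding is unnecessary, but under the assumptions stated above it should also be harmless.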
                                     
2005-02-10
SUGGESTED FIX

--- /tmp/geta10520	2005-02-10 00:45:59.407012600 -0800
+++ Charset-X-Coder.java	2005-02-09 23:15:11.996877000 -0800
@@ -740,30 +740,31 @@
      *          position cannot be mapped to an equivalent $otype$ sequence and
      *          the current unmappable-character action is {@link
      *          CodingErrorAction#REPORT}
      */
     public final $Otype$Buffer $code$($Itype$Buffer in)
 	throws CharacterCodingException
     {
+	int remaining = in.remaining();
+	if (remaining == 0)
+	    return $Otype$Buffer.allocate(0);
 	int n = (int)(in.remaining() * average$ItypesPerOtype$());
 	$Otype$Buffer out = $Otype$Buffer.allocate(n);
 
-	if (n == 0)
-	    return out;
 	reset();
 	for (;;) {
 	    CoderResult cr;
 	    if (in.hasRemaining())
 		cr = $code$(in, out, true);
 	    else
 		cr = flush(out);
 	    if (cr.isUnderflow())
 		break;
 	    if (cr.isOverflow()) {
-		n *= 2;
+		n = 2*n + 1;	// Ensure progress; n might be 0!
 		$Otype$Buffer o = $Otype$Buffer.allocate(n);
 		out.flip();
 		o.put(out);
 		out = o;
 		continue;
 	    }
 	    cr.throwException();

###@###.### 2005-2-10 08:48:27 GMT
                                     
2005-02-08
EVALUATION

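It looks like the encode/decode convenience methods in Charset-X-Coder can fail
when in.remaining() == 1 and average$ItypesPerOtype$() is less than 1. In that
case n = (int)(in.remaining() * average$ItypesPerOtype$()) truncates to 0, and
the "if (n == 0) return out;" check returns the empty buffer before any
conversion is attempted.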

###@###.### 2005-2-08 07:14:46 GMT
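
The arithmetic behind this evaluation can be checked in isolation. The sketch below is not the Charset-X-Coder template itself: the class and method names are hypothetical, and the average of 0.5 is an assumed value standing in for whatever the coder reports (for a decoder, CharsetDecoder.averageCharsPerByte()). It reproduces only the truncating size estimate and the two buffer-growth formulas that appear in the suggested fix.

```java
class CapacityEstimate {
    // Same expression as the template's initial size estimate:
    // the cast to int truncates toward zero.
    static int initialCapacity(int remaining, float average) {
        return (int) (remaining * average);
    }

    // Growth step before the fix: doubling 0 stays at 0.
    static int growOld(int n) {
        return n * 2;
    }

    // Growth step from the suggested fix: always makes progress.
    static int growFixed(int n) {
        return 2 * n + 1;
    }

    public static void main(String[] args) {
        float average = 0.5f; // assumed average, for illustration only
        System.out.println("1 input unit  -> n = " + initialCapacity(1, average));
        System.out.println("3 input units -> n = " + initialCapacity(3, average));
        System.out.println("old growth from 0   = " + growOld(0));
        System.out.println("fixed growth from 0 = " + growFixed(0));
    }
}
```

With an average below 1, a single input unit gives n = 0, which the pre-fix code treated as "nothing to convert". The fix returns early only when the input itself is empty, and the 2*n + 1 growth step keeps the overflow loop from retrying forever with a zero-length buffer.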
                                     
2005-02-08


