JDK-5058133 : iso2022 encoders throw BufferOverflowException
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 5.0
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-06-06
  • Updated: 2004-06-17
  • Resolved: 2004-06-17
Fixed in: 5.0 b57
Description
The charsets ISO-2022-KR, ISO-2022-CN-CNS, and ISO-2022-CN-GB sometimes
throw BufferOverflowException, especially when encoding a single char.

For example, the following test program:
--------------------------------------------------------------
import java.io.*;
import java.util.*;
import java.nio.charset.*;
import java.nio.*;

public class FindOneCharEncoderBugs {
    public static void main(String[] args) throws Exception {
	for (Map.Entry<String,Charset> e
		 : Charset.availableCharsets().entrySet()) {
	    String csn = e.getKey();
	    Charset cs = e.getValue();
	    int failures = 0;
	    System.out.println(csn);
	    if (csn.equals("x-IBM933")) continue; // hangs!

	    // Ignore decoder-only charsets
	    try { cs.newEncoder(); }
	    catch (UnsupportedOperationException x) { continue; }
		    
	    for (int i = 0; failures < 5 && i <= 0xffff; i++) {
		String s = new String(new char[] { (char)i });
		try {
		    s.getBytes(csn);
		} catch (BufferOverflowException x) {
		    System.out.printf("Overflow: charset=%s char=%x%n", csn, i);
		    failures++;
		    i += 100;
		} catch (Throwable t) {
		    System.out.printf("%s charset=%s char=%x%n", t, csn, i);
		    i += 100;
		}
	    }
	}
    }
}
--------------------------------------------------------------
prints (among other things):

Overflow: charset=ISO-2022-KR char=a1
Overflow: charset=ISO-2022-KR char=111
Overflow: charset=ISO-2022-KR char=2c7
Overflow: charset=ISO-2022-KR char=391
Overflow: charset=ISO-2022-KR char=401
Overflow: charset=x-ISO-2022-CN-CNS char=a7
Overflow: charset=x-ISO-2022-CN-CNS char=2c7
Overflow: charset=x-ISO-2022-CN-CNS char=391
Overflow: charset=x-ISO-2022-CN-CNS char=2013
Overflow: charset=x-ISO-2022-CN-CNS char=2103
Overflow: charset=x-ISO-2022-CN-GB char=a4
Overflow: charset=x-ISO-2022-CN-GB char=113
Overflow: charset=x-ISO-2022-CN-GB char=1ce
Overflow: charset=x-ISO-2022-CN-GB char=2c7
Overflow: charset=x-ISO-2022-CN-GB char=391

It is particularly egregious that strings consisting of one ASCII character
cannot be correctly encoded.
###@###.### 2004-06-06

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: tiger-rc
FIXED IN: tiger-rc
INTEGRATED IN: tiger-b57 tiger-rc
25-06-2004

EVALUATION
ISO2022 encoders might need 8 bytes to encode a single char: a 4-byte designator sequence, an SO, the 2 bytes of actual data, and an SI (4 + 1 + 2 + 1 = 8). In theory even more might be required, but in practice 8 is enough. The limit in the code (the encoder's maxBytesPerChar) is 4.
###@###.### 2004-06-06
06-06-2004
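The arithmetic in the evaluation can be illustrated with a small sketch. It assumes the caller sizes its output buffer as (number of chars) * maxBytesPerChar, which is the usual contract for that value. ToyEncoder, MaxBytesSketch, and encodeOneChar are hypothetical names made up for this illustration. The toy encoder is not the real sun.nio.cs.ext.ISO2022 class: it only writes 8 placeholder bytes per char while advertising whatever limit it is given.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class MaxBytesSketch {
    // Worst case from the evaluation: designator + SO + data + SI.
    static final int WORST_CASE = 4 + 1 + 2 + 1;

    // Hypothetical toy encoder (NOT the real ISO2022 encoder): it always
    // writes WORST_CASE bytes per char but advertises the limit it is given.
    static final class ToyEncoder extends CharsetEncoder {
        ToyEncoder(Charset cs, float declaredMax) {
            super(cs, 4.0f, declaredMax);
        }

        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                if (out.remaining() < WORST_CASE) {
                    return CoderResult.OVERFLOW; // not enough room for one char
                }
                in.get();
                out.put(new byte[WORST_CASE]); // placeholder bytes only
            }
            return CoderResult.UNDERFLOW; // all input consumed
        }
    }

    // Encodes one char into a buffer sized as chars * maxBytesPerChar
    // (an assumption about how the caller allocates its buffer).
    static CoderResult encodeOneChar(float declaredMax) {
        CharsetEncoder enc = new ToyEncoder(StandardCharsets.US_ASCII, declaredMax);
        ByteBuffer out = ByteBuffer.allocate((int) (1 * enc.maxBytesPerChar()));
        return enc.encode(CharBuffer.wrap("x"), out, true);
    }

    public static void main(String[] args) {
        System.out.println("declared 4.0f -> " + encodeOneChar(4.0f));
        System.out.println("declared 8.0f -> " + encodeOneChar(8.0f));
    }
}
```

With a declared limit of 4 the single-char encode reports overflow, and with 8 it completes. Calling throwException() on the overflow result raises BufferOverflowException, the exception named in this report.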

SUGGESTED FIX
--- /u/martin/ws/tiger/src/share/classes/sun/nio/cs/ext/ISO2022.java 2004-01-12 14:53:20.699992000 -0800
+++ /u/martin/ws/maxBytes/src/share/classes/sun/nio/cs/ext/ISO2022.java 2004-06-05 15:52:35.670885000 -0700
@@ -414,7 +414,7 @@
         private boolean newSS3DesDefined = false;
 
         protected Encoder(Charset cs) {
-            super(cs, 4.0f, 4.0f);
+            super(cs, 4.0f, 8.0f);
         }
 
         public boolean canEncode(char c) {
###@###.### 2004-06-06
06-06-2004
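One way to check a given runtime for this problem is a narrowed version of the test program that looks only at the three affected charsets. The following is a sketch, not part of the original report: Iso2022OverflowCheck and countOverflows are hypothetical names, and the charset names are taken from the output shown in the description. A charset that is missing or decoder-only on the running JDK is reported as unavailable rather than treated as a failure.

```java
import java.nio.BufferOverflowException;
import java.nio.charset.Charset;

class Iso2022OverflowCheck {
    // Charset names as they appear in the report's output.
    static final String[] NAMES = {
        "ISO-2022-KR", "x-ISO-2022-CN-CNS", "x-ISO-2022-CN-GB"
    };

    // Returns how many one-char strings overflowed while encoding,
    // or -1 if the charset is unavailable or cannot encode.
    static int countOverflows(String name) {
        if (!Charset.isSupported(name)) {
            return -1;
        }
        Charset cs = Charset.forName(name);
        if (!cs.canEncode()) {
            return -1;
        }
        int overflows = 0;
        for (int i = 0; i <= 0xffff; i++) {
            try {
                String.valueOf((char) i).getBytes(cs);
            } catch (BufferOverflowException x) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            int n = countOverflows(name);
            if (n < 0) {
                System.out.println(name + ": not available");
                continue;
            }
            float max = Charset.forName(name).newEncoder().maxBytesPerChar();
            System.out.printf("%s maxBytesPerChar=%.1f overflows=%d%n", name, max, n);
        }
    }
}
```

On a runtime that includes the fix, the overflow count for each available charset should be zero.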