JDK-6211145 : ISO-2022-JP encoder doesn't output tail escape sequence
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2004-12-21
  • Updated: 2011-02-16
  • Resolved: 2005-02-10
Related Reports
Duplicate :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.4.2_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)


A DESCRIPTION OF THE PROBLEM :
The problems are described below:
1) OutputStreamWriter's flush() and close() methods behave inconsistently: close() writes the trailing escape sequence (ESC ( B), but flush() does not.
2) CharsetEncoder's encode(CharBuffer in) method does not write the trailing escape sequence.

CharsetEncoder#flush does not seem to be called where it should be: in OutputStreamWriter#flush and (possibly) in CharsetEncoder#encode(CharBuffer).

Test code:

import java.io.*;
import java.nio.*;
import java.nio.charset.*;

class Test {
    public static void main(String[] args) throws Exception {
        String m = "\u3042\u3043\u3044";
        System.out.println("expected result:");
        System.out.println("1B 24 42 24 22 24 23 24 24 1B 28 42");
        System.out.println();

        System.out.println("test getBytes");
        System.out.println(dump(m.getBytes("ISO-2022-JP"))); // OK
        System.out.println();

        System.out.println("test OutputStreamWriter with close");
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        OutputStreamWriter w = new OutputStreamWriter(b, "ISO-2022-JP");
        w.write(m.toCharArray());
        w.close();
        System.out.println(dump(b.toByteArray())); // OK
        System.out.println();

        System.out.println("test OutputStreamWriter with flush");
        b = new ByteArrayOutputStream();
        w = new OutputStreamWriter(b, "ISO-2022-JP");
        w.write(m.toCharArray());
        w.flush();
        System.out.println(dump(b.toByteArray())); // NG: CharsetEncoder#flush is not called
        System.out.println();

        System.out.println("test CharsetEncoder#encode(CharBuffer)");
        CharsetEncoder e = Charset.forName("ISO-2022-JP").newEncoder();
        ByteBuffer buf = e.encode(CharBuffer.wrap(m));
        System.out.println(dump(buf)); // NG: tail escape sequence missing
        System.out.println();

        System.out.println("test CharsetEncoder#encode(CharBuffer, ByteBuffer, boolean)");
        // e.reset() should be enough here, but it doesn't work
        // (another bug in sun.nio.cs.ext.ISO2022_JP#reset), so use a new encoder instead.
        e = Charset.forName("ISO-2022-JP").newEncoder();
        buf = ByteBuffer.allocate(64);
        System.out.println(e.encode(CharBuffer.wrap(m), buf, true));
        System.out.println(e.flush(buf)); // <- required to write the tail escape sequence
        buf.flip();                       // read back only the bytes actually written
        System.out.println(dump(buf)); // OK
    }

    // Dumps only the bytes between position and limit, not the whole backing array
    // (which may contain unused capacity).
    public static String dump(ByteBuffer bb) {
        byte[] bytes = new byte[bb.remaining()];
        bb.duplicate().get(bytes);
        return dump(bytes);
    }

    public static String dump(byte[] bytes) {
        StringBuffer s = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            s.append("0123456789ABCDEF".charAt(bytes[i] >> 4 & 0x0f));
            s.append("0123456789ABCDEF".charAt(bytes[i]      & 0x0f));
            s.append(' ');
        }
        return new String(s);
    }
}
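The final test case, encode(CharBuffer, ByteBuffer, boolean) followed by flush(ByteBuffer), is the invocation sequence the CharsetEncoder documentation prescribes for a complete encoding operation. The sketch below generalizes it with output-buffer growth on overflow. The class FullEncode and its methods are hypothetical names, and the deliberately small initial buffer size is an arbitrary assumption. For the string used in this report it should produce the expected bytes listed in the test, with the trailing ESC ( B coming from flush.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class FullEncode {
    // Sketch of the documented sequence: reset, encode(..., endOfInput = true)
    // until UNDERFLOW, then flush until UNDERFLOW, growing the output on OVERFLOW.
    public static byte[] encodeAll(CharsetEncoder enc, CharSequence text)
            throws CharacterCodingException {
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(16); // small on purpose (assumption)
        enc.reset();
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isUnderflow()) break;                    // all input consumed
            if (cr.isOverflow()) { out = grow(out); continue; }
            cr.throwException();                            // malformed or unmappable input
        }
        while (true) {
            CoderResult cr = enc.flush(out); // for ISO-2022-JP this writes the tail escape
            if (cr.isUnderflow()) break;
            out = grow(out);                 // OVERFLOW: retry with more room
        }
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // Copies the bytes written so far into a buffer of twice the capacity.
    private static ByteBuffer grow(ByteBuffer out) {
        ByteBuffer bigger = ByteBuffer.allocate(out.capacity() * 2);
        out.flip();
        bigger.put(out);
        return bigger;
    }

    public static void main(String[] args) throws CharacterCodingException {
        CharsetEncoder enc = Charset.forName("ISO-2022-JP").newEncoder();
        byte[] b = encodeAll(enc, "\u3042\u3043\u3044");
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02X ", x));
        System.out.println(sb.toString().trim());
    }
}
```

Growing and re-invoking after OVERFLOW, rather than sizing the buffer up front, avoids having to guess how many escape bytes the encoder will emit.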


P.S. CharsetEncoder#reset() doesn't work for the ISO-2022-JP encoder.
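The reset() problem can be checked by comparing an encoder that was used and then reset against a brand-new encoder. The sketch below is a minimal test under stated assumptions: the class ResetCheck and its helpers are hypothetical, and a 64-byte output buffer is assumed to be large enough for the short strings used. With a working reset() the two outputs should match. With the no-op reset described here, the used encoder would presumably still be in its JIS X 0208 mode and omit the leading ESC $ B.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.Arrays;

public class ResetCheck {
    // One complete operation without calling reset(): encode(..., true) then flush().
    // Assumes 64 bytes are enough for the short test strings (hypothetical limit).
    static byte[] encodeOnce(CharsetEncoder enc, String text) {
        ByteBuffer out = ByteBuffer.allocate(64);
        CoderResult cr = enc.encode(CharBuffer.wrap(text), out, true);
        if (!cr.isUnderflow()) throw new IllegalStateException("encode: " + cr);
        cr = enc.flush(out);
        if (!cr.isUnderflow()) throw new IllegalStateException("flush: " + cr);
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // True if reset() leaves the encoder producing the same bytes as a new encoder.
    public static boolean resetRestoresInitialState(String text) {
        Charset cs = Charset.forName("ISO-2022-JP");
        CharsetEncoder used = cs.newEncoder();
        // Encode a kana character without end-of-input or flush, which presumably
        // leaves the encoder in its two-byte (JIS X 0208) mode.
        used.encode(CharBuffer.wrap("\u3042"), ByteBuffer.allocate(64), false);
        used.reset(); // should return the encoder to its initial (ASCII) mode
        return Arrays.equals(encodeOnce(used, text), encodeOnce(cs.newEncoder(), text));
    }

    public static void main(String[] args) {
        System.out.println(resetRestoresInitialState("\u3042\u3043\u3044"));
    }
}
```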



REPRODUCIBILITY :
This bug can be reproduced always.
###@###.### 2004-12-21 11:13:14 GMT

Comments
EVALUATION
The semantics here are tricky to get right. I don't think we want OutputStreamWriter to flush the underlying encoder when the stream is flushed. Flushing the stream means to write out all the data that definitely needs to be written, but it is unknown whether there will be more data to be written, and we only want the escape sequences to be written when necessary, e.g. if the character set changes or we're at end-of-stream.

On the other hand, the reset bug is a clear, simple, and embarrassing little thinko that we can easily fix. I've opened
    6226510 ISO-2022-JP encoder's reset() method noop; should revert to ASCII
to fix that problem. However, this one is Not A Bug.
###@###.### 2005-2-08 03:14:21 GMT

I've re-opened this one since
    CharsetEncoder e = Charset.forName("ISO-2022-JP").newEncoder();
    ByteBuffer buf = e.encode(CharBuffer.wrap(m));
is actually valid. Sorry about that.
###@###.### 2005-2-08 03:57:31 GMT

To address the various issues reported, we are fixing
    6226510 ISO-2022-JP encoder's reset() method noop; should revert to ASCII
    6221056 CharsetEncoder.encode(ByteBuffer) should call flush(ByteBuffer)
and the other reported behavior is as intended. I'm closing this bug as a dup of 6221056.
###@###.### 2005-2-10 18:36:45 GMT
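The evaluation's point about OutputStreamWriter can be observed directly. The sketch below (class and method names are hypothetical) captures the sink's contents after flush() and again after close(). Assuming the intended behavior described in the evaluation, the bytes visible after flush() lack the trailing ESC ( B and form a prefix of the final output, and close(), as end-of-stream, appends the tail. A caller that needs a complete ISO-2022-JP byte sequence would therefore close the writer (or, with a CharsetEncoder, call flush) rather than rely on Writer#flush.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;

public class FlushVsClose {
    // Returns {bytes in the sink after flush(), bytes in the sink after close()}.
    public static byte[][] flushThenClose(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(sink, "ISO-2022-JP");
        w.write(text);
        w.flush();                          // writes pending bytes; more text may still follow
        byte[] afterFlush = sink.toByteArray();
        w.close();                          // end of stream: the tail escape sequence is written
        byte[] afterClose = sink.toByteArray();
        return new byte[][] { afterFlush, afterClose };
    }

    public static void main(String[] args) throws IOException {
        byte[][] r = flushThenClose("\u3042\u3043\u3044");
        System.out.println("after flush: " + r[0].length + " bytes");
        System.out.println("after close: " + r[1].length + " bytes");
        // Bytes written by flush() should be a prefix of the final stream.
        System.out.println("prefix: " + Arrays.equals(r[0], Arrays.copyOf(r[1], r[0].length)));
    }
}
```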
08-02-2005