JDK-4867251 : OutputStreamWriter/InputStreamReader convert NEL to linefeed with Cp037 encoding
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: linux
  • CPU: x86
  • Submitted: 2003-05-21
  • Updated: 2017-08-27
  • Resolved: 2017-08-27
Description
Name: rmT116609			Date: 05/20/2003


FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Linux stallion.elharo.com 2.4.18-6mdk #1 Fri Mar 15 02:59:08 CET 2002 i686 unknown

A DESCRIPTION OF THE PROBLEM :
When InputStreamReader uses the Cp037 (EBCDIC US) encoding and reads a NEL (Unicode 0x85, EBCDIC 0x15), it converts it into a linefeed (\n). When OutputStreamWriter writes a linefeed in Cp037, it writes a NEL instead.

NEL and linefeed are *not* the same character. Cp037 has separate, distinct code points for them: 0x15 for NEL and 0x25 for linefeed. For XML parsing, among other uses, it is important that they not be confused: the linefeed character counts as white space in XML, but NEL does not. Several XML parsers have serious errors because they depend on Java to convert EBCDIC to Unicode.
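To make the XML point concrete: XML 1.0 defines white space as production [3], S ::= (#x20 | #x9 | #xD | #xA)+, which does not include NEL. A decoder that turns EBCDIC 0x15 into U+000A therefore turns a non-whitespace character into whitespace. The following is a minimal sketch of that production; `XmlWhitespace` and its methods are hypothetical names for illustration, not part of any parser's API.

```java
/** Hypothetical helper: the XML 1.0 white-space test (production [3]). */
final class XmlWhitespace {

    /** S ::= (#x20 | #x9 | #xD | #xA)+ ; NEL (U+0085) is deliberately absent. */
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** True if every character of s is XML white space (true for the empty string). */
    static boolean isAllXmlWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isXmlWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Content of an element holding one EBCDIC 0x15, decoded correctly vs. mapped to LF.
        String correct = "\u0085";
        String misDecoded = "\n";
        System.out.println(isAllXmlWhitespace(correct));     // false
        System.out.println(isAllXmlWhitespace(misDecoded));  // true
    }
}
```

In a parser, the second case means character data that should be significant can be discarded or treated as ignorable white space.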

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached program.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The program should write and read all three common line-end characters: NEL, linefeed, and carriage return. In both directions, only two appear: on output, every linefeed is changed to a NEL; on input, every NEL is changed to a linefeed.
ACTUAL -
Testing input stream
10
10
10
10
10
10
10
10
13
13
13
13
Testing output stream
0x15
0x15
0x15
0x15
0xD
0xD
0xD
0xD
0x15
0x15
0x15
0x15
0xD
0x15
0xD
0x15
0xD
0x15
0xD
0x15
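The expected behavior amounts to a round-trip property: each of the three line-end characters should encode to its own byte and decode back unchanged. The following is a minimal diagnostic sketch. `LineEndRoundTrip` and `roundTrips` are hypothetical names, and it assumes the JDK exposes Cp037 under its canonical name `IBM037`. The result for IBM037 depends on the mapping tables of whichever JDK runs it, so no expected output is claimed here.

```java
import java.nio.charset.Charset;

/** Hypothetical diagnostic: does each line-end character survive an encode/decode round trip? */
final class LineEndRoundTrip {

    /** NEL (U+0085), linefeed (U+000A), carriage return (U+000D). */
    static final char[] LINE_ENDS = {'\u0085', '\n', '\r'};

    /** True if c encodes and decodes back to itself through cs (unmappable chars become replacements). */
    static boolean roundTrips(Charset cs, char c) {
        String original = String.valueOf(c);
        String back = new String(original.getBytes(cs), cs);
        return original.equals(back);
    }

    public static void main(String[] args) {
        // Defaults to IBM037 (Cp037); pass another charset name to compare.
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "IBM037");
        for (char c : LINE_ENDS) {
            System.out.printf("%s U+%04X round-trips: %b%n", cs.name(), (int) c, roundTrips(cs, c));
        }
    }
}
```

Running it with `UTF-8` gives a baseline in which all three characters round-trip. Under the behavior reported here, U+0085 would fail for Cp037 because it comes back as U+000A.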

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.*;

public class NELTest {

  public static void main(String[] args) throws Exception {

    // Input: four NELs (EBCDIC 0x15), four linefeeds (0x25), four carriage returns (0x0D).
    System.out.println("Testing input stream");
    byte[] data = {(byte) 0x15, (byte) 0x15, (byte) 0x15, (byte) 0x15, (byte) 0x25, (byte) 0x25, (byte) 0x25, (byte) 0x25, (byte) 13, (byte) 13, (byte) 13, (byte) 13};
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    InputStreamReader reader = new InputStreamReader(in, "Cp037");
    int c;
    while ((c = reader.read()) != -1) {
        System.out.println(c);  // decoded character as a decimal code point
    }

    // Output: four NELs (U+0085), four CRs, four LFs, then four CR LF pairs.
    System.out.println("Testing output stream");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    OutputStreamWriter writer = new OutputStreamWriter(out, "Cp037");
    char[] chars = {0x85, 0x85, 0x85, 0x85, 13, 13, 13, 13, 10, 10, 10, 10,
                    13, 10, 13, 10, 13, 10, 13, 10};
    writer.write(chars);
    writer.close();  // close() flushes any buffered output first

    byte[] result = out.toByteArray();
    for (int i = 0; i < result.length; i++) {
        // Mask to 0..255 so a byte >= 0x80 is not printed sign-extended (e.g. FFFFFF85).
        System.out.println("0x" + Integer.toHexString(result[i] & 0xFF).toUpperCase());
    }
  }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
I've written my own special-purpose EBCDIC writer that correctly converts NELs to linefeeds. For input, I don't yet have a workaround, since the bug tends to manifest itself fairly deeply inside XML parsers.
(Review ID: 185599) 
======================================================================
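The customer workaround above describes a special-purpose EBCDIC writer but does not include it. The following is only a sketch of how such a writer could work. It assumes the JDK exposes Cp037 under its canonical name `IBM037`, and `StrictEbcdicWriter` and its `encode` helper are hypothetical names. The writer delegates ordinary characters to the platform charset and writes the two line-end bytes itself, using the distinct code points from the description: linefeed as 0x25 and NEL as 0x15.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

/**
 * Hypothetical writer: encodes text as IBM037 (Cp037) but emits line ends itself,
 * LF (U+000A) as 0x25 and NEL (U+0085) as 0x15, regardless of the JDK's own table.
 */
final class StrictEbcdicWriter extends Writer {
    private static final Charset IBM037 = Charset.forName("IBM037");
    private final OutputStream out;

    StrictEbcdicWriter(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        int start = off;
        int end = off + len;
        for (int i = off; i < end; i++) {
            char c = buf[i];
            if (c == '\n' || c == '\u0085') {
                writeSegment(buf, start, i);          // text before the line end
                out.write(c == '\n' ? 0x25 : 0x15);   // line-end byte chosen explicitly
                start = i + 1;
            }
        }
        writeSegment(buf, start, end);                // trailing text
    }

    /** Encodes buf[from, to), which contains no line ends, with the platform IBM037 table. */
    private void writeSegment(char[] buf, int from, int to) throws IOException {
        if (to > from) {
            out.write(new String(buf, from, to - from).getBytes(IBM037));
        }
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    /** Convenience for in-memory use: encodes s and returns the bytes. */
    static byte[] encode(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Writer w = new StrictEbcdicWriter(bos)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for a ByteArrayOutputStream
        }
        return bos.toByteArray();
    }
}
```

For example, `StrictEbcdicWriter.encode("A\n\u0085\r")` yields the bytes 0xC1, 0x25, 0x15, 0x0D. Because each segment is encoded independently, a surrogate pair split across two `write` calls is not handled specially. IBM037 cannot encode supplementary characters in any case.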

Comments
0x15 <=> U+000A should be the expected behavior.
27-08-2017

EVALUATION The issue of NEL/linefeed handling by the J2SE IBM/host/EBCDIC converters has been raised before; see bug ID 4159519. In the next J2SE feature release, Tiger (1.5), the host/EBCDIC converters are planned to be de-bundled from the default J2RE download. Customers will then have a convenient means to download, update, and deploy support for IBM/host encodings through a downloadable java.nio CharsetProvider extension JAR file. The NEL/linefeed handling issue for Cp037 affects the old-style sun.io converters, which are being retired from the platform, so the most appropriate time to address it is the 1.5 release. Bugs such as this one will be appraised and addressed where possible as part of the NIO migration effort. ###@###.### 2003-05-21
21-05-2003
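The evaluation above points to a java.nio `CharsetProvider` as the delivery mechanism for host encodings. The following sketch illustrates that API only; it is not the charset the JDK actually shipped. It defines a provider for a hypothetical charset named `x-strict-IBM037`, built as a single-byte table copied from the platform `IBM037` charset, with 0x15 pinned to U+0085 and 0x25 pinned to U+000A in both directions. `StrictIbm037` and `StrictIbm037Provider` are hypothetical names. For `Charset.forName` to discover the provider, the provider class would also need to be public, have a public no-argument constructor, and be listed in `META-INF/services/java.nio.charset.spi.CharsetProvider`. This sketch skips that packaging, and the charset can be used directly by instantiating it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical charset: IBM037 with 0x15 <-> U+0085 (NEL) and 0x25 <-> U+000A (LF). */
final class StrictIbm037 extends Charset {
    private final char[] b2c = new char[256];   // byte -> char
    private final int[] c2b = new int[65536];   // char -> byte, -1 if unmappable

    StrictIbm037() {
        super("x-strict-IBM037", new String[0]);
        Charset base = Charset.forName("IBM037");
        for (int i = 0; i < 256; i++) {
            // Single-byte charset: each byte decodes to exactly one char.
            b2c[i] = new String(new byte[] {(byte) i}, base).charAt(0);
        }
        b2c[0x15] = '\u0085';  // NEL
        b2c[0x25] = '\n';      // LF
        Arrays.fill(c2b, -1);
        for (int i = 0; i < 256; i++) {
            if (b2c[i] != '\uFFFD') {
                c2b[b2c[i]] = i;
            }
        }
        // Re-pin the line ends in case another byte in the base table also decoded to them.
        c2b['\u0085'] = 0x15;
        c2b['\n'] = 0x25;
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof StrictIbm037;
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    out.put(b2c[in.get() & 0xFF]);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        // 0x6F is '?' in EBCDIC; used as the replacement for unmappable chars.
        return new CharsetEncoder(this, 1.0f, 1.0f, new byte[] {0x6F}) {
            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    int b = c2b[in.get(in.position())];
                    if (b < 0) {
                        // Position stays on the bad char; surrogates are reported one unit at a time.
                        return CoderResult.unmappableForLength(1);
                    }
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    in.get();
                    out.put((byte) b);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }
}

/** Hypothetical provider exposing the charset above under its canonical name. */
final class StrictIbm037Provider extends CharsetProvider {
    private static final Charset STRICT = new StrictIbm037();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.<Charset>singletonList(STRICT).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        return STRICT.name().equalsIgnoreCase(charsetName) ? STRICT : null;
    }
}
```

Because the table is copied from the platform charset, every byte other than the two line ends maps exactly as IBM037 does on the running JDK.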