JDK-4279804 : Display of Group 1 National characters in CP 850 doesn't work
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.1.7
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_nt
  • CPU: x86
  • Submitted: 1999-10-09
  • Updated: 1999-10-11
  • Resolved: 1999-10-11
Related Reports
Duplicate :  
Description

Name: krT82822			Date: 10/09/99


I am running on Windows NT 4.0 with Service Pack 4.

java full version "JDK 1.1.7 IBM build n117p-19990618 (JIT enabled: ibmjitc)"

First, I have a German properties file that is encoded in Code
Page 850. I converted the file to Unicode by running
native2ascii against the file, specifying an -encoding option of
"Cp850".  I have a program called DumpMsgs that reads the strings
from the Unicode version of the file and displays them in the
DOS window, which is Code Page 850.  (I will append the source
code for DumpMsgs at the end of this note.)  To make things
easier, I redirected the output of the DumpMsgs tool to a file.
When I display the contents of the file in the DOS window, none
of the German National characters display correctly.  However,
when I view the file using the Windows "Write" (or wordpad)
editor, all of the characters display correctly.  This implies
that the Code Page of the strings output by the DumpMsgs tool is
code page 1252 (or ISO8859-1).  I would have expected the output
to be in code page 850 since the original properties file was in
code page 850 and I specified a native2ascii encoding of "Cp850".
This particular problem happens to the National characters in
all of the Group 1 SBCS (Single Byte Character Set) languages
(German, French, Italian, Spanish, and Brazilian Portuguese).
The interesting thing is that when I converted the source
properties file to code page 1252 and then converted it to
Unicode using a native2ascii -encoding option of "Cp1252", I got
the same results.  That is, the National characters did not
display correctly in the DOS window, but did with the wordpad
editor.  However, since the source file and the native2ascii
-encoding were both code page 1252, I would have expected this
result.
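Editor's sketch (not part of the original report): the same Unicode text turns into different bytes depending on the output encoding. The hypothetical class `EncodingBytes` below encodes the German letters once with "Cp850" and once with "Cp1252". It uses the modern `java.nio.charset.Charset` API rather than anything in JDK 1.1.7. If `System.out` uses the Windows default encoding (assumed here to be Cp1252), the redirected file would look right in WordPad but wrong in a CP 850 DOS window, which matches the symptom described above.

```java
import java.nio.charset.Charset;

public class EncodingBytes {
    // Encodes 's' with the named charset and returns the bytes as space-separated hex.
    static String hex(String s, String encoding) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(encoding))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // a-, o-, u-umlaut, A-, O-, U-umlaut and sharp s, written as Unicode escapes
        String german = "\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC\u00DF";
        System.out.println("Cp850:  " + hex(german, "Cp850"));
        System.out.println("Cp1252: " + hex(german, "Cp1252"));
    }
}
```

Running it should print `84 94 81 8E 99 9A E1` for Cp850 and `E4 F6 FC C4 D6 DC DF` for Cp1252.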

Here is a related, but slightly different problem: Since I want
to have the contents of the properties file display correctly in
the DOS window (code page 850), I experimented with different
combinations of the source file encoding and the native2ascii
-encoding option.  The combination that worked the best was to
have the source properties file in code page 850 but use a
native2ascii -encoding option of "Cp1252".  I don't know whether
this is a valid thing to do, but it seemed to work okay for the
most part.  All of the National characters displayed correctly
in the DOS window for all of the Group 1 SBCS languages, with
the exception of two German characters: the lowercase u-umlaut
(ü) and the capital A-umlaut (Ä).  All
of the other German National characters (at least the ones that
are used in my properties file) displayed correctly in the DOS
window.
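One plausible explanation for the two failing letters (an editor's analysis, not confirmed in the report): with this combination native2ascii misreads the CP 850 bytes as Cp1252, and the Cp1252 console encoding then writes the same bytes back out, so the DOS window receives its own CP 850 bytes again. That round trip can only fail for bytes that the Cp1252 table does not map. The hypothetical `RoundTrip` class below tests single bytes this way; the Cp1250 column corresponds to the follow-up experiment reported on 10/1/99.

```java
import java.nio.charset.Charset;

public class RoundTrip {
    // Decodes a single CP 850 byte as if it were in 'misreadAs' (the native2ascii
    // step), re-encodes the result with the same charset (the assumed console
    // encoding of System.out), and reports whether the original byte comes back.
    static boolean survives(int cp850Byte, String misreadAs) {
        Charset cs = Charset.forName(misreadAs);
        byte[] in = { (byte) cp850Byte };
        String decoded = new String(in, cs); // unmapped bytes decode to U+FFFD
        byte[] out = decoded.getBytes(cs);   // U+FFFD is unmappable, so it becomes '?'
        return out.length == 1 && out[0] == in[0];
    }

    public static void main(String[] args) {
        int[] bytes = { 0x84, 0x94, 0x81, 0x8E, 0x99, 0x9A, 0xE1 };
        String[] names = { "a-umlaut", "o-umlaut", "u-umlaut", "A-umlaut",
                           "O-umlaut", "U-umlaut", "sharp s" };
        for (int i = 0; i < bytes.length; i++) {
            System.out.printf("%-9s 0x%02X  Cp1252 ok: %-5b  Cp1250 ok: %b%n",
                    names[i], bytes[i],
                    survives(bytes[i], "Cp1252"), survives(bytes[i], "Cp1250"));
        }
    }
}
```

On a current JDK, 0x81 (u-umlaut in CP 850) is unmapped in both Cp1252 and Cp1250, so it fails in both columns. 0x8E (A-umlaut) decodes to Z-caron in both tables and survives; it is an assumption that the Cp1252 converter shipped with JDK 1.1.7 left 0x8E unmapped, which would account for the broken A-umlaut reported here.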

I would like to see if there is a workaround or fix that will
allow all the National characters for the Group 1 SBCS languages
to display correctly in the DOS window (code page 850).

Following is the source Java code for the DumpMsgs tool.  The
input to DumpMsgs is the name of the properties file whose
strings you want to display.  I pass in the name of the file
that is output from the native2ascii command.

----------------- START DumpMsgs HERE -------------------

import java.io.FileInputStream;

import java.util.Enumeration;
import java.util.Properties;

public class DumpMsgs
{
  public static void main (String[] args)
  {
    try
    {
      FileInputStream file  = new FileInputStream (args[0]);
      Properties      props = new Properties ();
      props.load (file);
      Enumeration     list  = props.propertyNames ();
      while (list.hasMoreElements ())
      {
        String key = (String)list.nextElement ();
        System.out.println ("key   = " + key);
        System.out.println ("value = " + props.get (key));
        System.out.println ();
      }
    }
    catch (Exception e)
    {
      System.out.println (e);
    }
  } // main
} // class DumpMsgs

----------------------
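A minimal sketch of a DumpMsgs variant that encodes its output explicitly instead of relying on the JVM default encoding. The class name `DumpMsgs850` and the optional second argument are hypothetical; the sketch assumes the console really is code page 850 and uses Java 7+ APIs (try-with-resources, `Charset`). JDK 1.1 already offered `OutputStreamWriter(OutputStream, String)`, which takes the encoding name as a string, so the same idea was expressible there.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.Enumeration;
import java.util.Properties;

public class DumpMsgs850 {
    // Prints every key/value pair, encoding the text with the named charset
    // (for example "Cp850" for a code page 850 DOS window) instead of the
    // JVM's default encoding.
    static void dump(Properties props, OutputStream out, String encoding) {
        PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(out, Charset.forName(encoding)));
        Enumeration<?> names = props.propertyNames();
        while (names.hasMoreElements()) {
            String key = (String) names.nextElement();
            pw.println("key   = " + key);
            pw.println("value = " + props.getProperty(key));
            pw.println();
        }
        pw.flush(); // flush only; closing would also close 'out' (System.out)
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(args[0])) {
            props.load(in); // decodes the escapes written by native2ascii
        }
        String encoding = args.length > 1 ? args[1] : "Cp850";
        dump(props, System.out, encoding);
    }
}
```

For example, `java DumpMsgs850 German.properties Cp850` (file name hypothetical), where the input is the native2ascii output produced with `-encoding Cp850`.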

10/1/99 more info from user (in reply to suggestion that bug # 4038677 might offer useful info):

Thank you for your response.  I apologize for not getting back to you sooner.
Your first e-mail was buried in a bunch of other e-mail.

I tried the workaround, but the result was the same.

I left the source code in Code Page 850 and converted it to Unicode with the
"Cp852" native2ascii encoding.  When I used my DumpMsgs tool to print the file,
it still displayed as if it were Code Page 1252 because all of the German
National characters are corrupted in the DOS command window.

I then tried to convert the source in Code Page 850 to Unicode with the Cp1250
native2ascii encoding.  This resulted in the same thing that I get with my
partial workaround: the u-umlaut is still corrupted, while the rest of the
German National characters display correctly.

The next thing I'll try is to convert the Code Page 850 source to Code Page 852
using iconv (if I can).  Then I will try to convert it to Unicode using the
Cp852 native2ascii encoding.  I'll let you know how that works.

If you have any other suggestions, or if I didn't do the workaround correctly,
please let me know.

--------------

10/6/99 from user:

Converting the source file to CP852 (using iconv), then converting it to Unicode
using a native2ascii encoding of "Cp852" did not work.  The file output from
native2ascii is still CP 1252.  (I know this because the special characters in
the file are displayed correctly in a Windows editor.)
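The output of native2ascii is plain ASCII: non-ASCII characters are written as backslash-u escapes, so that file has no code page of its own. It is an assumption on the editor's part that the CP 1252 text seen in the Windows editor is the redirected DumpMsgs output, whose encoding is chosen by `System.out`. The hypothetical `Native2AsciiSketch` below imitates the conversion step; it is not the tool's source, and the lowercase hex digits in the escapes are a choice of this sketch.

```java
import java.nio.charset.Charset;

public class Native2AsciiSketch {
    // Decodes 'raw' with the named charset, then writes every non-ASCII char as a
    // six-character backslash-u escape, so the result contains only 7-bit ASCII.
    static String toAscii(byte[] raw, String encoding) {
        String text = new String(raw, Charset.forName(encoding));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Gruen" spelled with u-umlaut, as the bytes would appear in a CP 850 file
        byte[] cp850 = { 'G', 'r', (byte) 0x81, 'n' };
        System.out.println(toAscii(cp850, "Cp850"));
        System.out.println(toAscii(cp850, "Cp1252"));
    }
}
```

Decoding the CP 850 bytes with Cp850 yields `Gr\u00fcn`; decoding them with Cp1252 yields `Gr\ufffdn` on a current JDK, because 0x81 is unmapped in Cp1252.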

I still have not found a way to get all of the German special characters to
display correctly in the DOS (CP 850) window.


10/9/99 eval1127@eng -- am filing reference bug #
(Review ID: 95816) 
======================================================================

Comments
WORK AROUND

Name: krT82822			Date: 10/09/99

My workaround for getting most of the SBCS National characters to display
correctly in Code Page 850 is to keep the source properties file in Code Page
850 and convert it to Unicode using a native2ascii -encoding option of
"Cp1252".  However, even using this approach, the 2 German characters
mentioned in the problem description are not displayed correctly in Code
Page 850.

======================================================================
11-06-2004