JDK-4035543 : Properties.load() doesn't work correctly against multi-byte character strings.
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 1.1_alpha,1.1,1.3.0
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS:
    generic,solaris_2.5.1,solaris_9,windows_95
  • CPU: generic,x86,sparc
  • Submitted: 1997-02-28
  • Updated: 2022-01-14
  • Resolved: 1998-02-17
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Description

Name: mc57594			Date: 02/28/97


1. Create a property file including the following line:
hello=.... // Assume .... is a multi-byte character string.
2. Call the load() method of the Properties class on the above property file.
3. Call getProperty() on the resulting Properties object with the key "hello", then println() the result.
4. Garbage characters are displayed.

I looked into the source file Properties.java provided with JDK1.1. The method load() is using the deprecated and not-completely-internationalized method "getLocalizedInputStream()". I think it causes this problem.
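
The steps above can be sketched roughly as follows. This is an illustration only, not the submitter's test case: the class name MultiByteLoadDemo, the sample value, and the choice of UTF-8 as the file's multi-byte encoding are all assumptions, and an in-memory byte array stands in for the property file. It relies on load(InputStream) decoding each byte as one ISO 8859-1 character, which is how current releases are documented to behave.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical reproduction sketch; names and sample data are assumptions.
class MultiByteLoadDemo {

    // Steps 2 and 3: load the "file" through load(InputStream) and read the key.
    static String loadHello(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileBytes));
        return props.getProperty("hello");
    }

    public static void main(String[] args) throws IOException {
        // Step 1: a property line whose value is a multi-byte string
        // (three Japanese characters, stored here as UTF-8 bytes).
        String original = "\u65e5\u672c\u8a9e";
        byte[] fileBytes = ("hello=" + original + "\n").getBytes(StandardCharsets.UTF_8);

        String loaded = loadHello(fileBytes);

        // Step 4: every byte was widened to its own char, so the value
        // no longer matches and has one char per encoded byte.
        System.out.println("matches original: " + loaded.equals(original));
        System.out.println("loaded length:    " + loaded.length());
    }
}
```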

company - Sybase K.K. , email - ###@###.###
======================================================================

Comments
WORK AROUND see evaluation
11-06-2004

EVALUATION

The problem is that Properties.load uses Runtime.getRuntime().getLocalizedInputStream(in) to return a stream from which to read characters. Likewise, Properties.save uses Runtime.getRuntime().getLocalizedOutputStream(out). Both of these methods are no-ops and have been deprecated. The result is that each byte in the stream is converted to a Unicode char simply by adding 8 high-order bits.

The right thing to do here is to have load create an InputStreamReader from the given stream. This will convert bytes to chars using the default transcoder. An overloaded load method should also take an InputStreamReader (or maybe just a Reader) directly, so that callers aren't forced to use just the default encoding.

The only problem with making this change is that the use of getLocalized[Input|Output]Stream is spec'ed in the JLS, pg 640. We need to convince the JLS authors that this is a mistake and that the spec should be amended.

brian.beck@Eng 1997-05-22

On second thought, we cannot change this so easily. Making the above change will break PropertyResourceBundles. Right now, properties files are always encoded in ASCII with Java Unicode escapes. A PropertyResourceBundle can always be read correctly, regardless of what the VM's default encoding is set to. But if properties files could be written in any encoding, you would never know how to read a particular property file. The program would have to encode this knowledge somehow, which is a bad choice because it is translators who would decide what encoding to use for a particular copy of the property file. It's particularly bad for applets, because they aren't even able to change the default encoding.

This is definitely an issue that should be worked out, but not until we do our planned update to ResourceBundles. Fortunately, there is a workaround. Since property files accept the \uxxxx notation, any Unicode value can be represented. Property files simply need to be run through the native2ascii filter before being distributed. The only issue for now should be how to describe this situation while still giving us room to resolve it in the future.

brian.beck@Eng 1997-09-09

Properties load and save now use an internal Reader and Writer. These always use the 8859_1 encoding, because the consensus here was to make all properties files use the same format to keep them interchangeable.

michael.mccloskey@eng 1998-02-16

Note that ISO8859-1 is not interchangeable, whereas ASCII is virtually interchangeable. I believe the JDK must define a PCS (Portable Character Set), as found in other standards such as XPG4, and explicitly specify that properties files must be written using only PCS characters. Therefore, the ISO8859-1 (R) characters must be excluded.

masayoshi.okutsu@Eng 1998-02-17
17-02-1998
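
For readers applying the workaround described in the evaluation, here is a minimal sketch. The class PropertiesEncodingWorkaround and the helper escapeNonAscii are hypothetical. The helper only approximates what native2ascii does and is not that tool's implementation. The second path uses Properties.load(Reader), which was added in Java 6 after this report was closed, and assumes the file is known to be UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch of the two ways to get a multi-byte value loaded intact.
class PropertiesEncodingWorkaround {

    // Hypothetical helper approximating native2ascii: every char outside
    // 7-bit ASCII becomes a backslash-u escape with four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Workaround from the evaluation: the file holds only ASCII plus escapes,
    // so load(InputStream) reads it the same way under any default encoding.
    static String loadFromStream(byte[] fileBytes, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileBytes));
        return props.getProperty(key);
    }

    // Alternative on Java 6 and later: the caller names the encoding explicitly.
    static String loadFromReader(byte[] fileBytes, String key) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        String value = "\u65e5\u672c\u8a9e";   // assumed sample value
        String line = "hello=" + value + "\n";

        byte[] escapedFile = escapeNonAscii(line).getBytes(StandardCharsets.US_ASCII);
        byte[] utf8File = line.getBytes(StandardCharsets.UTF_8);

        System.out.println(value.equals(loadFromStream(escapedFile, "hello")));
        System.out.println(value.equals(loadFromReader(utf8File, "hello")));
    }
}
```

The escaped form keeps the file portable in the sense the evaluation argues for, since it needs no knowledge of the encoding at read time. The Reader form moves that knowledge into the calling program, which is the trade-off the evaluation warns about.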