Bug ID: JDK-5033591 I18N - different behavior when reading invalid char in different locale

Type: Bug
Component: core-libs
Sub-Component: java.nio.charsets
Affected Version: 1.4.2_04

Priority: P2
Status: Closed
Resolution: Duplicate
OS: solaris
CPU: generic

Submitted: 2004-04-19
Updated: 2004-04-19
Resolved: 2004-04-19

Create a text file containing some invalid Chinese characters.

Then open and read it with the FileReader class in the zh_CN, zh_CN.GBK, zh_CN.UTF-8 and zh_CN.GB18030 locales.

In the zh_CN, zh_CN.GBK and zh_CN.UTF-8 locales, the read() method does not throw an exception, and the invalid Chinese character is replaced with white space.

In the zh_CN.GB18030 locale, the read() method throws a MalformedInputException when reading the invalid character.
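The two behaviours described above (silent replacement versus an exception) correspond to the two error actions a java.nio.charset.CharsetDecoder can be configured with. The following is a minimal sketch, not the submitter's test case: the class name InvalidCharDemo and the helper decode() are hypothetical, it assumes a JDK 7 or later (for StandardCharsets and try-with-resources), and it uses UTF-8 with an in-memory byte array instead of a file and a locale-dependent default encoding, so that the result does not depend on the environment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original report.
class InvalidCharDemo {

    // Reads all characters from the bytes, using an explicitly configured
    // decoder rather than the locale-dependent default that FileReader uses.
    static String decode(byte[] data, Charset cs, CodingErrorAction action)
            throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), decoder)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The byte 0xFF can never appear in well-formed UTF-8.
        byte[] data = { 'a', (byte) 0xFF, 'b' };

        // REPLACE: the malformed byte is substituted, no exception is thrown.
        String replaced = decode(data, StandardCharsets.UTF_8, CodingErrorAction.REPLACE);
        System.out.println("REPLACE produced " + replaced.length() + " chars");

        // REPORT: read() fails with MalformedInputException.
        try {
            decode(data, StandardCharsets.UTF_8, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            System.out.println("REPORT threw " + e.getClass().getName());
        }
    }
}
```

Passing an explicit decoder to InputStreamReader, as sketched here, makes the error handling independent of the locale in which the program runs.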

EVALUATION

Not reproducible with 1.5.0 latest promoted build (b47).
###@###.### 2004-04-19

This inconsistency, which occurs particularly for CJK charset behaviour when running in CJK locales, is addressed by bug ID 4838512, which is fixed and integrated in the 1.4.2_05 and 1.4.1_07 maintenance J2SE releases. The fix is present in 1.5.0 beta promoted builds, including the currently promoted one (b47) at the time of this bug evaluation.

4838512 fixes an issue whereby older sun.io converter implementations for character encoding and decoding are cached during VM startup, while the default encoding for a particular locale is being determined. This is an issue for encodings that are part of the extended set provided by the Sun provider in $JRE/lib/charsets.jar, and not for the basic set of encodings contained within rt.jar. The fix ensures that the charsets.jar extended charset provider is hardwired into the provider lookup code so as to defeat the caching behaviour that causes inconsistencies such as the one seen in this case.

Closing out as duplicate of 4838512.
###@###.### 2004-04-19
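When checking whether a given runtime is affected, it can help to see which default encoding the VM picked for the current locale and whether the charsets involved are available at all. The sketch below only uses the public java.nio.charset.Charset API (Charset.defaultCharset() requires J2SE 5.0 or later); the class name CharsetLookupDemo and the helper describe() are hypothetical, and the code does not inspect the sun.io converter cache discussed in the evaluation.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; not part of the original report.
class CharsetLookupDemo {

    // Returns a one-line description of the named charset, or a note
    // that the running JRE does not provide it.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not supported";
        }
        Charset cs = Charset.forName(name);
        return cs.name() + " (registered=" + cs.isRegistered() + ")";
    }

    public static void main(String[] args) {
        // The default charset is derived from the locale at VM startup.
        System.out.println("default: " + Charset.defaultCharset().name());

        // The charsets named in the report.
        String[] names = { "GBK", "GB18030", "UTF-8" };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running this under each of the locales from the report shows which default encoding is in effect in each case.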

19-04-2004