JDK-4954023 : missing/dropped character in zh_CN.GB18030 locale
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 6.1,1.4.1_05,6
  • Priority: P2
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: solaris_8,solaris_9
  • CPU: x86,sparc
  • Submitted: 2003-11-13
  • Updated: 2005-07-21
  • Resolved: 2005-07-21
Description
Steps to reproduce:

Extract the attached gbtest.java
Compile with jdk1.4.1_05 on Solaris 9
Set the CDE locale to zh_CN.GB18030
Run java gbtest text.out.gb18030 count.out /tmp/xxx
Run diff text.out.gb18030 /tmp/xxx

Perform the same steps in the C locale and you will see that the input and output files are identical. The attached program reads a file in GB18030 encoding and writes it back in GB18030 encoding. During the round trip a few characters are lost if the program is run in the zh_CN.GB18030 locale; it works correctly if it is run in the C locale.
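The round trip described above can be sketched as follows. This is not the attached gbtest.java, whose source is not reproduced in this report; the class name Gb18030RoundTrip and its methods are hypothetical, the count.out argument is omitted because its role is not described here, and the code uses current Java APIs (try-with-resources, java.nio.file) that were not available in 1.4.1. It assumes the runtime provides the GB18030 charset.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical stand-in for the attached test program.
public class Gb18030RoundTrip {
    // Assumes the runtime ships a GB18030 charset.
    private static final Charset GB18030 = Charset.forName("GB18030");

    // Decode the input as GB18030 and re-encode it as GB18030.
    public static void copy(Path in, Path out) throws IOException {
        try (Reader r = new InputStreamReader(Files.newInputStream(in), GB18030);
             Writer w = new OutputStreamWriter(Files.newOutputStream(out), GB18030)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        }
    }

    // Byte-for-byte comparison, playing the role of the diff step.
    public static boolean identical(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Gb18030RoundTrip <input> <output>");
            System.exit(2);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        copy(in, out);
        System.out.println(identical(in, out)
                ? "round trip preserved the file"
                : "round trip changed the file");
    }
}
```

For valid GB18030 input, a difference between the two files after such a copy would correspond to the dropped characters reported here.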


Comments
EVALUATION

Investigating the cause of this.
###@###.### 2003-11-14

Reproduced this issue and narrowed down the root cause. A side effect of the fix for bug ID 4685305 (Esc ID 544539), "Charset.{forName()}{.isSupported()} is not thread safe", is that where the default charset for a platform locale is one of the extended charsets provided by sun.nio.cs.ext.ExtendedCharsets, charset caching and setup will cache and use the older sun.io converters rather than the newer sun.nio.cs.ext ones. 4685305/544539 was integrated into 1.4.1_03, and between then and recent builds of 1.5 the charset used for conversions in the zh_CN.GB18030 locale is the older sun.io converter. There is a latent bug within the sun.io.CharToByteGB18030 converter.

A fix for this behaviour is provided by bug ID 4838512 (Esc ID 548688), which is due to be patched into the 1.4.1_07 and 1.4.2_04 updates of 1.4.1 and 1.4.2 respectively. 4838512 ensures that default platform charsets are hardwired, avoiding the caching behaviour that previously forced the JRE to let the sun.io converters take precedence over the sun.nio.cs converters (which are tested and continuously maintained) when running in a locale in which the encoding is the default.
###@###.### 2003-11-17

Given that we have moved on to nio completely in Mustang, the offending sun.io.CharToByteGB18030 is no longer in our binary, so I am closing this bug.
###@###.### 2005-07-21 00:03:40 GMT
2005-07-21
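Since the evaluation turns on which charset serves as the platform default, a small diagnostic can help when checking a given runtime and locale. The sketch below is an assumption-laden illustration, not part of the original report: the class name DefaultCharsetCheck and the sample string are hypothetical, Charset.defaultCharset() exists only from Java 5 onward, and on JDK 18 and later the default is UTF-8 regardless of locale unless configured otherwise. It cannot detect whether a legacy sun.io converter is in use; it only reports the default charset and whether a sample string survives an encode/decode cycle.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic; not taken from the bug report.
public class DefaultCharsetCheck {
    // True if encoding and then decoding with cs reproduces the text.
    public static boolean roundTrips(String text, Charset cs) {
        byte[] encoded = text.getBytes(cs);
        return new String(encoded, cs).equals(text);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def.name());
        System.out.println("implementation:  " + def.getClass().getName());

        // Two common CJK characters plus U+3400, which GB18030
        // encodes as a four-byte sequence.
        String sample = "\u4e2d\u6587\u3400";
        System.out.println("sample round-trips: " + roundTrips(sample, def));
    }
}
```

Running it once in the zh_CN.GB18030 locale and once in the C locale shows how the default charset changes with the locale on runtimes that derive it from the environment.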

WORK AROUND

Run in another locale, such as one of the UTF-8 CN locales, instead of zh_CN.GB18030. The problem occurs only when running in the zh_CN.GB18030 locale; other Chinese locales are not affected.
###@###.### 2003-11-18
2003-11-18