Bug ID: JDK-4954023 missing/dropped character in zh_CN.GB18030 locale

Details
Type:
Bug
Submit Date:
2003-11-13
Status:
Closed
Updated Date:
2005-07-21
Project Name:
JDK
Resolved Date:
2005-07-21
Component:
core-libs
OS:
solaris_9,solaris_8
Sub-Component:
java.nio.charsets
CPU:
x86,sparc
Priority:
P2
Resolution:
Cannot Reproduce
Affected Versions:
6.1,1.4.1_05,6
Fixed Versions:

Related Reports
Relates:
Relates:

Sub Tasks

Description
Steps to reproduce:

Extract the attached gbtest.java.
Compile it with jdk1.4.1_05 on Solaris 9.
Set the CDE locale to zh_CN.GB18030.
Run: java gbtest text.out.gb18030 count.out /tmp/xxx
Run: diff text.out.gb18030 /tmp/xxx

Perform the same steps in the C locale and you will see that the input and output files are identical. The attached program reads a file in GB18030 encoding and writes it back in GB18030 encoding. During the round trip a few characters are lost if the program is run in the zh_CN.GB18030 locale; it works correctly if it is run in the C locale.
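
The attached gbtest.java is not reproduced in this report. As a rough illustration of the kind of round trip described above, the following is a minimal sketch that decodes GB18030 bytes and re-encodes them, then reports the first byte offset at which input and output differ. The class name Gb18030RoundTrip and its helper methods are hypothetical, the data is held in memory rather than read from files, and the code uses current Java syntax rather than what 1.4.1 supported.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical stand-in for the attached test program; not the original gbtest.java.
class Gb18030RoundTrip {
    static final Charset GB18030 = Charset.forName("GB18030");

    // Decode the input as GB18030 and immediately re-encode it as GB18030.
    static byte[] roundTrip(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), GB18030);
             Writer out = new OutputStreamWriter(sink, GB18030)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return sink.toByteArray();
    }

    // Index of the first differing byte, or -1 if both arrays are identical.
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) {
        // Two BMP Chinese characters, ASCII, and one supplementary character (U+20000).
        byte[] original = "\u4E2D\u6587 abc \uD840\uDC00".getBytes(GB18030);
        byte[] copy = roundTrip(original);
        int diff = firstDifference(original, copy);
        System.out.println(diff == -1
                ? "round trip preserved all bytes"
                : "first difference at byte offset " + diff);
    }
}
```

This plays the role of the diff step in the reproduction: any offset other than -1 would indicate that characters were lost or altered in the round trip.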


                                    

Comments
EVALUATION

Investigating cause of this
###@###.### 2003-11-14

Reproduced this issue and narrowed down the root cause.

A side effect of the fix for bug ID 4685305 (Esc ID 544539), "Charset.{forName()}{.isSupported()}
is not thread safe", is that when the default charset for a platform locale is one of the
extended charsets provided by sun.nio.cs.ext.ExtendedCharsets, charset caching
and setup will cache and use the older sun.io converters rather than the newer sun.nio.cs.ext
ones. 4685305/544539 was integrated into 1.4.1_03, and between then and recent builds of
1.5 the charset used for conversions in the zh_CN.GB18030 locale is the
older sun.io converter. There is a latent bug within the sun.io.CharToByteGB18030
converter.

A fix for this behaviour is provided in bug ID 4838512 (Esc ID 548688), which is
due to be patched into the 1.4.1_07 and 1.4.2_04 updates of 1.4.1 and 1.4.2 respectively.
4838512 ensures that default platform charsets are hardwired, avoiding the caching
behaviour which previously forced the JRE to let sun.io converters take precedence
over the sun.nio.cs (tested and continuously maintained) converters when running in
the locale in which the encoding is the default.
###@###.### 2003-11-17
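
Because the problem depends on GB18030 being the platform default charset, it can help to confirm what the JVM actually selected as its default. The following is a minimal sketch, not part of the original evaluation: the class DefaultCharsetReport and its methods are hypothetical names, and Charset.defaultCharset() only exists from Java 5 onward, so this would not compile on the affected 1.4.1 release. It does not reveal whether a sun.io or sun.nio.cs converter is used internally; it only reports the default charset and the class implementing it.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper; not part of the JDK or of the attached test.
class DefaultCharsetReport {
    // Charset name plus the name of the class that implements it.
    static String describe(Charset cs) {
        return cs.name() + " (implemented by " + cs.getClass().getName() + ")";
    }

    // True if the named charset is supported and is the JVM's default charset.
    static boolean isDefault(String charsetName) {
        return Charset.isSupported(charsetName)
                && Charset.forName(charsetName).equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + describe(Charset.defaultCharset()));
        System.out.println("GB18030 default = " + isDefault("GB18030"));
    }
}
```

Running this once in the zh_CN.GB18030 locale and once in the C locale should show whether the two runs differ in their default charset, which is the condition the evaluation identifies as triggering the problem.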

Given that we have moved to nio completely in Mustang, the offending
sun.io.CharToByteGB18030 is no longer in our binary, so I am closing this bug.
###@###.### 2005-07-21 00:03:40 GMT
                                     
2005-07-21
WORK AROUND

Run with a locale such as one of the UTF-8 CN locales instead of zh_CN.GB18030.
The problem only occurs when running in the zh_CN.GB18030 locale. Other Chinese locales
are not affected.
###@###.### 2003-11-18
                                     
2003-11-18


