Bug ID: JDK-6196407 J2SE NIO: eucJP-open failed to be looked up.

Details
Type: Bug
Submit Date: 2004-11-17
Status: Resolved
Updated Date: 2010-05-10
Project Name: JDK
Resolved Date: 2004-12-10
Component: core-libs
OS: solaris_9
Sub-Component: java.lang
CPU: x86,sparc
Priority: P2
Resolution: Fixed
Affected Versions: 1.4.2,1.4.2_06
Fixed Versions: 1.4.2_08 (b01)


Description
Calling java.lang.String.getBytes(String) in a certain order results in an
apparent failure (long delays and a system call error) in
java.nio.charset.Charset.lookupViaProviders(String).

For example:
    String str = "abc";
    str.getBytes("eucJP-open");
    str.getBytes("MS932");
    str.getBytes("eucJP-open");
    str.getBytes("MS932");


Each getBytes() call ends up in lookupViaProviders(String), which appears
to fail almost every time: the lookup is slow and produces an error from
the stat64 system call at the OS level.
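
To make the delay visible, the call sequence can be put in a loop and timed. The harness below is only a sketch: the class and method names are hypothetical, it uses System.nanoTime() (not available on 1.4.2, where System.currentTimeMillis() would be needed), and it treats an unsupported encoding name as a normal outcome instead of an error.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical timing harness for the alternating getBytes() pattern.
final class AlternatingEncodingRepro {

    // Encodes s with the named encoding; returns false if the name is not supported.
    static boolean encode(String s, String encodingName) {
        try {
            s.getBytes(encodingName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    // Alternates between two encoding names and returns the elapsed nanoseconds.
    static long timeAlternating(String first, String second, int rounds) {
        String str = "abc";
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            encode(str, first);
            encode(str, second);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int rounds = 1000;
        // The pair from the report, then a pair of standard names for comparison.
        long reported = timeAlternating("eucJP-open", "MS932", rounds);
        long baseline = timeAlternating("UTF-8", "ISO-8859-1", rounds);
        System.out.println("eucJP-open/MS932: " + reported / 1000000 + " ms");
        System.out.println("UTF-8/ISO-8859-1: " + baseline / 1000000 + " ms");
    }
}
```

On a release that is not affected, the two timings should be of the same order; the report concerns 1.4.2 on Solaris.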

###@###.### 2004-11-17 02:44:22 GMT

                                    

Comments
EVALUATION

I am not familiar with the charset code, but I ran the testcase and went through the source code for this part. It seems to me that this is the correct behavior.
Every getBytes() call runs lookupViaProviders() to find the charset, and the lookup fails every time because the eucJP-open charset is not defined in our default charset names. As a result the caching does not take effect. If you change the charset name to euc_jp, the problem goes away.
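
Whether a given name resolves on a particular runtime can be checked with Charset.isSupported(String). The helper below is a minimal sketch with hypothetical class and method names. On an affected 1.4.2 release the check would presumably go through the same lookup path, so it is a diagnostic and not a workaround.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports whether the runtime can resolve a charset name.
final class CharsetNameCheck {

    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (a subclass) and a null name.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = { "eucJP-open", "euc_jp", "MS932", "UTF-8" };
        for (String name : names) {
            System.out.println(name + " supported: " + isKnown(name));
        }
    }
}
```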

My understanding is that the customer wants to provide their own implementation for eucJP-open. I have added ###@###.### to the interest list; I think he is the original developer of the charset provider.


###@###.### 2004-11-19 02:51:56 GMT

While this is the "correct" behavior of the provider lookup mechanism, the
performance is not acceptable for the j2se product...

The root cause is that we don't have eucjp-open and pck in the nio charset
collection in 1.4.2_xx; only the ctb/btc converters exist for them (we have
the charsets now in 1.5, see bug#4892738). The current implementation of the
StringCoding class caches only one "last used" charset/converter per thread,
so using two encoding names alternately easily "penetrates" this cache.
What is worse, eucjp-open does not exist in either the
StandardCharsetProvider or the ExtendedCharsetProvider, so the next levels
of cache in Charset and AbstractCharsetProvider do not help either. We end up
reaching the final lookup layer and searching for a "new" charset provider
again and again, which is expensive. The reason we don't see the same issue
with PCK is that we have special "if PCK" code in StringCoding.java. We have
the same problem with all encodings that exist only in the sun.io package,
such as the IBMxyz encodings.
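
The cache behavior described above can be illustrated with a small model. This is not the StringCoding source: the class below is a hypothetical single-entry "last used" cache (without the per-thread aspect), and the counter simply stands in for the expensive lookup that follows a miss.

```java
// Hypothetical model of a one-entry "last used" cache; not the JDK implementation.
final class SingleEntryCacheModel {
    private String lastName;   // the only name the cache remembers
    private Object lastValue;  // stand-in for the cached charset/converter
    int expensiveLookups;      // how often the slow path was taken

    Object lookup(String name) {
        if (name.equals(lastName)) {
            return lastValue;  // hit: same name as the previous call
        }
        expensiveLookups++;    // miss: fall through to the slow lookup
        lastName = name;
        lastValue = new Object();
        return lastValue;
    }

    public static void main(String[] args) {
        SingleEntryCacheModel alternating = new SingleEntryCacheModel();
        for (int i = 0; i < 2; i++) {
            alternating.lookup("eucJP-open");
            alternating.lookup("MS932");
        }
        SingleEntryCacheModel repeated = new SingleEntryCacheModel();
        for (int i = 0; i < 4; i++) {
            repeated.lookup("eucJP-open");
        }
        System.out.println("alternating: " + alternating.expensiveLookups); // prints "alternating: 4"
        System.out.println("repeated: " + repeated.expensiveLookups);       // prints "repeated: 1"
    }
}
```

With two alternating names every call misses, so the cost of each call is determined entirely by whatever lies behind the cache.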

Two possible quick/easy solutions for this particular issue would be
(1) add the same "special" code for eucjp-open in StringCoding.lookupCharset()
or
(2) backport 4892738 to 1.4.2_0x, which I think is the better approach.

The disadvantage of both solutions is that we still have the same issue with
the IBMxyz encodings, if customers care about them...

This issue does not exist in 1.5 and later, so the submitter needs to escalate
to CTE to get it fixed in the 1.4.2_08 release.

###@###.### 2004-11-19 05:07:16 GMT
                                     
2004-11-19


