Bug ID: JDK-4838512 (cs) Default charsets must be hardwired

Details
Type: Bug
Submit Date: 2003-03-27
Status: Resolved
Updated Date: 2004-04-07
Project Name: JDK
Resolved Date: 2003-10-24
Component: core-libs
OS: solaris, solaris_8, linux
Sub-Component: java.nio
CPU: x86, sparc, generic
Priority: P3
Resolution: Fixed
Affected Versions: 1.4.1, 1.4.1_03, 1.4.2, 1.4.2_04
Fixed Versions: 1.4.1_07 (07)

Related Reports
Backport:
Backport:
Duplicate:
Duplicate:
Duplicate:
Relates:
Relates:

Sub Tasks

Description
PROBLEM:
 On a multiprocessor Linux system, the attached sample program (A.java)
 fails with the exception stack trace shown below.


TEST PROGRAM:
==== A.java ===
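public class A {
    public static void main(String[] args) throws Exception {
        // Two threads repeatedly encode a string with two different
        // charsets, so the charset cached by Charset keeps changing.
        Thread t1 = new Test();
        Thread t2 = new Test();

        t1.start();
        t2.start();
    }

    static class Test extends Thread {
        public void run() {
            while (!interrupted()) {
                try {
                    "a".getBytes("ASCII");          // standard charset
                    "a".getBytes("EUC-JP-LINUX");   // non-standard (extended) charset
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}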
===============

LOG DATA:
==== log ===
java.lang.Error: java.nio.charset.UnsupportedCharsetException: EUC-JP-LINUX
        at java.lang.StringCoding.lookupCharset(StringCoding.java:84)
        at java.lang.StringCoding.encode(StringCoding.java:361)
        at java.lang.StringCoding.encode(StringCoding.java:378)
        at java.lang.String.getBytes(String.java:608)
        at java.io.UnixFileSystem.canonicalize(Native Method)
        at java.io.File.getCanonicalPath(File.java:513)
        at java.io.FilePermission$1.run(FilePermission.java:209)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.FilePermission.init(FilePermission.java:203)
        at java.io.FilePermission.<init>(FilePermission.java:253)
        at sun.net.www.protocol.file.FileURLConnection.getPermission(FileURLConnection.java:193)
        at sun.net.www.protocol.jar.JarFileFactory.getPermission(JarFileFactory.java:111)
        at sun.net.www.protocol.jar.JarFileFactory.getCachedJarFile(JarFileFactory.java:81)
        at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:50)
        at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:85)
        at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:105)
        at java.net.URL.openStream(URL.java:960)
        at sun.misc.Service.parse(Service.java:203)
        at sun.misc.Service.access$100(Service.java:111)
        at sun.misc.Service$LazyIterator.hasNext(Service.java:257)
        at java.nio.charset.Charset$1.getNext(Charset.java:301)
        at java.nio.charset.Charset$1.hasNext(Charset.java:316)
        at java.nio.charset.Charset$2.run(Charset.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.nio.charset.Charset.lookupViaProviders(Charset.java:356)
        at java.nio.charset.Charset.lookup(Charset.java:383)
        at java.nio.charset.Charset.isSupported(Charset.java:405)
        at java.lang.StringCoding.lookupCharset(StringCoding.java:80)
        at java.lang.StringCoding.encode(StringCoding.java:361)
        at java.lang.String.getBytes(String.java:591)
        at A$Test.run(A.java:27)
Caused by: java.nio.charset.UnsupportedCharsetException: EUC-JP-LINUX
        at java.nio.charset.Charset.forName(Charset.java:428)
        at java.lang.StringCoding.lookupCharset(StringCoding.java:82)
        ... 30 more
============


CONFIGURATION:

 - CPU: Pentium III 800 MHz x 2
 - OS : Turbo Linux 8 (kernel 2.4.18-5smp)
 - JRE: JDK 1.4.1_01, 1.4.2 (b15)


REPORT:
 
They suspect that this happens when several threads modify the cache in
Charset concurrently. Specifically, java.nio.charset.Charset#isSupported and
java.nio.charset.Charset#forName should together be atomic, but they are not.

The following is a possible scenario. Thread-A and Thread-B are the two
threads created by the test program.

Thread-A     Cache            Thread-B
--------     ------------     --------
             ASCII
  (1)
                                (2)
             EUC-JP-LINUX
  (3)
                                (4)
             ASCII
  (5)


(1) In Thread-A, "a".getBytes("EUC-JP-LINUX") proceeds as follows.

  "a".getBytes("EUC-JP-LINUX")
     -> StringCoding#lookupCharset
      -> Charset#isSupported("EUC-JP-LINUX")
       -> Charset#lookup("EUC-JP-LINUX")

  At this point Charset.cache holds "ASCII", so a cache miss occurs and
  Charset#lookupViaProviders is called.

(2) A complete call of "a".getBytes("EUC-JP-LINUX") finishes in Thread-B.
    The cache is now set to "EUC-JP-LINUX".

(3) During the provider lookup begun in (1), StringCoding#lookupCharset is
    called again (as the log above shows). In lookupCharset,
    Charset#isSupported("EUC-JP-LINUX") is called again and returns true
    because of a cache hit (the cache was set to "EUC-JP-LINUX" at (2)).

(4) A complete call of "a".getBytes("ASCII") finishes in Thread-B.
    The cache is now set to "ASCII".

(5) Charset#forName("EUC-JP-LINUX") is called after
    Charset#isSupported("EUC-JP-LINUX") returned true at (3).
    This time a cache miss occurs and Charset#lookupViaProviders is called.
    However, lookupViaProviders is not re-entrant and returns null.
    As a result, an UnsupportedCharsetException is thrown.
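The interleaving above can be replayed deterministically on a single thread. The Java sketch below is a simplified model, assuming a one-slot cache of charset names and a flag marking Thread-A as being inside a provider lookup. CacheRaceModel and all of its members are hypothetical and only approximate the real Charset code; Thread-B's steps (2) and (4) are modelled by writing the cache directly.

```java
import java.nio.charset.UnsupportedCharsetException;

/**
 * Hypothetical single-threaded replay of steps (1)-(5). The names used here
 * are illustrative and do not reproduce the real java.nio.charset.Charset code.
 */
public class CacheRaceModel {

    // One-slot cache holding the name of the most recently returned charset.
    static String cache = "ASCII";

    // Set while Thread-A is inside a provider lookup (a real implementation
    // would keep this per thread).
    static boolean inProviderLookup = false;

    static String lookup(String name) {
        if (name.equals(cache)) {
            return name;                 // cache hit
        }
        if (name.equals("ASCII")) {      // standard charset: always found
            cache = name;
            return name;
        }
        if (inProviderLookup) {
            return null;                 // recursive provider lookup refused
        }
        cache = name;                    // provider lookup succeeds
        return name;
    }

    static boolean isSupported(String name) {
        return lookup(name) != null;
    }

    static String forName(String name) {
        String cs = lookup(name);
        if (cs == null) {
            throw new UnsupportedCharsetException(name);
        }
        return cs;
    }

    /** Replays the scenario; returns {isSupported, forName, single lookup}. */
    static String[] runScenario() {
        cache = "ASCII";
        inProviderLookup = true;                          // (1) A enters providers
        cache = "EUC-JP-LINUX";                           // (2) B finishes EUC-JP-LINUX
        boolean supported = isSupported("EUC-JP-LINUX");  // (3) A: cache hit
        String single = lookup("EUC-JP-LINUX");           // one-call alternative at (3)
        cache = "ASCII";                                  // (4) B finishes ASCII
        String forNameResult;
        try {
            forNameResult = forName("EUC-JP-LINUX");      // (5) A: cache miss
        } catch (UnsupportedCharsetException e) {
            forNameResult = "UnsupportedCharsetException: " + e.getCharsetName();
        } finally {
            inProviderLookup = false;
        }
        return new String[] { String.valueOf(supported), forNameResult, single };
    }

    public static void main(String[] args) {
        String[] r = runScenario();
        System.out.println("isSupported -> " + r[0]);
        System.out.println("forName     -> " + r[1]);
        System.out.println("single call -> " + r[2]);
    }
}
```

In this model a caller that keeps the result of one lookup at step (3) gets the charset back, which is roughly what making isSupported and forName atomic would achieve; it is the separate forName call at step (5) that fails.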

===========================================================================


                                    

Comments
WORK AROUND

Set the default encoding on the command line to force the use of the old sun.io
EUC-JP-LINUX converter, e.g.,

    % java -Dfile.encoding='^AEUC-JP-LINUX' Foo

where ^A represents the ASCII character control-A, i.e., \u0001.  This causes
the old sun.io converter for EUC-JP-LINUX to be used whenever the default
encoding is required, thereby preventing the recursive provider lookups which
cause the reported problem.

Note that the system property "file.encoding" is implementation-private.  The
redefinition of this property is not, in general, guaranteed to work, and will
likely fail to work in J2SE 1.5 or later releases.

-- ###@###.### 2003/10/5
                                     
EVALUATION

The analysis given by the submitter is on the right track, but the change
required is more than a simple matter of making the Charset.isSupported and
.forName methods atomic.  That would not actually solve the reported problem.
The suggested fix would mask the problem, but would fail in a future release
when the old sun.io converters are removed.

The root cause of this bug is the fact that a platform's default charset cannot
be loaded via the charset-provider mechanism.  The default charset is used to
translate filenames from Java UTF-16 strings into platform-specific strings.
The provider mechanism itself needs to translate filenames in order to discover
providers, hence a provider cannot provide the charset which is needed to
discover and load itself.  This is why the lookup code in the Charset class
disallows recursive provider lookups.
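A guard against recursive provider lookups can be sketched with a per-thread flag. This is a minimal model, assuming charsets are represented by their names and a provider by a Function; ProviderGate and its methods are hypothetical names, not the actual Charset internals. The demo provider re-enters the lookup while it is being discovered, as happens when a provider's own file name must be encoded in the charset it provides.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch of a per-thread guard that refuses recursive provider
 * lookups. Charsets are modelled as plain strings.
 */
public class ProviderGate {

    // Non-null while the current thread is inside a provider lookup.
    private static final ThreadLocal<Boolean> gate = new ThreadLocal<>();

    static String lookupViaProviders(String name, Function<String, String> provider) {
        if (gate.get() != null) {
            return null;            // recursive lookup on this thread: refuse
        }
        gate.set(Boolean.TRUE);
        try {
            return provider.apply(name);
        } finally {
            gate.remove();          // always clear, even if the provider throws
        }
    }

    /** Returns {outer result, inner (recursive) result}. */
    static String[] demo() {
        String[] inner = new String[1];
        // A provider whose discovery needs the default charset (e.g. to
        // encode the file name of its JAR) and so re-enters the lookup.
        Function<String, String> extended = n -> {
            inner[0] = lookupViaProviders("EUC-JP-LINUX", m -> m);
            return n;
        };
        String outer = lookupViaProviders("EUC-JP-LINUX", extended);
        return new String[] { outer, inner[0] };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("outer = " + r[0] + ", inner = " + r[1]);
    }
}
```

The inner lookup returns null instead of recursing, and the finally block clears the flag so that later lookups on the same thread are not blocked.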

In 1.4.1 and later releases the EUC-JP-LINUX charset is provided by the
sun.nio.cs.ext.ExtendedCharsets provider.  In contexts in which EUC-JP-LINUX is
the default charset (e.g., LC_ALL=ja_JP on Linux) it would seem that this
charset should appear to be unsupported, but in fact it works much of the time.
The reason for this is the existence in the 1.4.x releases of a dual charset
lookup mechanism which falls back to the old sun.io converters when a charset
is not supported by the java.nio.charset APIs.

To see how this works, consider the following example.  The evaluation of the
expression "a".getBytes("EUC-JP-LINUX") first causes the code in the internal
java.lang.StringCoding class to invoke the Charset.isSupported() method to see
if that charset is supported.  EUC-JP-LINUX is not a standard charset, so the
lookup code in java.nio.charset.Charset tries to look it up via the provider
mechanism.  This lookup eventually results in a recursive invocation of the
String.getBytes method on the same thread, this time to encode the filename of
the charsets.jar file into EUC-JP-LINUX (since it's the default charset), which
in turn results in a recursive provider lookup.  This fails, since such lookups
are disallowed, hence the String.getBytes method falls back to the old sun.io
EUC-JP-LINUX converter.  The initial provider lookup then succeeds, since it
uses the old converter to encode the filename.

On a multiprocessor this scheme can break down if the timing is just right.  As
observed by the submitter, the Charset class contains a global cache of the
most recently returned charset.  At the end of the scenario described above
this cache will hold a reference to the EUC-JP-LINUX charset.  If one thread
causes the EUC-JP-LINUX charset to be removed from the cache in between another
thread's invocations of the Charset.isSupported and .forName methods during the
recursive provider lookup, then an UnsupportedCharsetException will be thrown,
as reported.

The solution suggested by the submitter would mask the problem, but at the cost
of a synchronization operation and in a way that will fail when the sun.io
converters are removed in a future release.  A better solution is to recognize
this fundamental limitation of the charset-provider mechanism and "hardwire"
the ExtendedCharsets provider into the java.nio.charset.Charset lookup logic.
The diffs for this change are in the suggested-fix section of this bug report.
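The proposed lookup order can be sketched as below, assuming a simplified Provider interface that mirrors CharsetProvider.charsetForName. HardwiredLookup, Provider and the demo providers are hypothetical, and ISO-8859-1 merely stands in for EUC-JP-LINUX so the example runs on any JDK. Because the extended provider is consulted directly, a lookup of the default charset made while a third-party provider is being discovered no longer hits the recursion-refusing path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: consult a hardwired "extended" provider before any
 * provider discovery. ISO-8859-1 stands in for EUC-JP-LINUX in the demo.
 */
public class HardwiredLookup {

    // Mirrors the shape of java.nio.charset.spi.CharsetProvider.charsetForName.
    interface Provider {
        Charset charsetForName(String name);
    }

    private final Provider standard;    // built-in standard charsets
    private final Provider extended;    // hardwired, like ExtendedCharsets
    private final Provider discovered;  // found through provider discovery
    private final ThreadLocal<Boolean> gate = new ThreadLocal<>();

    HardwiredLookup(Provider standard, Provider extended, Provider discovered) {
        this.standard = standard;
        this.extended = extended;
        this.discovered = discovered;
    }

    Charset lookup(String name) {
        Charset cs = standard.charsetForName(name);
        if (cs == null) {
            cs = extended.charsetForName(name);  // no file names to encode
        }
        if (cs == null) {
            cs = lookupViaProviders(name);
        }
        return cs;
    }

    private Charset lookupViaProviders(String name) {
        if (gate.get() != null) {
            return null;                         // recursion still refused
        }
        gate.set(Boolean.TRUE);
        try {
            return discovered.charsetForName(name);
        } finally {
            gate.remove();
        }
    }

    /** Returns the default charset as seen from inside provider discovery. */
    static Charset demo() {
        Charset[] inner = new Charset[1];
        HardwiredLookup[] self = new HardwiredLookup[1];
        Provider std = n -> n.equals("US-ASCII") ? StandardCharsets.US_ASCII : null;
        Provider ext = n -> n.equals("EUC-JP-LINUX") ? StandardCharsets.ISO_8859_1 : null;
        // Discovering a third-party provider needs the default charset to
        // encode a JAR file name, which re-enters lookup().
        Provider thirdParty = n -> {
            inner[0] = self[0].lookup("EUC-JP-LINUX");
            return null;
        };
        self[0] = new HardwiredLookup(std, ext, thirdParty);
        self[0].lookup("X-THIRD-PARTY");
        return inner[0];
    }

    public static void main(String[] args) {
        System.out.println("default charset during discovery: " + demo());
    }
}
```

In this sketch third-party providers are still found through the guarded path, so recursive discovery remains refused for them; only the hardwired extended charsets escape the restriction.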

An alternative solution would be to rework the sun.misc.Service code so that it
does not load provider-descriptor files via URLs.  It does this only because
that's the only way to load multiple resource files of the same name.  Since
charset providers are, by definition, already on the class path, there's really
no need to do another permission check on each provider-descriptor file as is
currently done by the clumsy JarURLConnection code.  This solution would,
however, most likely require a more complex and risky set of changes, to the
Service, JarURLConnection, and (possibly) java.lang.ClassLoader classes, hence
it is not proposed here.

-- ###@###.### 2003/10/5
                                     
SUGGESTED FIX

Please see the attached webrevs.  There are two slightly different fixes, one
for 1.4.1 and another for 1.4.2 and later.  For convenience these webrevs may
also be viewed online at http://nio.sfba/rev/4838512.

-- ###@###.### 2003/10/22
                                     
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.1_07
1.4.2_05
generic
tiger-beta

FIXED IN:
1.4.1_07
1.4.2_05
tiger-beta

INTEGRATED IN:
1.4.1_07
1.4.2_05
tiger-b26
tiger-beta


                                     
2004-06-14


