JDK-1266364 : Unicode escape in package name isn't processed properly
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.0.2,1.1.6,1.2.0,1.2.1
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic,solaris_2.6,windows_nt
  • CPU: generic,x86,sparc
  • Submitted: 1996-09-03
  • Updated: 2001-03-14
  • Resolved: 2001-03-14
Description

Name: ###@###.###			Date: 09/03/96


The Java Language Specification (section 7.2.1, Storing Packages in a File System) says:

"A package name component or class name might contain a character that
cannot correctly appear in a host file system's ordinary directory
name, such as a Unicode character on a system that allows only ASCII
characters in file names. As a convention, the character can be escaped
by using, say, the @ character followed by four hexadecimal digits
giving the numeric value of the character, as in the \uxxxx escape
(3.3), so that the package name:

children.activities.crafts.papierM\u00e2ch\u00e9

which can also be written using full Unicode as: 

children.activities.crafts.papierMâché

might be mapped to the directory name: 

children/activities/crafts/papierM@00e2ch@00e9

If the @ character is not a valid character in a file name for some
given host file system, then some other character that is not valid in
a Java identifier could be used instead."
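
The mapping described in the quoted passage can be sketched as follows.
This is only an illustration, not code from javac or the VM: the class
and method names are hypothetical, and it assumes a file system that
accepts only 7-bit ASCII, so every other character is escaped with '@'
followed by four hexadecimal digits.

```java
public class PackageDirEscaper {

    // Hypothetical helper: converts a package name to a relative directory
    // name. Dots become '/', ASCII characters are copied unchanged, and any
    // other character is written as '@' plus its 16-bit value in four
    // lowercase hex digits (assumption: the host allows only ASCII names).
    public static String toDirectoryName(String packageName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < packageName.length(); i++) {
            char c = packageName.charAt(i);
            if (c == '.') {
                sb.append('/');
            } else if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append('@').append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The package name used as the example in the specification text.
        String pkg = "children.activities.crafts.papierM\u00e2ch\u00e9";
        System.out.println(toDirectoryName(pkg));
    }
}
```

For the package name in the quotation, this sketch produces
children/activities/crafts/papierM@00e2ch@00e9, which matches the
directory name given in the specification.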

The example below (pkgs00701.java) compiles successfully, and pkgs00701.class is placed in the directory

	javasoft/sqe/tests/lang/pkgs007/pkgs00701é

while the fully qualified class name in pkgs00701.class is

	javasoft/sqe/tests/lang/pkgs007/pkgs00701Ã©/pkgs00701

So the compiled class cannot be found for execution:

>javac -d . pkgs00701.java
> java javasoft.sqe.tests.lang.pkgs007.pkgs00701é.pkgs00701
Can't find class javasoft.sqe.tests.lang.pkgs007.pkgs00701é.pkgs00701
>

-----------------------pkgs00701.java---------------------------
//File: @(#)pkgs00701.java 1.3 96/08/20 
//Copyright 08/20/96 Sun Microsystems, Inc.  All Rights Reserved
 
package javasoft.sqe.tests.lang.pkgs007.pkgs00701\u00e9;
 
import java.io.PrintStream;

public class pkgs00701 { 
  public static void main(String argv[]) {
     System.exit(run(argv,System.out));
  }
  public static int run(String argv[],PrintStream out) {
     int r = 6;
     if ( r != 6 ) {
        out.println("not pass");
        return 2;
     }
     return 0;
  }
}
-------------------------------------------------------------

======================================================================

The following example also fails:

//File: @(#)pkgs00702.java 1.5 96/11/15 
//Copyright 11/15/96 Sun Microsystems, Inc.  All Rights Reserved
 
package javasoft.sqe.tests.lang.pkgs007.pkgs00702;
 
import java.io.PrintStream;

public class pkgs00702 { 
  public static void main(String argv[]) {
     System.exit(run(argv, System.out) + 95/*STATUS_TEMP*/);
  }
  public static int run(String argv[],PrintStream out) {
     try {
       Class.forName ("javasoft.sqe.tests.lang.pkgs007.pkgs00702\u0099.pkgs00702a");
     }
     catch (Exception e) {
       out.println (e);
       out.println ("failed");
       return 2/*STATUS_FAILED*/;
     }
     return 0/*STATUS_PASSED*/;
  }
}

//File: @(#)pkgs00702a.java 1.1 96/11/15 
//Copyright 11/15/96 Sun Microsystems, Inc.  All Rights Reserved
 
package javasoft.sqe.tests.lang.pkgs007.pkgs00702\u0099;
 
import java.io.PrintStream;

public class pkgs00702a { 
  static int r = 4; 
  public static int get() {
     return r;
  }
}

william.maddox@Eng 2000-01-07

Comments
PUBLIC COMMENTS .
10-06-2004

EVALUATION

Getting the name mangling correct needs to be resolved as part of the general problem of compiling into jar files.
david.stoutamire@Eng 1997-12-05

There is nothing wrong with the compiler. The fully-qualified name appears to be corrupted only because it is encoded, per spec, in UTF-8. The two bytes corresponding to the UTF-8 encoding of the two-byte Unicode character \u00e9 happen to print as described when interpreted as two distinct single-byte Latin-1 characters.

The error message that appears when running the program indicates that the class file itself cannot be located by the VM. But the directory name is in fact correct, and was never in question -- it was the *internal* copy of the fully-qualified class name that was allegedly corrupt. I was able to reproduce this behavior by using a shell which was apparently not 7-bit clean (tcsh) and which truncated the package name containing the \u00e9 character. With the standard shell 'sh', the example worked correctly. (JDK 1.2.2)
william.maddox@Eng 2000-01-05

Since the above was written, it was brought to our attention that pkgs00702 also fails (code has been attached to the description). In this case, the class is invoked reflectively from within Java, showing that the shell is not involved. Again, however, correctly-named directories are created in the host filesystem, and the apparent discrepancy with the fully-qualified pathname stored in the classfile is accounted for by the UTF-8 encoding used there. I have also verified that the VM looks for the correct path, by examining the -verbose output. (This is tricky -- apparently, the \u0099 character prints as a space in an xterm. I ran a shell within Emacs, which displays an escape sequence revealing the actual character code value.)

In general, Unicode characters in the range 0x00-0xff *will* be handled correctly by the compiler without the need to use any '@' escapes, at least on Solaris where I ran the tests. It remains the case that the compiler currently assumes that any character that appears within a Java identifier is also legal as a character in a filename on the underlying platform. While this is not correct, neither pkgs00701 nor pkgs00702 demonstrates that flaw. Instead, the compiler is generating correctly named class files in correctly named directories containing correct contents, and the VM is somehow not finding them. I conjecture that the VM may not be accounting for the UTF-8 encoding of the class name within the class file, an oversight that would go unnoticed when only ASCII characters are used. I have reclassified this as java/runtime on the grounds that pkgs00701 and pkgs00702 both appear to be processed correctly by the compiler, but fail at runtime nonetheless.

As a separate issue, the compiler *does* need to use '@' escapes in some cases, for example, on US Solaris (single-byte) when the character is greater than 0xff. The VM must also agree on a convention regarding their use, and this must be coordinated across all platforms. AFAIK, there is currently no clean way to determine which characters are legal in filenames on a given platform, which means we would either have to make the compiler platform-dependent, or adopt a least-common denominator approach: use an escape for anything outside of 7-bit ASCII. Observed behavior on Solaris (single-byte US English) is that the upper 8 bits of the characters composing a string filename argument to the constructor FileOutputStream are simply discarded silently.
william.maddox@Eng 2000-01-07
07-01-2000
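
The encoding effect described in the evaluation can be reproduced in
isolation. The sketch below is an illustration only: the class and
method names are hypothetical, and it uses standard UTF-8, which yields
the same bytes as the class-file encoding for the character \u00e9. It
encodes a name as UTF-8 and then decodes the bytes as Latin-1, which is
what a byte-oriented viewer of the class file effectively does.

```java
import java.nio.charset.StandardCharsets;

public class Utf8AsLatin1Demo {

    // Hypothetical helper: encodes the string as UTF-8, then decodes the
    // resulting bytes as ISO-8859-1 (Latin-1), one character per byte.
    public static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String component = "pkgs00701\u00e9";
        String shown = misread(component);
        // Print code units in hex so the result does not depend on the
        // console encoding.
        for (int i = 0; i < shown.length(); i++) {
            System.out.printf("%04x ", (int) shown.charAt(i));
        }
        System.out.println();
    }
}
```

The single character 00e9 is encoded as the two bytes c3 a9, so the
misread string ends in the two Latin-1 characters 00c3 and 00a9 instead
of 00e9. The name stored in the class file is therefore one character
longer when viewed this way, even though it denotes the same package.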