JDK-4185525 : java.io: Cannot create files with full Unicode names (Win32/NT)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io
  • Affected Version: 1.2.0,1.2.1,1.2.2,1.3.0,1.3.1
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,windows_nt,windows_2000
  • CPU: generic,x86
  • Submitted: 1998-10-29
  • Updated: 2002-11-08
  • Resolved: 2001-07-27
  • Fix Versions: 1.3.1_07 (07) Fixed, 1.4.0 Fixed
Description

Name: mr33420			Date: 10/29/98


Under Windows NT4.0 (sp3) (cp1252) with an NTFS file system, I can neither
create nor correctly list files whose names contain Unicode characters, even
though NTFS supports Unicode file names. The following code demonstrates the problem:

================= cut here ================= 

import java.io.*;

public class test {

    public static void main(String[] args) {
        String fname = "hi\u3040\u3041mom"; // a file name with unicode chars
        File dir = new File(args[0]);
        File f = new File(dir, fname);
        listFiles(dir);
        try {
            System.out.println("trying to create file...");
            FileOutputStream out = new FileOutputStream(f);
            out.close();
            System.out.println("done.");
        } catch (IOException e) {
            System.out.println("ERROR: " + e);
        }
        listFiles(dir);
    }

    public static void listFiles(File dir) {
        System.out.println("Contents of: " + dir.getAbsolutePath());
        String[] list = dir.list();
        for(int i = 0; i < list.length; i++) {
            String name = list[i];
            System.out.print(name + " :");
            for(int j = 0; j < name.length(); j++) {
                System.out.print(" 0x" + Integer.toHexString(name.charAt(j)));
            }
            System.out.println("");
        }
        System.out.println("");
    }
}

================= cut here ================= 

If I run the program with JDK1.2 RC1 in an empty directory I get:

Contents of: c:\unitest

trying to create file...
ERROR: java.io.FileNotFoundException: c:\unitest\hi??mom (The filename, directory name, or volume label syntax is incorrect)
Contents of: c:\unitest

To show that I can in fact create files with Unicode characters I can run
this program with Microsoft's VM (jview) and get the following output:

Contents of: c:\unitest

trying to create file...
done.
Contents of: c:\unitest
hi??mom : 0x68 0x69 0x3040 0x3041 0x6d 0x6f 0x6d

Note that the file has been created with the correct Unicode chars.

Now run the program again with the JDK1.2 RC1 VM and note that listing the
files in the directory loses the correct Unicode characters in the
file name:

Contents of: c:\unitest
hi??mom : 0x68 0x69 0x3f 0x3f 0x6d 0x6f 0x6d

trying to create file...
ERROR: java.io.FileNotFoundException: c:\unitest\hi??mom (The filename, directory name, or volume label syntax is incorrect)
Contents of: c:\unitest
hi??mom : 0x68 0x69 0x3f 0x3f 0x6d 0x6f 0x6d
(Review ID: 41554)
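
The 0x3f values in the listing above are consistent with the file name passing through a single-byte code page (cp1252 on this machine), where characters that have no mapping become '?'. The report does not confirm that this is what the JDK's native code does, so the following is only a sketch of that assumed mechanism. It uses String.getBytes as a stand-in for the conversion; the class and method names are illustrative and not part of the JDK.

```java
import java.nio.charset.Charset;

public class CodePageRoundTrip {

    // Encode a name in the given charset and decode it again, imitating a
    // lossy trip through a single-byte code page (assumed mechanism).
    public static String roundTrip(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    // Hex dump in the same style as the listFiles() output in the report.
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("0x").append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String fname = "hi\u3040\u3041mom";
        System.out.println(hex(fname));                    // original chars
        System.out.println(hex(roundTrip(fname, cp1252))); // after the lossy trip
    }
}
```

Under this assumption the second line printed is 0x68 0x69 0x3f 0x3f 0x6d 0x6f 0x6d, the same pattern as the listing. Since '?' is not a legal character in a Windows file name, a name mangled this way would also explain the "syntax is incorrect" error on creation.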
======================================================================

Name: krT82822			Date: 06/04/99


I use the following code to list my test directory, which contains files with Chinese, Japanese, and Korean characters in their names. It always fails to list some or all of them correctly, whichever Chinese/Japanese/Korean environment is active. My software environment is WINNT 4.0 (build 1381) with SP4 plus RichWin97 build 3330 version 4.0.231298.

The Java code is:
import java.io.*;
import java.util.*;

public class test { 
  public static void main(String[] args) {
    System.out.println("Locale is: " + Locale.getDefault());
    for (int i=0; i<args.length; i++) {
      File f = new File(args[i]);
      proc(f);
    }
    System.out.println("End.");
  }
	
  public static void proc(File f) { 
    String lst[] = f.list();
    if (lst == null || lst.length == 0) {
      System.out.print(f);
      System.out.println("--->Empty");
      return;
    }
    for(int i=0; i<lst.length ; i++) {
      //System.out.println(lst[i]);
      File g = new File(f, lst[i]);
      System.out.print (g);
      if (g.isDirectory()) {
        System.out.println("-->Directory.");
        proc(g);
      } else if (g.isFile()) {
        System.out.print("-->File");
        if (g.exists()) {
          System.out.println("-->Exists.");
        } else {
          System.out.println("-->Failed!");
        }
      } else {
        System.out.println("-->XXXXXXX.");
      }
    }
  }
}


The result without RichWin is:
Locale is: en_US
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321??.doc-->XXXXXXX.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\???????????.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW??.XLS-->XXXXXXX.
d:\test\j\jse206007\korea\?????.com-->XXXXXXX.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000??.XLS-->XXXXXXX.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\??-->XXXXXXX.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.

With RichWin (GB)
Locale is: zh_CN
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321&#65396;??&#12539;doc-->File-->Exists.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\???????????.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW?T&#65428;&#12539;XLS-->File-->Exists.
d:\test\j\jse206007\korea\?????.com-->XXXXXXX.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000&#65412;&#12539;&#12539;XLS-->File-->Exists.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\?eｲE-->File-->Exists.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.

with RichWin Japanese (SHIFT JIS)
Locale is: ja_JP
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321?j??.doc-->File-->Exists.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\ｱｳｴﾔｵﾔﾔﾕﾕﾕｱ.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW????.XLS-->File-->Exists.
d:\test\j\jse206007\korea\?????.com-->XXXXXXX.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000?~??.XLS-->File-->Exists.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\??-->XXXXXXX.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.

with RichWin Korean (KSC)
Locale is: ko_KR
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321?j??.doc-->File-->Exists.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\???????????.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW????.XLS-->File-->Exists.
d:\test\j\jse206007\korea\ㄴㅁㄹ료ㄷ.com-->File-->Exists.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000?~??.XLS-->File-->Exists.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\??-->XXXXXXX.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.

with RichWin Chinese (Big5 JT)
Locale is: zh_CN
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321댕?.doc-->File-->Exists.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\???????????.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW&#46369;??.XLS-->File-->Exists.
d:\test\j\jse206007\korea\?????.com-->XXXXXXX.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000쾨깊.XLS-->File-->Exists.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\?{콳-->File-->Exists.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.

and RichWin UTF8
Locale is: en_US
d:\test\j\jse206007-->Directory.
d:\test\j\jse206007\chinese-->Directory.
d:\test\j\jse206007\chinese\321???.doc-->File-->Exists.
d:\test\j\jse206007\japan-->Directory.
d:\test\j\jse206007\japan\???????????.obd-->XXXXXXX.
d:\test\j\jse206007\korea-->Directory.
d:\test\j\jse206007\korea\SW?D??.XLS-->File-->Exists.
d:\test\j\jse206007\korea\?????.com-->XXXXXXX.
d:\test\j\jse206007\macrotrp-->Directory.
d:\test\j\jse206007\macrotrp\2000????.XLS-->File-->Exists.
d:\test\j\jse206007\specase-oemchar-->Directory.
d:\test\j\jse206007\specase-oemchar\???E-->File-->Exists.
1.2.10--->Empty
1.2.11--->Empty
1.2.12--->Empty
End.
(Review ID: 83922)
======================================================================

Name: krT82822			Date: 02/13/2000


java version "1.2.1"
Classic VM (build JDK-1.2.1-A, native threads)

On Win2000, whenever I try to read files that have Arabic characters in their
name, I get an exception stating that the file is not found. I traced the code,
and found that the Arabic characters in the file name are replaced by '?' (hex
3F). This changes the file name to a wrong string which does not refer to the
concerned file. It wasn't the case when I used JDK1.2.1 on WinNT or Win95. The
same code runs properly on those platforms. I traced into the JDK code itself
and read some comments stating that the file name is returned by the native
system, i.e. the conversion of the characters to '?' is done by the native
code, not by the java classes. Just for reference, I don't have any problem in
reading the Arabic text contained in the files, the problem is only related to
the file name.

  // Fragment as submitted: testFile is presumably a java.io.File defined
  // elsewhere in the reporter's code (not shown).
  public static void main(String args[]) {
    File path = new File(testFile.getParent());
    String fileName;
    String fileList[];
    try {
      fileList = path.list();
      fileName = fileList[0]; // if the file name contains Arabic characters,
                              // these characters will be replaced by '?'
    } catch (Exception e) {
      System.out.println("Exception in main() " + e);
    }
  }
(Review ID: 101181)
======================================================================

Name: yyT116575			Date: 01/22/2001


java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-C)
Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode)

I'm trying to pass Unicode string arguments (file names) to my Java application from a native Win32 app. The native app is strictly Unicode (using wWinMain and exclusively _TCHAR and L"..." string literals). I'm calling Java like this:

    // testing with U+FB56 ARABIC LETTER PEH ISOLATED FORM
    // (a single \xfb56 escape; \xfb\x56 would yield two characters, U+00FB and U+0056)
    wstring params = L"Here comes a Unicode char: \xfb56";
    ShellExecute(0, 0, L"javaw.exe", params.c_str(), 0, SW_SHOWNORMAL);

My Java main method sees the Arabic Peh as a "?" (ASCII 63, hex 3f). Some
Unicode chars actually work, though. I tested with U+20AC (euro symbol), and it worked fine.
(Review ID: 115565)
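
One possible reading of the euro/Peh difference is that the arguments pass through the system's single-byte ANSI code page on their way to main. The report does not state which code page this machine used; the sketch below assumes windows-1252, which contains the euro sign but no Arabic presentation forms. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class AnsiEncodable {

    // True if every character of s has a mapping in the named charset.
    public static boolean encodable(String s, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // Assumption: the ANSI code page is windows-1252 (not stated in the report).
        String codePage = "windows-1252";
        System.out.println("U+20AC encodable: " + encodable("\u20AC", codePage));
        System.out.println("U+FB56 encodable: " + encodable("\uFB56", codePage));
    }
}
```

With windows-1252 the first check reports true and the second false, which matches the behavior described: characters present in the code page survive, and the rest arrive as '?'.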
======================================================================

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.3.1_07, merlin-rc1
FIXED IN: 1.3.1_07, merlin-rc1
INTEGRATED IN: 1.3.1_07, merlin-rc1
14-06-2004

WORK AROUND
Name: krT82822 Date: 02/13/2000
Avoid using Arabic file names, or avoid using Windows 2000.
(Review ID: 101181)
11-06-2004

PUBLIC COMMENTS
On Windows NT/2000 one can now create, read, delete, etc. files with any Unicode names. One remaining limitation is that the java.util.zip package does not yet support Unicode-aware file names. We will address this issue separately. Another limitation is that the VM and the java launcher do not yet support Unicode-aware file names, so, for instance, one cannot have full Unicode characters in the classpath. This issue will also be addressed separately.
10-06-2004

EVALUATION
Fixing this requires splitting the Win32 filesystem code into NTFS and non-NTFS components. This should be straightforward.
-- mr@eng 2001/3/29

I'm updating here because of a customer escalation on this bug. The customer reported that the problem had appeared again and supplied a slightly different test case this time (attached FileTest.java). I verified that this is a test case problem. The customer's test case iterates from 0 to 0xFFFF to try out all possible Unicode characters in a file name, and some iterations of the loop fail. This is because some characters are reserved as metacharacters in the WinNT (2k) file system, such as <, >, /, \, and :, among others. Please refer to the attached JavaOutput.txt, test.cpp, and CppOutput.txt. The renameTo call in the Java program fails in exactly the same cases in which the WinNT C++ program fails.

Bugtraq rejects my update to this bug with an invalid user id error message, so I am blanking out the RE field. The original RE field was 'kladko'.
###@###.### 2002-08-29
29-08-2002
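
A test loop like the customer's can skip the reserved characters so that only genuine failures show up. The following is a rough sketch, not the attached FileTest.java (which is not reproduced here). It assumes the commonly documented Windows set of reserved file name characters (< > : " / \ | ? * and the control characters below 0x20), and it does not cover other naming rules such as trailing dots or spaces or reserved device names. The class and method names are hypothetical.

```java
public class WinReservedChars {

    // Characters assumed to be reserved in Windows file names.
    private static final String RESERVED = "<>:\"/\\|?*";

    // True if c cannot appear in a Windows file name under the assumption above.
    public static boolean isReserved(char c) {
        return c < 0x20 || RESERVED.indexOf(c) >= 0;
    }

    // Counts the char values in 0..0xFFFF that a test loop should still try.
    public static int countUsable() {
        int usable = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (!isReserved((char) c)) {
                usable++;
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        System.out.println("reserved '<': " + isReserved('<'));
        System.out.println("reserved U+3040: " + isReserved('\u3040'));
        System.out.println("usable char values: " + countUsable());
    }
}
```

A loop that calls isReserved before renameTo would then only report failures for characters that the file system is expected to accept.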