JDK-6522199 : Can't open Windows-1252 accented files on Linux
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io
  • Affected Version: 5.0u10
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86
  • Submitted: 2007-02-07
  • Updated: 2011-02-16
  • Resolved: 2009-02-16
Related Reports
Duplicate :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
Java HotSpot(TM) Server VM (build 1.5.0_10-b03, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Linux kukenam 2.6.18-3-k7 #1 SMP Mon Dec 4 17:23:11 UTC 2006 i686 GNU/Linux

EXTRA RELEVANT SYSTEM CONFIGURATION :
The filesystem type is Ext3

A DESCRIPTION OF THE PROBLEM :
A directory contains a number of files: some with English names, others with French accented names, and others with Spanish accented names. Some of the files were created on Windows (and are therefore named using windows-1252) and others on Linux (named using UTF-8). The directory "/tests/" is on a Debian testing GNU/Linux machine that uses UTF-8 as its default encoding.
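To make the mismatch concrete, the sketch below prints the bytes that encode the same accented name under each charset. It assumes Java 7+ (for StandardCharsets). The class name EncodingBytes and the example name "Boléro" are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats a byte array as space-separated upper-case hex, e.g. "42 6F 6C".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Bolero" with e-acute (U+00E9), escaped to avoid source-encoding issues
        String name = "Bol\u00e9ro";
        byte[] cp1252 = name.getBytes(Charset.forName("windows-1252"));
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println("windows-1252: " + hex(cp1252));
        System.out.println("UTF-8:        " + hex(utf8));
    }
}
```

Running it prints 42 6F 6C E9 72 6F for windows-1252 and 42 6F 6C C3 A9 72 6F for UTF-8. The lone byte E9 followed by 72 is not a valid UTF-8 sequence, which is why names created on Windows cannot be decoded cleanly on a UTF-8 system.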

The immediate objective (not yet a functional one):
List the properties of each file in the directory.

The problem.
Even after creating a java.io.File object "pointing" to the containing directory and then calling

File[] listFiles()

to get a File object for each file in the directory, the objects that are supposed to point to files with windows-1252 accented names do not reach the actual files; at the very least, methods such as isFile() and canRead() return false.
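The symptom can be checked for directly with java.io.File alone. The sketch below is a minimal diagnostic, not a fix: findUnreachable is a hypothetical helper, and using exists() as the reachability test is an assumption. Note that exists() is also false for dangling symbolic links, so any hits deserve a second look. It only uses APIs available in Java 5.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class UnreachableEntries {
    // Hypothetical helper: returns the entries that listFiles() reports but that
    // cannot be found again through their String-based path.
    static List<File> findUnreachable(File dir) {
        List<File> unreachable = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return unreachable; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            // A name that was lossily decoded into a String no longer maps back to
            // the on-disk bytes, so exists() reports false. Dangling symlinks also do.
            if (!entry.exists()) {
                unreachable.add(entry);
            }
        }
        return unreachable;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : findUnreachable(dir)) {
            System.out.println("Unreachable: " + f.getPath());
        }
    }
}
```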

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Put a file whose name is not valid UTF-8 on your file system and try to create a File object pointing to it. To get such a file you can, for example, copy files with accented names from a CD that was recorded on Windows.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A valid File object that actually refers to the file.
ACTUAL -
A File object whose path string contains the invalid UTF-8 character and that does not "see" or "reach" the file. As a result, the file cannot be manipulated.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
(Fileobj.toString) = /home/users/tests/Cd1 - 11 - Revel - Bol?ro.mp3
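The "?" in "Bol?ro" is consistent with a lossy round trip. When the name bytes are decoded as UTF-8, the invalid byte becomes the replacement character U+FFFD, which a console that cannot display it may print as "?". Encoding that String back gives different bytes, so the OS is asked for a file that does not exist. The sketch assumes the JVM decodes file names as UTF-8 (the reporter's locale) and uses Java 7+ StandardCharsets. LossyRoundTrip and roundTrips are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    // Returns true if raw file-name bytes survive being decoded into a String
    // as UTF-8 and encoded back again.
    static boolean roundTrips(byte[] nameBytes) {
        String decoded = new String(nameBytes, StandardCharsets.UTF_8);
        byte[] reEncoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(nameBytes, reEncoded);
    }

    public static void main(String[] args) {
        // Same name as stored by Windows (windows-1252) and by Linux (UTF-8)
        byte[] windowsName = "Bol\u00e9ro".getBytes(Charset.forName("windows-1252"));
        byte[] linuxName = "Bol\u00e9ro".getBytes(StandardCharsets.UTF_8);

        String decoded = new String(windowsName, StandardCharsets.UTF_8);
        System.out.println(decoded.contains("\uFFFD")); // true: byte 0xE9 was replaced
        System.out.println(roundTrips(windowsName));     // false: bytes changed
        System.out.println(roundTrips(linuxName));       // true: valid UTF-8 survives
    }
}
```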

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
package local.tests;

import java.io.File;
import java.io.IOException;


public class Test1 {
	public static void main (String[] args) {
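		// Note: file.encoding is a read-only property; changing it at runtime has no
		// reliable effect on how file names are decoded (see the evaluation below).
		System.setProperty("file.encoding", "windows-1252");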
		FileTest handler = new FileTest("../../tests");
		try {
			handler.process();
		}
		catch (Exception e) {
			System.out.println("Oooops Exception: "+e.getLocalizedMessage());
		}
	} // End of main
}

class FileTest {
	private final File fileHandler;
	
	public FileTest() {
		this("");
	}
	
	public FileTest(String filename) {
		fileHandler = new File(filename);
	}
	
	public FileTest(File filehandle) {
		fileHandler = filehandle;
	}
	
	public void process() throws SecurityException,IOException,Exception {
		System.out.println("\n����");
		if(fileHandler.isDirectory()) {
			System.out.println(fileHandler.getName() + ": is a Directory");
			processAsDir();
		} else if(fileHandler.isFile()) {
			System.out.println(fileHandler.getName() + ": is a File");
			processAsFile();
		} else {
			throw new Exception("Dude you gotta pass something valid... \n...and "+fileHandler.getCanonicalPath()+" is not valid.");
		}
	}
	
	private void processAsFile() throws SecurityException,IOException {
		System.out.println("Name: " + fileHandler.getName());
		System.out.println("Parent: " + fileHandler.getParent());
		System.out.println("Path: "+ fileHandler.getPath());
		System.out.println("Absolute Path: "+ fileHandler.getAbsolutePath());
		System.out.println("Canonical path: "+ fileHandler.getCanonicalPath());
		System.out.println(fileHandler.canRead()?"Readable":"Not Readable");
	}
	
	private void processAsDir() throws SecurityException,IOException {
		if(fileHandler.canRead()) {
			// Call listFiles() once; it returns null if the directory cannot be listed
			File[] entries = fileHandler.listFiles();
			if(entries == null)
				throw new IOException(fileHandler.getCanonicalPath() + " : directory could not be listed");
			if(entries.length==0)
				System.out.println("(Empty)");
			else
				for(File fileInDir : entries) {
					FileTest auxhandler = new FileTest(fileInDir);
					try {
						auxhandler.process();
					}
					catch (Exception e) {
						System.out.println("Oooops Exception: "+e.getLocalizedMessage());
					}
				}
		} else
			throw new SecurityException(fileHandler.getName()+" : " + fileHandler.getCanonicalPath()+ " : directory is not readable");
	}
}
---------- END SOURCE ----------

Comments
EVALUATION java.io.File uses a String to represent the path, so on non-Unicode platforms a method such as listFiles doesn't preserve the platform representation (bytes, in this case). More information and discussion on this topic can be found in 4899439. One comment on the test: it sets the file.encoding property. This is a read-only property and should not be changed. JSR-203/NIO.2 adds a new class that preserves the platform representation, so in an example like this one, where you iterate over a directory, you can access the entries (assuming you have permission, etc.).
07-02-2007
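JSR-203/NIO.2 shipped in Java 7 as java.nio.file. The sketch below redoes the directory walk from the test with Files.newDirectoryStream, applying the same isFile()/canRead()-style checks to each entry. It assumes Java 7+. It also relies on the JDK's Unix implementation keeping the raw name bytes in a Path obtained from the listing, as the evaluation describes. The printed name is still a decoded String and may show a replacement character, but the lookups use the original bytes. ListWithNio2 and listEntries are illustrative names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ListWithNio2 {
    // Prints each entry of dir with the same checks the original test makes
    // through java.io.File, and returns the number of entries seen.
    static int listEntries(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // entry comes straight from the directory listing rather than
                // from a String the program rebuilt itself
                boolean regular = Files.isRegularFile(entry);
                boolean readable = Files.isReadable(entry);
                System.out.println(entry.getFileName() + ": "
                        + (regular ? "file" : "not a regular file") + ", "
                        + (readable ? "readable" : "not readable"));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(listEntries(dir) + " entries");
    }
}
```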