Bug ID: JDK-4244499 ZipEntry() does not convert filenames from Unicode to platform

Type: Bug
Component: core-libs
Sub-Component: java.util.jar
Affected Version:
1.2.1,1.2.2,1.3.0,1.3.1,1.4.0,1.4.0_01,1.4.1,1.4.1_08,1.4.2,5.0,5.0u4,6u5

Priority: P2
Status: Resolved
Resolution: Fixed
OS:
generic,solaris_8,windows_98,windows_nt,windows_2000,windows_2003,windows_xp
CPU: generic,x86,sparc

Submitted: 1999-06-07
Updated: 2024-11-28
Resolved: 2009-04-25

Versions (Unresolved/Resolved/Fixed)

JDK 7
7 b57: Fixed

Related Reports

CSR :	CCC-4244499 - ZipEntry() does not convert filenames from Unicode to platform
Duplicate :	JDK-4508677 - Non-asci file names in java.util.zip
Duplicate :	JDK-4415733 - util.zip does not support umlaut (mutated vowel) in file/path-names in zip file
Relates :	JDK-4532049 - IllegalArgumentException in ZipInputStream while reading unicode file
Relates :	JDK-4980042 - Cannot use Surrogates in zip file metadata like filenames
Relates :	JDK-6272251 - jarsigner crashes with NullPointerException on a filename having german umlauts
Relates :	JDK-4412571 - Implementation of Jar file does not match specification.
Relates :	JDK-6739892 - Improve handling of zip encoding through use of property flag
Relates :	JDK-4700978 - ZipFile can't treat Japanese name in a zipfile properly
Relates :	JDK-4820807 - java.util.zip.ZipInputStream cannot extract files with Chinese chars in name
Relates :	JDK-5030283 - Incorrect implementation of UTF-8 in zip package
Relates :	JDK-6245146 - Classes in java.util.zip have incorrect statement about maximum string length in some methods

Description

Name: rlT66838			Date: 06/07/99


I am trying to create a ZIP archive whose file names are French words (i.e. they contain accented characters). The file names are held in Strings, which means they are encoded in Unicode. If I create a File from the String file name, the name is converted correctly to the platform encoding; but if I create a ZipEntry from the same String, it is NOT converted to the platform encoding, so the file name stored in the ZIP archive is the Unicode image of the name (unreadable from various ZIP tools!).

For instance:

String filename = "élève.txt";

// This will create a correct filename on disk
File myFile = new File(filename);
...
// A file élève.txt is created on disk

// This will create a bad (unconverted) filename in the ZIP archive
ZipEntry myEntry = new ZipEntry(filename);
...
// An entry Ã©lÃ¨ve.txt is created in the ZIP archive

The result is that the generated ZIP entry is not usable for extraction...
(Review ID: 83688) 
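
The symptom described above can be reproduced without any ZIP tool. The following is a minimal sketch, not the JDK's own code: it assumes the entry name is written as UTF-8 bytes (as the evaluation of this bug states) and that the reading tool decodes those bytes with a single-byte encoding, here ISO-8859-1 purely for illustration (real tools may use Cp437 or the platform code page). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class EntryNameMojibake {

    /**
     * Encodes the name as UTF-8, then decodes the same bytes as ISO-8859-1,
     * the way a tool that expects a single-byte encoding would read them.
     */
    static String asSeenBySingleByteTool(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String name = "\u00e9l\u00e8ve.txt"; // e-acute, l, e-grave, "ve.txt"
        System.out.println(asSeenBySingleByteTool(name));
    }
}
```

Each accented character occupies two bytes in UTF-8, so the misread name is two characters longer than the original.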
======================================================================

Name: tb29552			Date: 03/24/2000


Solaris VM (build Solaris_JDK_1.2.1_04, native threads, sunwjit)
Classic VM (build JDK-1.2.2-W, green threads, sunwjit)
java version "1.1.6"

Within a ZIP file, pathnames use the forward slash / as separator, as required
by the ZIP spec (ftp://ftp.uu.net/pub/archiving/zip/doc/appnote-970311-iz.zip).
This requires a conversion from or to the local file.separator on systems like
Windows.  The API (ZipEntry) does not take care of the transformation, and the
need for the programmer to deal with it is not documented.  As a result, code
like
  ZipEntry ze;
  File f;
  f = new File( ze.getName());
will be written and fail on the Windows platform, or the reverse
  ze = new ZipEntry( f.getName());
will fail or produce invalid jars on Windows platforms.

Either the docs or the API needs to be fixed.  Preferably a new method and
constructor could be added
  File f = ze.toFile();     ze = new ZipEntry( f);
that would perform the translation between '/' and File.separatorChar, leaving
the existing methods/constructors (perhaps deprecated) for use by existing code.
But if the API is not fixed, then the docs must be fixed to make sure the
programmer deals with the translation explicitly.

Note new methods in java.util.zip.ZipEntry would also need to be reflected
in java.util.jar.JarEntry.
(Review ID: 100505)
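
The translation requested above can be sketched as a pair of static helpers. This is only an illustration of the idea, not the proposed ZipEntry API: the class EntryPaths and all of its methods are hypothetical names. The separator is passed explicitly in the core methods so the conversion can be exercised on any platform.

```java
import java.io.File;

final class EntryPaths {

    /** Converts a local path to a ZIP entry name, which always uses '/'. */
    static String toEntryName(String localPath, char separator) {
        return separator == '/' ? localPath : localPath.replace(separator, '/');
    }

    /** Converts a ZIP entry name to a path using the given local separator. */
    static String toLocalPath(String entryName, char separator) {
        return separator == '/' ? entryName : entryName.replace('/', separator);
    }

    /** Convenience overload using the running platform's separator. */
    static String toEntryName(File f) {
        // getPath() keeps the directory components; getName() would drop them.
        return toEntryName(f.getPath(), File.separatorChar);
    }

    /** Convenience overload using the running platform's separator. */
    static File toFile(String entryName) {
        return new File(toLocalPath(entryName, File.separatorChar));
    }
}
```

With these helpers, new ZipEntry(EntryPaths.toEntryName(f)) and EntryPaths.toFile(ze.getName()) would replace the direct calls shown in the report.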
======================================================================

Comments

SUGGESTED FIX SAP, as a Java SE Licensee, has provided us with a 1.4.2 solution that does not require an API change (basically, a system property). They have implemented this in their 1.4.2 based SAP JVM implementation and are providing it to us for consideration:

-- From SAP --

There are problems with ZIP handling of files with non-UTF8 encoded file names. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244499. In order to improve the situation without changing existing APIs SAP has implemented following solution for java.util.zip.ZipInputStream into SAPJVM 5.1 and suggests that SUN should think about a similar approach for JDK 1.4.2, because we were faced with customer problems on this version:

A new System Property called com.sap.jvm.ZipEntry.encoding was added with the following behavior:

  not set: Reading ZIP files with entries with non-UTF8 chars will fail with IllegalArgumentException as before this change, but with a useful message pointing to the cause of the problem and the new System Property

  "default": If decoding an entry name with UTF8 fails, try the platform's default encoding. Reading ZIP files will succeed, but filenames might be wrong

  <encoding>: If decoding an entry name with UTF8 fails, try the given encoding. If the right encoding is given, reading the ZIP file will succeed and entry names will be converted correctly. WinRar and WinZip seem to use "Cp437" encoding.

The piece of code looks like this: Replace

  ZipEntry e = createZipEntry(getUTF8String(b, 0, len));

by

  // SAPJVM SS 2008-07-02 implemented workaround to be able to use
  // non-UTF8 encoded zip entry names
  String filename = null;
  try {
      // First try getUTF8String for compatibility
      filename = getUTF8String(b, 0, len);
  } catch (IllegalArgumentException e) {
      // UTF8 decoding failed!
      // alternative encoding requested?
      String encoding = System.getProperty("com.sap.jvm.ZipEntry.encoding");
      if (encoding == null) {
          // no alternative encoding requested, just throw the
          // Exception (for compatibility), but add a message
          IllegalArgumentException ee = new IllegalArgumentException(
              "zip entry name contained non-utf8 chars, try system property "
              + "com.sap.jvm.ZipEntry.encoding");
          ee.setStackTrace(e.getStackTrace());
          throw ee;
      }
      // an alternative encoding is requested
      if (encoding.equalsIgnoreCase("default")) {
          // use platform's default encoding
          filename = new String(b, 0, len);
      } else {
          // use the specified encoding
          // (WinZip and WinRar seem to use Cp437 )
          filename = new String(b, 0, len, encoding);
      }
  }
  ZipEntry e = createZipEntry(filename);

--
27-08-2008
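
The "try UTF-8 first, then fall back" strategy in the suggested fix can also be written with the public java.nio.charset API instead of the JDK-internal getUTF8String. The following is a hedged sketch only: EntryNameDecoder and its decode method are hypothetical names, and the fallback charset is passed as a parameter here rather than read from a system property.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EntryNameDecoder {

    /**
     * Decodes len bytes of b starting at off as strict UTF-8. If the bytes
     * are not valid UTF-8, decodes them with the fallback charset instead,
     * or throws IllegalArgumentException when no fallback is given.
     */
    static String decode(byte[] b, int off, int len, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes invalid input raise an exception instead
                    // of being silently replaced.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b, off, len))
                    .toString();
        } catch (CharacterCodingException e) {
            if (fallback == null) {
                throw new IllegalArgumentException(
                        "zip entry name is not valid UTF-8 and no fallback charset was given", e);
            }
            return new String(b, off, len, fallback);
        }
    }
}
```

As in the SAP proposal, a name that happens to be valid UTF-8 is always decoded as UTF-8, so the fallback only helps when the bytes are unambiguous; a wrong fallback still yields a wrong name.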
EVALUATION Contribution forum : https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?forumID=1463&messageID=16142
13-10-2006
EVALUATION We expect to resolve this in the Dolphin/6.0 release (though our planning for Dolphin is not complete). We anticipate a Dolphin source repository sometime this summer. Hopefully, we can get this fix into Dolphin very early, to discover any unintended consequences well before Dolphin's official release.

A contributor to the JDK community has started working on this bug (thanks!) and you can join/follow the discussion here: https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?messageID=13115&forumID=1463

We're considering two possibilities for the fix: one is largely that proposed by several people, namely to add constructors that allow clients to indicate a zip file's encoding. The other is to work with providers of zip implementations to provide the encoding of the entries in a file in the file itself. Discussion on the latter has been started at the above URL (see the entry "Unicode extension for ZIP file specification").

Note that this bug raises two independent issues: one concerns the character encoding for the file's entries; the other concerns the kind of path separator that is used on particular platforms. The latter has a straightforward fix (and, for now, a workaround as noted).
13-06-2006
EVALUATION There's a lot of additional information in the JDC discussions about this bug and the duplicates 4532049, 4700978, 4415733, 4820807.

The zip specification does not specify the character encoding to be used for file names (essentially, it doesn't consider file names that include non-ASCII characters). We decided that for jar files, which must be portable between different platforms and different locale environments, only UTF-8 makes sense. Therefore the code currently encodes and decodes all file names within jar/zip files using UTF-8. However, for normal (non-jar) zip files, the convention used by other tools is to use the platform encoding for file names. Applications that use the java.util.zip package to read/write normal zip files therefore fail (or produce unreadable files) if a file name contains a non-ASCII character, unless the platform encoding happens to be UTF-8.

To solve this problem, I think we need to distinguish between jar and zip files, and enable the use of encodings other than UTF-8 for the file names within non-jar zip files. A possible solution would be to add a ZipFile constructor:

  java.util.zip.ZipFile.ZipFile(File file, int mode, String encoding)

which lets an application specify the encoding for the file names and zip comments used within the zip file. Document that the encoding used for the other constructors is UTF-8, and that callers of the new constructor can pass in the result of java.nio.charset.Charset.defaultCharset().name() to request the platform encoding. This lets applications access zip files that use the encoding of the platform they run on, or even generate zip files using the encoding of the platform of the client machine that a zip file is intended for (some of the bug discussion mentions servlets creating zip files for download). The jar classes would continue to use the constructors that don't take the encoding parameter, and therefore continue to use UTF-8.

The encoding of the contents of the files included in the zip files is not affected - they're just byte streams.

For command line use, the jar command could be enhanced with an option that specifies the file name encoding, using either an encoding name or "default" for the platform encoding. This option should be disabled when creating jar files. ###@###.### 2005-1-28 18:42:10 GMT
28-01-2005
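
For comparison with the constructor proposed in this evaluation: in JDK 7 and later, ZipFile, ZipInputStream and ZipOutputStream have constructors that take a java.nio.charset.Charset rather than an encoding name String. The sketch below shows how an application might use the stream variants to write and read an entry name in a non-UTF-8 encoding. The class ZipCharsetRoundTrip and its methods are hypothetical helpers written for this illustration; only the java.util.zip constructors are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipCharsetRoundTrip {

    /** Writes an in-memory zip with one entry whose name is encoded with cs. */
    static byte[] writeSingleEntry(String name, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, cs)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(new byte[] {1, 2, 3}); // entry contents are plain bytes
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry name, decoding it with cs; null if the zip is empty. */
    static String readFirstEntryName(byte[] zip, Charset cs) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry e = zis.getNextEntry();
            return e == null ? null : e.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing Charset.defaultCharset() to both helpers corresponds to the "platform encoding" case discussed above; the reader has to use the same charset as the writer for the name to survive the round trip.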
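WORK AROUND Name: tb29552 Date: 03/24/2000

  ZipEntry ze;
  File f;
  String s;

  // ZIP entry name -> local file: replace '/' with the platform separator
  s = ze.getName();
  if (File.separatorChar != '/')
      s = s.replace('/', File.separatorChar);
  f = new File(s);

  // Local file -> ZIP entry name: replace the platform separator with '/'.
  // getPath() keeps the directory part; getName() returns only the last component.
  s = f.getPath();
  if (File.separatorChar != '/')
      s = s.replace(File.separatorChar, '/');
  ze = new ZipEntry(s);

(Review ID: 100505)
======================================================================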
02-10-2004