JDK-4820807 : java.util.zip.ZipInputStream cannot extract files with Chinese chars in name
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 1.2.1,1.4.1,5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_7,windows_2000
  • CPU: x86,sparc
  • Submitted: 2003-02-19
  • Updated: 2009-04-25
  • Resolved: 2009-04-25

JDK 7: 7 b57 (Fixed)
Description

Name: nt126004			Date: 02/19/2003


FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)


FULL OPERATING SYSTEM VERSION :
Microsoft Windows 2000 [Version 5.00.2195]
Service Pack 3

A DESCRIPTION OF THE PROBLEM :
If ZipInputStream is used to read a zip file containing one
or more files with Chinese, Japanese or Korean names, the
getNextEntry method throws an IllegalArgumentException.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a zip file containing at least one file with a
Chinese, Japanese or Korean filename.
2. Try to read using a ZipInputStream.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Should return a valid entry with the correct filename
instead of throwing an exception.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.IllegalArgumentException
    at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:291)
    at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:230)
    at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:75)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class TestCase {
    public static void main(String[] args) throws IOException {
        ZipInputStream zis = new ZipInputStream(new FileInputStream("myfile.zip"));
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("found " + entry.getName());
        }
    }
}

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Do not use CJK filenames in zip files.
(Review ID: 181382) 
======================================================================


###@###.### 2003-09-02

Same problem reported by a CAP member from Germany:

J2SE Version (please include all output from java -version flag):
  java version "1.4.1"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
  Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

and

  java version "1.5.0-beta"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b16)
  Java HotSpot(TM) Client VM (build 1.5.0-beta-b16, mixed mode)


Does this problem occur on J2SE 1.3, 1.4 or 1.4.1?  Yes / No (pick one)
  Yes

Operating System Configuration Information (be specific):
  English Linux and German Win2K

Bug Description:
  A ZIP file has entries whose names contain German umlauts. When
  these entries are read using ZipInputStream.getNextEntry(), it throws an
  IllegalArgumentException at:

Exception in thread "main" java.lang.IllegalArgumentException
         at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:298)
         at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:237)
         at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:73)
         at ZipUmlauts.main(ZipUmlauts.java:22)

  It would be better if the getUTF8String() method just ignored
  these "illegal" characters or added them "as-is".

Test Program: (ZipUmlauts.java umlauts.zip)
-------------

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/*
 *  ZipUmlauts.java created on Sep 1, 2003 8:45:08 AM
 */

/**
 * @version ${Id:}
 * @author rs
 * @since pirobase®CB 1.0
 */
public final class ZipUmlauts {

    public static void main(String[] args) throws IOException {
        FileInputStream fis=new FileInputStream("umlauts.zip");
        ZipInputStream zis=new ZipInputStream(fis);
        ZipEntry ze;
        while ((ze=zis.getNextEntry())!=null) {
            System.out.println(ze.getName());
        }
    }

}

Comments
EVALUATION ZipInputStream(InputStream, Charset) was introduced in JDK 7 to solve this issue. For the umlauts case, try ZipInputStream zis=new ZipInputStream(fis, Charset.forName("ibm437")); for chinesefilenameInside.zip, try "gbk".
16-04-2009
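
The suggestion above can be tried without any external test file. The following is a minimal sketch, assuming JDK 7 or later and a runtime that ships the IBM437 charset; the class name ZipCharsetSketch and its helper methods are hypothetical and not part of the JDK. It writes a one-entry zip in memory with the entry name encoded in a given charset, then reads the name back with the same charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; only the java.util.zip constructors
// taking a Charset come from the JDK (available since JDK 7).
public final class ZipCharsetSketch {

    // Builds an in-memory zip with one entry whose name is encoded in cs.
    static byte[] writeZip(String entryName, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write('x'); // one byte of entry data
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry name, decoding it with cs.
    static String firstEntryName(byte[] zip, Charset cs) throws IOException {
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: IBM437 is available in this runtime.
        Charset ibm437 = Charset.forName("IBM437");
        byte[] zip = writeZip("\u00fcmlaut.txt", ibm437);
        System.out.println("found " + firstEntryName(zip, ibm437));
    }
}
```

The caller still has to know which charset the archive was written with; the constructor does not detect it.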

EVALUATION Unfortunately, fixing this in a backward-compatible way may be impossible. At least, for non-ASCII file names, Java should be able to create files on one system and extract them on a different system, even if the encodings are different. The suggestion of adding an encoding attribute is a good one. That should have been done when the decision to encode file names in UTF-8 was first made. ###@###.### 2003-09-04

I have confirmed that, as long as one uses Sun's J2SE zip implementation consistently, in an environment where file.encoding supports the character set of interest, one can create, list and extract jar/zip files containing non-ASCII characters (including Chinese characters) correctly. Other zip implementations also have character encoding interoperability problems, so J2SE's implementation is not alone.

The suggestion of falling back to file.encoding is an appealing one, but it's quite dangerous to go down that route. Encoding "autodetection" is a good interactive feature for users, but it's not so good for file formats. To have a file be properly readable depending fairly randomly on the data bit patterns stored within it is a reliability disaster. It's much better to have consistent failure than intermittent "success". Re-architecting zip to record the encoding of the file names will hopefully get done for J2SE 1.6. ###@###.### 2003-11-25

I believe this is a duplicate of 4244499. See the evaluation of that bug report for a relatively simple proposed solution. ###@###.### 2005-1-29 00:28:38 GMT
25-11-2003
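
The point about data bit patterns can be illustrated with plain charset decoding, independent of the zip classes. This is a minimal sketch; the class NameDecodingSketch and the method strictDecode are hypothetical names, and ISO-8859-1 merely stands in for whatever legacy encoding an archive might use. One ISO-8859-1 name is rejected by a strict UTF-8 decoder, while another happens to be valid UTF-8 and decodes without error into a different name, which is the intermittent "success" a fallback scheme would produce.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the JDK.
public final class NameDecodingSketch {

    // Decodes raw with cs, returning null if the bytes are malformed or
    // unmappable. A fresh CharsetDecoder reports such errors by default.
    static String strictDecode(byte[] raw, Charset cs) {
        try {
            return cs.newDecoder().decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "ü.txt" in ISO-8859-1 starts with byte 0xFC, which is never valid UTF-8.
        byte[] umlaut = "\u00fc.txt".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(strictDecode(umlaut, StandardCharsets.UTF_8) == null);

        // "Ã¼.txt" in ISO-8859-1 is bytes 0xC3 0xBC ..., which is also valid
        // UTF-8 for "ü.txt": decoding succeeds but yields a different name.
        byte[] lookalike = "\u00c3\u00bc.txt".getBytes(StandardCharsets.ISO_8859_1);
        String decoded = strictDecode(lookalike, StandardCharsets.UTF_8);
        System.out.println("\u00fc.txt".equals(decoded));
    }
}
```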