JDK-4820807 : java.util.zip.ZipInputStream cannot extract files with Chinese chars in name

Details
Type:
Enhancement
Submit Date:
2003-02-19
Status:
Resolved
Updated Date:
2009-04-25
Project Name:
JDK
Resolved Date:
2009-04-25
Component:
core-libs
OS:
solaris_7,windows_2000
Sub-Component:
java.util.jar
CPU:
x86,sparc
Priority:
P4
Resolution:
Fixed
Affected Versions:
1.2.1,1.4.1,5.0
Fixed Versions:


Description

Name: nt126004			Date: 02/19/2003


FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)


FULL OPERATING SYSTEM VERSION :
Microsoft Windows 2000 [Version 5.00.2195]
Service Pack 3

A DESCRIPTION OF THE PROBLEM :
If ZipInputStream is used to read a zip file containing one
or more files with Chinese, Japanese or Korean names, the
getNextEntry method throws an IllegalArgumentException.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a zip file containing at least one file with a
Chinese, Japanese or Korean filename.
2. Try to read using a ZipInputStream.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Should return a valid entry with the correct filename
instead of throwing an exception.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.IllegalArgumentException
    at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:291)
    at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:230)
    at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:75)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class TestCase {
    public static void main(String[] args) throws IOException {
        ZipInputStream zis = new ZipInputStream(new FileInputStream("myfile.zip"));
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("found " + entry.getName());
        }
    }
}

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Do not use CJK filenames in zip files.
(Review ID: 181382) 
======================================================================


###@###.### 2003-09-02

Same problem reported by a CAP member from Germany:

J2SE Version (please include all output from java -version flag):
  java version "1.4.1"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
  Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

and

  java version "1.5.0-beta"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b16)
  Java HotSpot(TM) Client VM (build 1.5.0-beta-b16, mixed mode)


Does this problem occur on J2SE 1.3, 1.4 or 1.4.1?  Yes / No (pick one)
  Yes

Operating System Configuration Information (be specific):
  English Linux and German Win2K

Bug Description:
  A ZIP file has entries whose names contain German umlauts. Reading
  these entries using ZipInputStream.getNextEntry() throws an
  IllegalArgumentException at:

Exception in thread "main" java.lang.IllegalArgumentException
         at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:298)
         at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:237)
         at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:73)
         at ZipUmlauts.main(ZipUmlauts.java:22)

  It would be better if the getUTF8String() method simply ignored
  these "illegal" characters or added them "as-is".

Test Program: (ZipUmlauts.java umlauts.zip)
-------------

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/*
 *  ZipUmlauts.java created on Sep 1, 2003 8:45:08 AM
 */

/**
 * @version ${Id:}
 * @author rs
 * @since pirobase®CB 1.0
 */
public final class ZipUmlauts {

    public static void main(String[] args) throws IOException {
        FileInputStream fis=new FileInputStream("umlauts.zip");
        ZipInputStream zis=new ZipInputStream(fis);
        ZipEntry ze;
        while ((ze=zis.getNextEntry())!=null) {
            System.out.println(ze.getName());
        }
    }

}

                                    

Comments
EVALUATION

Unfortunately, fixing this in a backward-compatible way may be impossible.
At least, for non-ASCII file names, Java should be able to create files
on one system and extract them on a different system, even if the
encodings are different.

The suggestion of adding an encoding attribute is a good one.
That should have been done when the decision to encode file names
in UTF-8 was first made.
###@###.### 2003-09-04

I have confirmed that, as long as one uses Sun's J2SE zip
implementation consistently, in an environment where file.encoding
supports the character set of interest, one can create, list and
extract jar/zip files containing non-ASCII characters (including
Chinese characters) correctly.  Other zip implementations also have
character encoding interoperability problems, so J2SE's
implementation is not alone.

The suggestion of falling back to file.encoding is an appealing one,
but it's quite dangerous to go down that route.

Encoding "autodetection" is a good interactive feature for users, but
it's not so good for file formats.  To have a file be properly readable
depending fairly randomly on the data bit patterns stored within it
is a reliability disaster.  It's much better to have consistent failure
than intermittent "success".
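
The ambiguity described above can be illustrated with a small sketch. It is
not taken from the JDK sources; the class and method names are hypothetical,
and ISO-8859-1 merely stands in for whatever file.encoding a fallback would
pick. A lone 0xFC byte (u-umlaut in ISO-8859-1) is rejected by a strict UTF-8
decoder, yet the two bytes 0xC3 0xBC decode without error under both
encodings, to different names. A fallback would therefore only trigger for
some byte patterns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public final class AmbiguousNameBytes {

    // Strictly decodes bytes as UTF-8; returns null when the bytes are malformed.
    public static String strictUtf8(byte[] bytes) {
        try {
            // A decoder from newDecoder() reports malformed input by default
            // instead of silently substituting a replacement character.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Decodes bytes as ISO-8859-1, which maps every byte value to a character.
    public static String latin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1Umlaut = {(byte) 0xFC};             // u-umlaut in ISO-8859-1
        byte[] twoBytes = {(byte) 0xC3, (byte) 0xBC};    // u-umlaut in UTF-8

        // Malformed as UTF-8, so a fallback would kick in here.
        System.out.println("0xFC valid UTF-8? " + (strictUtf8(latin1Umlaut) != null));

        // Valid under both encodings, but the resulting names differ.
        String asUtf8 = strictUtf8(twoBytes);
        String asLatin1 = latin1(twoBytes);
        System.out.println("same name under both? " + asUtf8.equals(asLatin1));
    }
}
```

A name written in ISO-8859-1 that happens to contain the byte pair 0xC3 0xBC
would be accepted as UTF-8 and silently shown with the wrong characters,
which is the intermittent "success" the evaluation warns about.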

Re-architecting zip to record the encoding of the file names will
hopefully get done for J2SE 1.6.

###@###.### 2003-11-25


I believe this is a duplicate of 4244499. See the evaluation of that bug report for a relatively simple proposed solution.
###@###.### 2005-1-29 00:28:38 GMT
                                     
2003-11-25
EVALUATION

ZipInputStream(InputStream, Charset) was introduced in JDK 7 to solve this issue.
For the umlauts case, try
    ZipInputStream zis=new ZipInputStream(fis, Charset.forName("ibm437"));

For chinesefilenameInside.zip, try "gbk".
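
A minimal round-trip sketch of the charset-aware constructors follows. The
class and helper names are hypothetical, and the archive is built in memory
rather than read from umlauts.zip. ISO-8859-1 is used only because every Java
platform is required to support it; for archives like the ones in this
report, the charset would instead be Charset.forName("ibm437") or
Charset.forName("gbk"), assuming the runtime ships those charsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public final class ZipCharsetDemo {

    // Builds a one-entry zip in memory, encoding the entry name with the given charset.
    public static byte[] createZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write('x'); // a single byte of payload
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Lists entry names, decoding them with the charset the archive was written in.
    public static List<String> listNames(byte[] zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Charset cs = StandardCharsets.ISO_8859_1; // stand-in for ibm437 / gbk
        byte[] zip = createZip("M\u00FCller.txt", cs);
        for (String name : listNames(zip, cs)) {
            System.out.println("found " + name);
        }
    }
}
```

The reader has to be told the charset explicitly because the archive itself
does not record which encoding a non-UTF-8 name was written in.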
                                     
2009-04-16


