Bug ID: JDK-5030283 Incorrect implementation of UTF-8 in zip package

Details
Type:
Bug
Submit Date:
2004-04-12
Status:
Resolved
Updated Date:
2009-04-25
Project Name:
JDK
Resolved Date:
2009-04-25
Component:
core-libs
OS:
generic
Sub-Component:
java.util.jar
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
5.0
Fixed Versions:

Description
Name: nl37777			Date: 04/12/2004

Several parts of the zip package handle the UTF-8 encoding 
of entry names incorrectly. They assume either an ancient form of UTF-8 
which didn't have the 4-byte form for supplementary characters, or rely 
on the JVM's modified UTF-8, which has the same limitation. As a 
consequence, file names using supplementary characters can be used, but 
cannot be exchanged with standards-compliant zip implementations.
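To illustrate the difference described above, here is a minimal sketch that encodes the same entry name in standard UTF-8 and in the JVM's modified UTF-8. It is not the zip package's code: the class and method names are hypothetical, and `DataOutputStream.writeUTF` is used only as a convenient way to obtain modified UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class Utf8Forms {
    // Standard UTF-8: a supplementary character becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8: each half of the surrogate pair is encoded separately
    // as a 3-byte sequence, giving 6 bytes per supplementary character.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // writeUTF prepends a 2-byte length field; drop it.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // U+20000 is a supplementary character (outside the BMP).
        String name = new String(Character.toChars(0x20000)) + ".txt";
        System.out.println(standardUtf8(name).length); // 8  (4 + 4)
        System.out.println(modifiedUtf8(name).length); // 10 (6 + 4)
    }
}
```

A standards-compliant zip tool expects the 4-byte form, so a name written in the 6-byte form does not decode to the same string there.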

The following parts of the implementation are incorrect:
src/share/classes/java/util/zip/ZipInputStream.java
     getUTF8String
src/share/classes/java/util/zip/ZipOutputStream.java
     getUTF8Length
     getUTF8Bytes
src/share/native/java/util/zip/ZipEntry.c
     Java_java_util_zip_ZipEntry_initFields
src/share/native/java/util/zip/ZipFile.c
     Java_java_util_zip_ZipFile_getEntry
     Java_java_util_jar_JarFile_getMetaInfEntryNames
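As a rough sketch of what a standards-compliant length computation looks like (the role `getUTF8Length` plays in the list above), the following counts bytes per code point rather than per `char`. This is an assumption-laden illustration, not the JDK's actual fix; the class and method names are hypothetical.

```java
// Hypothetical helper class, for illustration only.
final class Utf8Length {
    // Returns the number of bytes needed to encode s in standard UTF-8.
    // Iterating by code point lets a surrogate pair count as one 4-byte
    // sequence instead of two 3-byte sequences.
    static int utf8Length(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                count += 1;
            } else if (cp < 0x800) {
                count += 2;
            } else if (cp < 0x10000) {
                // Note: an unpaired surrogate also lands here and is counted
                // as 3 bytes; a real encoder would reject or replace it.
                count += 3;
            } else {
                count += 4;
            }
            i += Character.charCount(cp);
        }
        return count;
    }
}
```

For well-formed strings this agrees with `s.getBytes(StandardCharsets.UTF_8).length`, without allocating the byte array.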
======================================================================

                                    

Comments
EVALUATION

Yes, we need to fix these to improve support for supplementary characters.

###@###.### 2004-04-12
                                     
2004-04-12
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
dragon


                                     
2004-06-14
EVALUATION

Changing the current encoding to support 4-byte supplementary characters could produce JAR files that are incompatible with previous Java releases, i.e. files that those releases cannot read.  This incompatibility is not acceptable.  In fixing 4244499, though, there is a reasonable chance that support can be provided for the current implementation as well as standard UTF-8.
                                     
2008-04-09
EVALUATION

We go with the standard UTF-8 charset. We can NOT keep forward compatibility for this case. If someone really needs to generate an "old-style" jar/zip file with the latest version of the JDK/JRE, we might consider adding the "modified" UTF-8 to our charset repository.
                                     
2009-04-16
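For readers who want to check the resulting behavior, here is a small sketch that round-trips an entry name containing a supplementary character through an in-memory zip. It assumes a JDK that provides the `ZipOutputStream(OutputStream, Charset)` and `ZipInputStream(InputStream, Charset)` constructors (JDK 7 and later); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class ZipNameRoundTrip {
    // Writes a one-entry zip in memory using standard UTF-8 for names,
    // reads it back, and returns the entry name as decoded by the reader.
    static String roundTrip(String entryName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ZipOutputStream zout =
                     new ZipOutputStream(buf, StandardCharsets.UTF_8)) {
                zout.putNextEntry(new ZipEntry(entryName));
                zout.write(1); // one byte of dummy content
                zout.closeEntry();
            }
            try (ZipInputStream zin = new ZipInputStream(
                     new ByteArrayInputStream(buf.toByteArray()),
                     StandardCharsets.UTF_8)) {
                ZipEntry entry = zin.getNextEntry();
                return entry.getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the name comes back unchanged, the writer and reader agree on the encoding of supplementary characters.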


