JDK-5030283 : Incorrect implementation of UTF-8 in zip package
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 5.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-04-12
  • Updated: 2009-04-25
  • Resolved: 2009-04-25
JDK 7: b57 (Fixed)
Related Reports
Relates :  
Relates :  
Description
Name: nl37777			Date: 04/12/2004

Several parts of the zip package handle the UTF-8 encoding
of entry names incorrectly. They either assume an ancient form of UTF-8
that did not have the 4-byte form for supplementary characters, or rely
on the JVM's modified UTF-8, which has the same limitation. As a
consequence, file names containing supplementary characters can be used,
but archives that use them cannot be exchanged with standards-compliant
zip implementations.
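
The difference between the two encodings can be illustrated with a minimal Java sketch. It assumes U+10400 as an arbitrary sample supplementary character and uses DataOutputStream.writeUTF as a convenient way to produce the JVM's modified UTF-8. The class and method names are hypothetical and are not taken from the zip package.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Forms {
    // Standard UTF-8: a supplementary character becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as produced by DataOutputStream.writeUTF: each surrogate
    // is encoded separately as 3 bytes. writeUTF prepends a 2-byte length,
    // which is stripped here so only the encoded characters remain.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = new String(Character.toChars(0x10400)); // sample character
        System.out.println("standard: " + hex(standardUtf8(name))); // F0 90 90 80
        System.out.println("modified: " + hex(modifiedUtf8(name))); // ED A0 81 ED B0 80
    }
}
```

A strict UTF-8 decoder rejects the 6-byte surrogate form, which is why names written this way do not survive exchange with other zip tools.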

The following parts of the implementation are incorrect:
src/share/classes/java/util/zip/ZipInputStream.java
     getUTF8String
src/share/classes/java/util/zip/ZipOutputStream.java
     getUTF8Length
     getUTF8Bytes
src/share/native/java/util/zip/ZipEntry.c
     Java_java_util_zip_ZipEntry_initFields
src/share/native/java/util/zip/ZipFile.c
     Java_java_util_zip_ZipFile_getEntry
     Java_java_util_jar_JarFile_getMetaInfEntryNames
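
As an illustration of what a standards-compliant length computation involves, the sketch below counts UTF-8 bytes per code point rather than per char, so a surrogate pair contributes 4 bytes instead of 6. This is not the JDK's getUTF8Length. The class and method names are hypothetical, and the input is assumed to contain no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthSketch {
    // Counts the bytes needed to encode s in standard UTF-8.
    // Assumption: s is well-formed UTF-16 (no unpaired surrogates).
    static int utf8Length(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // joins a surrogate pair into one code point
            if (cp < 0x80) {
                count += 1;
            } else if (cp < 0x800) {
                count += 2;
            } else if (cp < 0x10000) {
                count += 3;
            } else {
                count += 4; // supplementary character
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "a" + new String(Character.toChars(0x10400)) + ".txt";
        // Both values should agree for well-formed input.
        System.out.println(utf8Length(name));
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);
    }
}
```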
======================================================================

Comments
EVALUATION We go with the standard UTF-8 charset. We can NOT keep the forward-compatibility for this case. If someone really needs to generate an "old-style" jar/zip file with the latest version of the JDK/JRE, we might consider adding the "modified" UTF-8 to our charset repository.
16-04-2009
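
A minimal sketch of how the standard encoding can be checked from user code, assuming a JDK 7 or later runtime where ZipOutputStream and ZipInputStream have constructors that take a Charset. The class name, helper names, and the sample entry name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipNameRoundTrip {
    // Writes an in-memory zip with a single one-byte entry named entryName.
    static byte[] writeZip(String entryName) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(1);
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // Reads back the name of the first entry, or null if the zip is empty.
    static String readFirstName(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(zip), StandardCharsets.UTF_8)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns true if needle occurs as a contiguous run inside haystack.
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String name = new String(Character.toChars(0x10400)) + ".txt";
        byte[] zip = writeZip(name);
        System.out.println(name.equals(readFirstName(zip)));
        // The raw archive should hold the 4-byte standard form of the name.
        System.out.println(contains(zip, name.getBytes(StandardCharsets.UTF_8)));
    }
}
```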

EVALUATION Changing the current encoding to support 4-byte supplementary characters could result in creating JAR files that are incompatible with previous Java releases, i.e. cannot be read by them. This incompatibility is not acceptable. In fixing 4244499 though, there is a reasonable chance that support can be provided for the current implementation as well as standard UTF-8.
09-04-2008

CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: dragon
14-06-2004

EVALUATION Yes, we need to fix these to improve support for supplementary characters.
12-04-2004