JDK-8254284 : Refine ZipOutputStream.putNextEntry() to recalculate ZipEntry's compressed size
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 16
  • Submitted: 2020-10-09
  • Updated: 2020-10-14
  • Resolved: 2020-10-14
Related Reports
CSR :  
Description
Summary
-------

Refine `ZipOutputStream.putNextEntry(ZipEntry)` to recalculate ZipEntry's compressed size when `ZipEntry.setCompressedSize()` has not been explicitly called on the `ZipEntry`.

Problem
-------

In general it is not safe to directly write a ZipEntry obtained from `ZipInputStream.getNextEntry()`, `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with `ZipOutputStream.putNextEntry()` to a `ZipOutputStream` and then read the entries data from the `ZipInputStream` and write it to the `ZipOutputStream` as follows:
```
 ZipEntry entry;
 ZipInputStream zis = new ZipInputStream(...);
 ZipOutputStream zos = new ZipOutputStream(...);
 while((entry = zis.getNextEntry()) != null) {
     zos.putNextEntry(entry);
     zis.transferTo(zos);
 }
```
The problem with this code is that the ZIP file format does not record the compression level used for deflation in its entries. In general, it doesn't even mandate a predefined compression ratio per compression level. Therefore the compressed size recorded in a `ZipEntry` read from a ZIP file might differ from the new compressed size produced by the receiving `ZipOutputStream`. Such a difference will result in a `ZipException` with the following message:
```
 java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 7 bytes)
``` 
Unfortunately, there are examples where code which follows this coding pattern: Searching for `"java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 7 bytes)"` gives ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of instances of this anti-pattern on GitHub when doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x wrapper task][1] is affected as well as the latest version of the [mockableAndroidJar task][2]. I've recently fixed two occurrences of this pattern in OpenJDK (see [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them (e.g. [test/jdk/java/util/zip/ZipFile/CopyJar.java][5] which is there since 1999 :). 

Solution
--------

In `ZipOutputStream.putNextEntry(ZipEntry)` ignore and recalculate ZipEntry's compressed size when `ZipEntry.setCompressedSize()` has not been explicitly called on that `ZipEntry`.

Specification
-------------

Changes to the JarOutputStream API doc:
```
--- a/src/java.base/share/classes/java/util/jar/JarOutputStream.java
+++ b/src/java.base/share/classes/java/util/jar/JarOutputStream.java
@@ -76,8 +76,15 @@ public class JarOutputStream extends ZipOutputStream {
     /**
      * Begins writing a new JAR file entry and positions the stream
      * to the start of the entry data. This method will also close
-     * any previous entry. The default compression method will be
-     * used if no compression method was specified for the entry.
+     * any previous entry.
+     * <p>
+     * The default compression method will be used if no compression
+     * method was specified for the entry. When writing a compressed
+     * (DEFLATED) entry, and the compressed size has not been explicitly
+     * set with the {@link ZipEntry#setCompressedSize(long)} method,
+     * then the compressed size will be set to the actual compressed
+     * size after deflation.
+     * <p>
      * The current time will be used if the entry has no set modification
      * time.
      *
```

Changes to the the `ZipOutputStream` API doc:
```
--- a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
+++ b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
@@ -179,9 +179,15 @@ public class ZipOutputStream extends DeflaterOutputStream implements ZipConstant
     /**
      * Begins writing a new ZIP file entry and positions the stream to the
      * start of the entry data. Closes the current entry if still active.
+     * <p>
      * The default compression method will be used if no compression method
-     * was specified for the entry, and the current time will be used if
-     * the entry has no set modification time.
+     * was specified for the entry. When writing a compressed (DEFLATED)
+     * entry, and the compressed size has not been explicitly set with the
+     * {@link ZipEntry#setCompressedSize(long)} method, then the compressed
+     * size will be set to the actual compressed size after deflation.
+     * <p>
+     * The current time will be used if the entry has no set modification time.
+     *
      * @param     e the ZIP entry to be written
      * @throws    ZipException if a ZIP format error has occurred
      * @throws    IOException if an I/O error has occurred
```


See https://github.com/openjdk/jdk/pull/520

The complete changes are attached as `0001-8253952-Refine-ZipOutputStream.putNextEntry-to-recal.patch` to this CSR.


[1]: https://github.com/gradle/gradle/blob/e76905e3a/subprojects/build-init/src/main/java/org/gradle/api/tasks/wrapper/Wrapper.java#L152-L158
[2]: https://android.googlesource.com/platform/tools/base/+/refs/heads/master/build-system/builder/src/main/java/com/android/builder/testing/MockableJarGenerator.java#86
[3]: https://bugs.openjdk.java.net/browse/JDK-8240333
[4]: https://bugs.openjdk.java.net/browse/JDK-8240235
[5]: https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/ZipFile/CopyJar.java
[6]: https://github.com/simonis/zlib-bench/blob/master/Results.md
[7]: https://en.wikipedia.org/wiki/Zip_(file_format)
Comments
Moving to Approved.
14-10-2020

The compatibility risk of this change is Low and helps incorrect usages in tools that leads to "corrupted" ZIP files. The ZIP area has a long history of issues surfacing months and years after a change. Real world testing of the JDK 16 EA builds will help improve confidence that there aren't any issues. A release note will be created for this change.
14-10-2020

I would re-work the compatibility Risk Description so that it is clear what the change in behavior means to existing code. The change basically results in a re-calculation of the compression size and should not cause problems which is why the compatibility impact is minimal.
13-10-2020