JDK-8337399 : GZIPInputStream readTrailer uses faulty available() test for end-of-stream
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Priority: P4
  • Status: Closed
  • Resolution: Withdrawn
  • Fix Versions: 21
  • Submitted: 2024-07-29
  • Updated: 2025-06-16
  • Resolved: 2025-06-16
Description
Summary
-------

Update `java.util.zip.GZIPInputStream` so that it no longer relies on the `java.io.InputStream.available()` method to decide whether to read a concatenated GZIP stream from the underlying input stream.

Problem
-------

The `GZIPInputStream` class reads compressed GZIP data from an `InputStream` supplied to it. The GZIP format allows multiple GZIP streams to be concatenated, and an undocumented feature of the `GZIPInputStream` implementation is that it supports reading such concatenated streams. This is possible because the GZIP format defines an 8-byte trailer that marks the end of each individual GZIP stream.
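A minimal sketch, using only standard `java.util.zip` classes, of what reading concatenated GZIP streams looks like: two pieces of text are compressed separately, their bytes are appended, and a single `GZIPInputStream` decompresses both. The class name `ConcatenatedGzipDemo` and its helper methods are hypothetical. The sketch reads from a `ByteArrayInputStream`, whose `available()` is exact, so the second stream is picked up whether or not the change discussed here is in place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    // Compresses the text into one complete GZIP stream (header, deflate data, 8-byte trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the trailer
        return bytes.toByteArray();
    }

    // Appends two independent GZIP streams and decompresses them through one GZIPInputStream.
    static String readConcatenated(String first, String second) throws IOException {
        ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
        concatenated.write(gzip(first));
        concatenated.write(gzip(second));
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(concatenated.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readConcatenated("hello, ", "world")); // prints "hello, world"
    }
}
```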

`GZIPInputStream` has a public `read(byte[] buf, int off, int len)` method that returns uncompressed data read from the underlying, possibly concatenated, GZIP streams. After reading an 8-byte trailer from the underlying stream, the current implementation of this method calls `java.io.InputStream.available()` on the underlying stream to decide whether data for a subsequent concatenated GZIP stream follows. If `available()` returns `0`, `GZIPInputStream.read()` reads no further data and marks the `GZIPInputStream` as having reached the end of the compressed input stream. Any subsequent call to `read()` returns `-1`, indicating end of stream.

Relying on the return value of `InputStream.available()` is not appropriate, because the API javadoc of that method states that the return value is merely an estimate of the number of bytes that can be read without blocking. The javadoc further states:
```
Note that while some implementations of {@code InputStream} will return the total number of bytes in the stream, many will not.
```
As a result, the current implementation of `GZIPInputStream.read()`, which relies on the underlying `InputStream`'s `available()` method, can incorrectly conclude that the end of stream has been reached even when a concatenated GZIP stream follows. `GZIPInputStream.read()` then ignores, and never returns, the uncompressed data of any such subsequent GZIP streams.
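To make the failure mode concrete, the sketch below uses only JDK classes. `SequenceInputStream.available()` reports only the bytes of its current sub-stream, so while that sub-stream is exhausted it returns `0` even though the next sub-stream still holds data. The method `availableSaysMore` is a hypothetical, simplified model of an `available()`-based end-of-stream test, not the `GZIPInputStream` source; `readSaysMore` shows what an actual read reports for the same input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

public class AvailableEstimateDemo {

    // Hypothetical model of an available()-based check: "more data" only if the estimate is positive.
    static boolean availableSaysMore(InputStream in) throws IOException {
        return in.available() > 0;
    }

    // Attempts a real read instead: -1 is the only reliable end-of-stream signal.
    static boolean readSaysMore(InputStream in) throws IOException {
        return in.read() != -1;
    }

    // The first sub-stream is empty; the second one still holds three bytes.
    static InputStream splitSource() {
        return new SequenceInputStream(
                new ByteArrayInputStream(new byte[0]),
                new ByteArrayInputStream(new byte[] {1, 2, 3}));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("available() test: " + availableSaysMore(splitSource())); // false
        System.out.println("read() test:      " + readSaysMore(splitSource()));      // true
    }
}
```

Both checks see the same bytes, yet only the one that attempts a read notices that data remains.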

Solution
--------

`GZIPInputStream.read()` will be updated to remove the check on `InputStream.available()`. After reading an 8-byte GZIP stream trailer, the implementation will instead attempt to read a GZIP stream header from the underlying input stream. If the additional `read()` calls on the underlying input stream return enough bytes and those bytes form a GZIP stream header, `GZIPInputStream.read()` will treat them as the start of a concatenated GZIP stream and continue returning uncompressed data from it. If, however, those `read()` calls do not return enough bytes, or the returned bytes do not form a GZIP stream header, the `GZIPInputStream` will be marked as having reached the end of the compressed input stream.
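A minimal sketch of this decision rule, assuming the RFC 1952 fixed header layout (ID1 = `0x1f`, ID2 = `0x8b`, CM = `8` for deflate, 10 bytes before any optional fields). The names `GzipHeaderProbe` and `nextMemberFollows` are hypothetical. Unlike a real implementation, the sketch does not parse optional header fields or hand consumed bytes back to an inflater. `InputStream.readNBytes(int)` returns fewer bytes than requested only at end of stream, which covers the "not enough bytes" case without consulting `available()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderProbe {

    // Fixed part of a GZIP header per RFC 1952: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    static final int FIXED_HEADER_LENGTH = 10;

    // Hypothetical helper: true only if a full fixed header with the GZIP magic and deflate method follows.
    // It never calls available(); it attempts the read and inspects what actually arrived.
    static boolean nextMemberFollows(InputStream in) throws IOException {
        byte[] header = in.readNBytes(FIXED_HEADER_LENGTH);
        if (header.length < FIXED_HEADER_LENGTH) {
            return false; // end of stream before a complete header: treat as end of input
        }
        return (header[0] & 0xff) == 0x1f  // ID1
            && (header[1] & 0xff) == 0x8b  // ID2
            && (header[2] & 0xff) == 8;    // CM = deflate
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream member = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(member)) {
            out.write(new byte[] {42});
        }
        // A complete GZIP stream, a truncated header, and 16 zero bytes.
        System.out.println(nextMemberFollows(new ByteArrayInputStream(member.toByteArray())));
        System.out.println(nextMemberFollows(new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b})));
        System.out.println(nextMemberFollows(new ByteArrayInputStream(new byte[16])));
    }
}
```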

Specification
-------------
There are no specification changes.