Bug ID: JDK-8327489 Make GZIPInputStream no longer rely on available() for end-of-stream test

JDK-8327489 : Make GZIPInputStream no longer rely on available() for end-of-stream test

Type: CSR
Component: core-libs
Sub-Component: java.util.jar

Priority: P4
Status: Closed
Resolution: Approved
Fix Versions: 23

Submitted: 2024-03-06
Updated: 2024-09-27
Resolved: 2024-03-08

Related Reports

CSR :	JDK-7036144 - GZIPInputStream readTrailer uses faulty available() test for end-of-stream
Relates :	JDK-8340729 - GZIPInputStream readTrailer uses faulty available() test for end-of-stream

Description

Summary
-------

Update `java.util.zip.GZIPInputStream` so it doesn't rely on `java.io.InputStream.available()` method to decide whether or not to read a concatenated GZIP stream from the underlying input stream.

Problem
-------

The `GZIPInputStream` class takes an `InputStream` to read compressed GZIP data from. GZIP format allows for multiple GZIP streams to be concatenated. An undocumented feature of the implementation in `GZIPInputStream` is that it supports reading such concatenated GZIP streams. This is possible because the GZIP format defines a 8 byte trailer representing the end of an individual GZIP stream.

`GZIPInputStream` has a public `read(byte[] buf, int off, int len)` method which returns the uncompressed data after reading from the underlying, possibly concatenated GZIP streams. The current implementation of this method after having read an 8 byte trailer in the underlying stream, calls the `java.io.InputStream.available()` method on the underlying stream to decide whether or not there's a subsequent concatenated GZIP stream data. If the `available()` method call returns `0` then the implementation in `GZIPInputStream.read()` does not read any additional data and marks the `GZIPInputStream` as having reached the end of compressed input stream. Any subsequent calls to `read()` will return `-1` indicating the end of stream.

Relying on the return value of `InputStream.available()` method is not appropriate since the `InputStream.available()` as per its API javadoc states that the return value is merely an estimate of the number of bytes available. That method's API javadoc further states:
```
Note that while some implementations of {@code InputStream} will return the total number of bytes in the stream, many will not.
```
As a result, the current implementation of `GZIPInputStream.read()` which relies on the underlying `InputStream`'s `available()` method can incorrectly consider the GZIP stream to have reached end of stream even when there may be a concatenated GZIP stream. This results in the `GZIPInputStream.read()` ignoring and thus not returning possibly additional uncompressed data of underlying GZIP streams.

Solution
--------

The `GZIPInputStream.read()` will be updated to remove the check on `InputStream.available()`. The implementation, after reading a 8 byte GZIP stream trailer, will now attempt to read a GZIP stream header from the underlying input stream. If the additional `read()`s on the underlying input stream return enough bytes and those bytes represent a GZIP stream header, then the `GZIPInputStream.read()` method will consider that there is a concatenated GZIP stream and it will continue to return the uncompressed data even from the concatenated stream. If however, the `read()`s on the underlying input stream don't return enough bytes or the returned bytes don't represent a GZIP stream header, then the `GZIPInputStream` will be marked as having reached the end of compressed input stream.

Specification
-------------
There are no specification changes.

Comments

Moving to Approved contingent on JDK-8322256 getting addressed in JDK 23.
08-03-2024
Hi Jaikrian - That is fine with me as the end result is the same. I'll back out my Javadoc change and save it for later with JDK-8322256.
08-03-2024
Hello Archie, I would prefer to add the javadoc specification as a separate issue. In its current form, the GZIPInputStream is minimally specified without any mention of concanetated streams. The change you are proposing right now is merely one bug fix to the unspecified implementation detail. I believe https://bugs.openjdk.org/browse/JDK-8322256 will be a better issue to introduce the javadoc to specify all the currently unspecified behaviour of GZIPInputStream and that I think will take several rounds of reviews and discusion. Does that sound reasonable?
08-03-2024
Hi Jaikiran, > I was hoping to keep the changes in this issue to just the internal implementation changes, so I think leaving out the javadoc additions or changes would be good. OK so are you saying that you don't like the idea of adding the Javadoc, or that you would prefer it be added as a separate issue? In either case I'm curious to better understand your reasoning. Thanks.
07-03-2024
Hello Archie, it appears that we might have edited at the same time. The current text in the CSR contains my edited content and marks it as not having any specification changes. I was hoping to keep the changes in this issue to just the internal implementation changes, so I think leaving out the javadoc additions or changes would be good.
07-03-2024
Hi @Jaikiran, Thanks! Looks like we were editing at the same time - did you mean to erase the Javadoc that I just added? If it doesn't need to be in here that's fine but I thought it did since it becomes part of the behavior specification. Thanks.
07-03-2024
Hello Archie, I've updated the text in the CSR a bit. I have marked this as reviewed and I will also ask a few other people more familiar with this class to take a look too. Please wait for another Reviewer to this CSR before moving this to Finalized.
07-03-2024
On second thought, optimistic future plans are not a reason to NOT add Javadoc here, so I've added some.
07-03-2024
> If this change does go forward, perhaps some javadoc update would be appropriate? Yes - in fact the current intention is to go even further, i.e., add some new ability to make it configurable whether to read past the end of the first stream and clearly document the semantics of both options. This would be done as part of a separate issue/CSR pair.
06-03-2024
Hmm. As currently written, there is no specification change so I'm changing the scope to Implementation rather than SE. I've marked the parent issue as requiring a release note. I'm moving this CSR to Provisional to indicate I think it merits further investigation; it will certainly need to be reviewed closely by people working in jar/zip and io before being Finalized. If this change does go forward, perhaps some javadoc update would be appropriate?
06-03-2024