Bug ID: JDK-8322256 Define and document GZIPInputStream concatenated stream semantics

Type: Bug
Component: core-libs
Sub-Component: java.util.jar
Affected Version: 7

Priority: P4
Status: In Progress
Resolution: Unresolved
OS: generic
CPU: generic

Submitted: 2023-12-17
Updated: 2024-11-25

Other: tbd (Unresolved)

GZIPInputStream has supported reading data from multiple concatenated GZIP data streams since JDK-4691425. To do this, after reading the trailer of a stream, it attempts to read the header of the next stream. If that succeeds, it proceeds onward; if it fails, it silently ignores the trailing garbage and returns end-of-data.

There are several issues with this:

1. The behaviors of (a) supporting concatenated streams and (b) ignoring trailing garbage are not documented, much less precisely specified.

2. Ignoring trailing garbage is dubious because it could easily hide errors or other data corruption that an application would rather be notified about. Moreover, the API claims that a ZipException will be thrown when corrupt data is read, but obviously that doesn't happen in the trailing garbage scenario.

3. There's no way to create a GZIPInputStream that does NOT support stream concatenation. For example, an application that wanted to send multiple sequential compressed streams over a single underlying stream and read them out one at a time might want to operate in this mode.
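
The current behavior behind these issues can be observed with a small stand-alone program. This is only an illustrative sketch: the class name GzipConcatDemo and its helper methods are made up for this example, and it assumes an in-memory ByteArrayInputStream as the underlying stream. It uses only the existing public API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipConcatDemo {

    // Compresses text into one complete GZIP stream (header, deflate data, trailer).
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Joins byte arrays back to back.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            bytes.write(part, 0, part.length);
        }
        return bytes.toByteArray();
    }

    // Reads everything a single GZIPInputStream is willing to return.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] garbage = {1, 2, 3, 4, 5};
        // Two concatenated streams are returned as one continuous run of data.
        System.out.println(gunzip(concat(gzip("first,"), gzip("second"))));
        // Bytes after the trailer that are not a GZIP header are dropped without an exception.
        System.out.println(gunzip(concat(gzip("first,"), garbage)));
    }
}
```

In both cases the caller cannot tell from the API where the first stream ended or whether anything was discarded.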

See this GitHub comment for a history of this class: https://github.com/openjdk/jdk/pull/17113#issuecomment-1859177655

Suggestion:

- Add new method setEnableConcatenatedStreams(boolean), default true
- When concatenated streams are disabled, stop after reading a stream trailer
- When concatenated streams are enabled, throw ZipException if there is any data after a trailer that cannot be successfully interpreted as the next header

From a backward-compatibility point of view, those changes would preserve the current behavior, except that bogus trailing garbage would now generate a ZipException instead of being discarded. For full backward compatibility, there could be another knob, setIgnoreTrailingGarbage(boolean).
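
To make the suggested semantics concrete, here is a rough sketch that decodes an in-memory byte array with java.util.zip.Inflater directly instead of changing GZIPInputStream. Everything in it is hypothetical: the class StrictGzipSketch, the boolean parameter standing in for setEnableConcatenatedStreams(boolean), and the helper names. It also simplifies: it handles only headers without optional fields (FLG == 0, as written by GZIPOutputStream) and skips the trailer without verifying CRC-32 or ISIZE.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

class StrictGzipSketch {

    // The flag is a hypothetical stand-in for the proposed setEnableConcatenatedStreams(boolean).
    static byte[] decode(byte[] data, boolean concatenatedStreams) throws ZipException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = inflateMember(data, 0, out);
        // Enabled: any bytes after a trailer must form another stream, else ZipException.
        // Disabled: stop after the first trailer and leave the remaining bytes untouched.
        while (concatenatedStreams && pos < data.length) {
            pos = inflateMember(data, pos, out);
        }
        return out.toByteArray();
    }

    // Inflates one stream starting at pos; returns the offset just past its 8-byte trailer.
    static int inflateMember(byte[] data, int pos, ByteArrayOutputStream out) throws ZipException {
        boolean header = data.length - pos >= 10
                && (data[pos] & 0xff) == 0x1f && (data[pos + 1] & 0xff) == 0x8b
                && data[pos + 2] == 8;
        if (!header) throw new ZipException("Not in GZIP format at offset " + pos);
        if (data[pos + 3] != 0) throw new ZipException("Header flags not handled in this sketch");
        Inflater inf = new Inflater(true); // raw deflate, no zlib wrapper
        try {
            inf.setInput(data, pos + 10, data.length - pos - 10);
            byte[] buf = new byte[4096];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new ZipException("Truncated deflate data");
                }
                out.write(buf, 0, n);
            }
            int trailerStart = data.length - inf.getRemaining();
            if (data.length - trailerStart < 8) throw new ZipException("Truncated trailer");
            return trailerStart + 8; // CRC-32 and ISIZE are skipped, not verified
        } catch (DataFormatException e) {
            throw new ZipException("Corrupt deflate data: " + e.getMessage());
        } finally {
            inf.end();
        }
    }

    // Convenience wrapper: the decoded text, or null if a ZipException was raised.
    static String tryDecode(byte[] data, boolean concatenatedStreams) {
        try {
            return new String(decode(data, concatenatedStreams), StandardCharsets.UTF_8);
        } catch (ZipException e) {
            return null;
        }
    }

    // Compresses text into one complete GZIP stream.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Joins byte arrays back to back.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] part : parts) bytes.write(part, 0, part.length);
        return bytes.toByteArray();
    }
}
```

With this sketch, tryDecode(concat(gzip("alpha"), gzip("beta")), true) yields both payloads, the same input with false yields only the first, and a stream followed by non-header bytes is rejected when the flag is true. A setIgnoreTrailingGarbage(boolean) knob would correspond to catching that last failure and returning the data decoded so far.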

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/20787 Date: 2024-08-30 07:27:11 +0000

30-08-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/18385 Date: 2024-03-19 21:48:14 +0000

09-07-2024

CSR :	JDK-8330195 - Define and document GZIPInputStream concatenated stream semantics
Relates :	JDK-7036144 - GZIPInputStream readTrailer uses faulty available() test for end-of-stream
Relates :	JDK-4691425 - GZIPInputStream fails to read concatenated .gz files