JDK-8322256 : Define and document GZIPInputStream concatenated stream semantics
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 7
  • Priority: P5
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2023-12-17
  • Updated: 2024-04-02
  • JDK 23: 23 (Unresolved)
Description
GZIPInputStream has supported reading data from multiple concatenated GZIP data streams since JDK-4691425. To do this, after reading the trailer of a stream it attempts to read the header of a next stream. If that succeeds, it continues decoding; if it fails, it silently ignores the trailing garbage and returns end-of-data.
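
The current behavior can be observed with a small program. This is only an illustrative sketch: the class name GzipConcatDemo and its helper methods are made up for this example, and it assumes Java 9 or later for InputStream.readAllBytes(). It builds two complete GZIP streams, concatenates them, and then also appends bytes that are not a GZIP header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the JDK.
final class GzipConcatDemo {

    /** Compresses text into one complete GZIP stream (header, deflate data, trailer). */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Joins byte arrays back to back. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    /** Decompresses everything GZIPInputStream is willing to return for the input. */
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] two = concat(gzip("hello "), gzip("world"));
        byte[] garbage = "not a gzip header".getBytes(StandardCharsets.UTF_8);

        // Both streams are decoded as if they were one.
        System.out.println(gunzip(two));
        // The appended bytes are not a valid header; no exception is reported.
        System.out.println(gunzip(concat(two, garbage)));
    }
}
```

On a JDK with the behavior described above, both calls return the same text, and the caller has no indication that extra bytes followed the second trailer.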

There are several issues with this:

1. The behaviors of (a) supporting concatenated streams and (b) ignoring trailing garbage are not documented, much less precisely specified.

2. Ignoring trailing garbage is dubious because it can hide errors or data corruption that an application would want to be notified about. Moreover, the API documentation states that a ZipException is thrown when corrupt data is read, but that does not happen in the trailing-garbage scenario.

3. There is no way to create a GZIPInputStream that does NOT support stream concatenation. For example, an application that sends multiple sequential compressed streams over a single underlying stream and wants to read them out one at a time would need to operate in this mode.
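
The third point can be illustrated with a sketch of such an application. The class OneAtATimeDemo and its methods are hypothetical names chosen for this example; an in-memory ByteArrayInputStream stands in for the shared underlying stream, and Java 9 or later is assumed for readAllBytes(). The reader tries to open one GZIPInputStream per message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the JDK.
final class OneAtATimeDemo {

    /** Compresses text into one complete GZIP stream. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Intended to read exactly one compressed message from the shared stream. */
    static String readNextMessage(InputStream shared) throws IOException {
        // Deliberately not closed: closing would also close the shared stream.
        GZIPInputStream gz = new GZIPInputStream(shared);
        return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
    }

    /** Sends two messages back to back, then tries to read them separately. */
    static List<String> demo() {
        List<String> results = new ArrayList<>();
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            wire.write(gzip("msg1"));
            wire.write(gzip("msg2"));
            InputStream shared = new ByteArrayInputStream(wire.toByteArray());

            results.add(readNextMessage(shared)); // first reader swallows both messages
            results.add(readNextMessage(shared)); // nothing is left to read a header from
        } catch (EOFException e) {
            results.add("<EOF: nothing left for the second reader>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The first call returns the contents of both messages joined together, so the message boundary is lost, and the second call finds the underlying stream already exhausted.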

See this GitHub comment for a history of this class: https://github.com/openjdk/jdk/pull/17113#issuecomment-1859177655

Suggestion:

- Add a new method setEnableConcatenatedStreams(boolean), defaulting to true
- When concatenated streams are disabled, stop after reading a stream trailer
- When concatenated streams are enabled, throw ZipException if there is any data after a trailer that cannot be successfully interpreted as the next header

From a backward-compatibility point of view, those changes would preserve the current behavior, except that trailing garbage would now generate a ZipException instead of being discarded. For full backward compatibility, there could be another knob, setIgnoreTrailingGarbage(boolean).
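
The suggested semantics can be summarized as a small decision function. This is a model of the proposal only, not JDK code: the class TrailerPolicySketch, the Action enum, and the parameter names are invented for illustration, and neither setEnableConcatenatedStreams nor setIgnoreTrailingGarbage exists in GZIPInputStream at the time of this report.

```java
// Hypothetical model of the proposed behavior after a stream trailer has been read.
final class TrailerPolicySketch {

    enum Action { END_OF_DATA, READ_NEXT_STREAM, THROW_ZIP_EXCEPTION }

    /**
     * @param concatEnabled         value of the proposed setEnableConcatenatedStreams knob
     * @param ignoreTrailingGarbage value of the proposed setIgnoreTrailingGarbage knob
     * @param moreData              whether any bytes follow the trailer
     * @param validHeader           whether those bytes parse as a GZIP header
     */
    static Action afterTrailer(boolean concatEnabled, boolean ignoreTrailingGarbage,
                               boolean moreData, boolean validHeader) {
        if (!concatEnabled) {
            return Action.END_OF_DATA;      // stop after the first trailer, look no further
        }
        if (!moreData) {
            return Action.END_OF_DATA;      // clean end of input
        }
        if (validHeader) {
            return Action.READ_NEXT_STREAM; // continue with the next concatenated stream
        }
        // Data follows the trailer but is not a header: the second knob decides.
        return ignoreTrailingGarbage ? Action.END_OF_DATA : Action.THROW_ZIP_EXCEPTION;
    }

    public static void main(String[] args) {
        // Current behavior corresponds to (true, true, ...): garbage is discarded.
        System.out.println(afterTrailer(true, true, true, false));
        // Proposed default corresponds to (true, false, ...): garbage is reported.
        System.out.println(afterTrailer(true, false, true, false));
    }
}
```

In this model the existing behavior is the combination of concatenation enabled and trailing garbage ignored, which is why a second knob would be needed to reproduce it exactly.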
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/18385 Date: 2024-03-19 21:48:14 +0000
02-04-2024