JDK-7036144 : GZIPInputStream readTrailer uses faulty available() test for end-of-stream
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 6u24
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2011-04-13
  • Updated: 2024-04-13
  • Resolved: 2024-03-20
Fix Version
JDK 23: 23 b15 (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.6.0_24"

(also believed to affect latest OpenJDK7 previews)

ADDITIONAL OS VERSION INFORMATION :
all OS (pure Java)

A DESCRIPTION OF THE PROBLEM :
GZIPInputStream's readTrailer() method decides whether to keep reading (for the case of concatenated GZIP members) based on whether the underlying stream's available() > 0. This is a faulty test for end-of-stream; socket streams (and perhaps others) may return 0 merely to mean any read would block, not that any read would fail due to the stream having ended.
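
To illustrate the distinction, here is a minimal sketch (not part of the original report) that uses a java.io piped stream pair as a stand-in for a socket. It shows available() returning 0 while the stream is still open, followed by a successful read once the writer sends a byte. The class name AvailableIsNotEof and the method probe() are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class AvailableIsNotEof {

    /** Returns { available() before any write, first byte read after a write }. */
    public static int[] probe() throws IOException {
        PipedOutputStream writer = new PipedOutputStream();
        PipedInputStream reader = new PipedInputStream(writer);
        try {
            // Nothing has been written yet: a read would block, so available()
            // reports 0, yet the stream has not ended.
            int availableBefore = reader.available();

            // The "peer" now sends one byte. It fits in the pipe's internal
            // buffer, so this write does not block.
            writer.write('b');
            writer.flush();

            // The read succeeds, so available() == 0 did not mean end-of-stream.
            int firstByte = reader.read();
            return new int[] { availableBefore, firstByte };
        } finally {
            writer.close();
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] result = probe();
        System.out.println("available() before write: " + result[0]);
        System.out.println("byte read after write: " + (char) result[1]);
    }
}
```

Only a read that returns -1 signals end-of-stream, which is why a check based on available() can stop too early.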

As a result, un-GZIPping multi-member streams over a network stream (and perhaps in other contexts) can intermittently trigger the same false end-of-stream that, in JDKs through 6u22, occurred after reading exactly one member. (Exactly how many members are read before this happens depends on the concordance of member-ends with inflated buffers and network delays, so in the wild it appears somewhat random, but it shows up very reliably when reading many-membered streams over network connections.)

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a GZIP stream concatenated from independent GZIP members. (Shorter members, and therefore more readTrailer() invocations, can trigger it faster.) Read it over a network connection (the slower the better, but even 100Mbps-plus connections will exhibit it eventually). Observe that sometimes the GZIPInputStream ends early, and then considers itself done no matter how much more data is available from the underlying stream. (You can't recover by retrying.)

Or, try the simulated code below for a local demonstration.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Even if the underlying stream sometimes reports available()==0, reading should continue if there's more valid data forthcoming.
ACTUAL -
Depending on member alignment and network/IO issues, GZIPInputStream may erroneously end early and insistently.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZIPAvailableTest {

    public static void main(String[] args) throws IOException {
        // Build one small GZIP member containing "boo"
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(baos);
        out.write("boo".getBytes("ASCII"));
        out.close();
        byte[] boo_gz = baos.toByteArray();

        // Concatenate 32 copies into a single multi-member GZIP stream
        baos.reset();
        for (int i = 0; i < 32; i++) {
            baos.write(boo_gz);
        }
        byte[] manyboo_gz = baos.toByteArray();

        // Baseline: ByteArrayInputStream reports available() accurately
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(manyboo_gz));
        long count = 0;
        while (in.read() > -1) {
            count++;
        }
        System.out.println("read bytes with omniscient available():" + count);

        // now simulate a stream that might have 0 available even while more
        // data is on the way, as with a socket stream
        GZIPInputStream in2 = new GZIPInputStream(new FilterInputStream(new ByteArrayInputStream(manyboo_gz)) {
            @Override
            public int available() throws IOException {
                return 0;
            }
        });
        long count2 = 0;
        while (in2.read() > -1) {
            count2++;
        }
        System.out.println("read bytes with zero available():" + count2);
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Copy the GZIPInputStream (and related Inflater) source code to new classes. Patch readTrailer() to try reading the next header, rather than guessing whether it may be present by consulting available(). (This approach will fail when the stream has no valid following gzip header -- same as if available() were an accurate indicator of more data.) Or, instead of patching readTrailer(), change it to protected and fix it in a subclass override. (Please, please, please, make all of this class's private methods protected so we can work around bugs without wholesale copy/pasting, and do things like read both official GZIP header fields and 'extra fields' without reimplementing the whole class.)
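
A different workaround that avoids copying the class is to wrap the underlying stream so that available() reports 0 only at true end-of-stream. This is a swapped-in technique: it is neither the patch described above nor the eventual JDK fix. The sketch below assumes that a blocking available() is acceptable to the caller (it departs from the usual non-blocking contract of InputStream.available()) and that the JDK in use supports concatenated members at all. PeekingInputStream, manyBoo() and countThroughWrapper() are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PeekingAvailableDemo {

    /** Hypothetical wrapper: available() returns 0 only at real end-of-stream. */
    public static final class PeekingInputStream extends PushbackInputStream {
        public PeekingInputStream(InputStream in) {
            super(in, 1); // room to push back the single peeked byte
        }

        @Override
        public int available() throws IOException {
            int n = super.available();
            if (n > 0) {
                return n;
            }
            int b = read(); // blocks until a byte arrives or the stream ends
            if (b < 0) {
                return 0; // genuine end-of-stream
            }
            unread(b); // put the peeked byte back for the next read
            return 1;
        }
    }

    /** Builds a stream made of the given number of concatenated "boo" GZIP members. */
    public static byte[] manyBoo(int members) throws IOException {
        ByteArrayOutputStream one = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(one);
        gz.write("boo".getBytes("US-ASCII"));
        gz.close();
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int i = 0; i < members; i++) {
            all.write(one.toByteArray());
        }
        return all.toByteArray();
    }

    /** Counts decompressed bytes read through the peeking wrapper. */
    public static long countThroughWrapper(byte[] gzipped) throws IOException {
        // Simulate a socket-like source whose available() is always 0.
        InputStream pessimistic = new FilterInputStream(new ByteArrayInputStream(gzipped)) {
            @Override
            public int available() {
                return 0;
            }
        };
        GZIPInputStream in = new GZIPInputStream(new PeekingInputStream(pessimistic));
        long count = 0;
        while (in.read() > -1) {
            count++;
        }
        in.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read bytes through peeking wrapper:" + countThroughWrapper(manyBoo(32)));
    }
}
```

The trade-off is that available() may now block on a slow connection, so the wrapper suits code that was going to read to end-of-stream anyway. On releases that contain the fix (JDK 23 b15 and later, per this report) the wrapper should be unnecessary.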

Comments
Changeset: d3f3011d
Author: Archie Cobbs <acobbs@openjdk.org>
Committer: Jaikiran Pai <jpai@openjdk.org>
Date: 2024-03-20 15:01:30 +0000
URL: https://git.openjdk.org/jdk/commit/d3f3011d56267360d65841da3550eca79cf1575b
20-03-2024

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/17113
Date: 2023-12-14 20:15:39 +0000
14-12-2023

EVALUATION The submitter is correct.
25-04-2011