Bug ID: JDK-4691425 GZIPInputStream fails to read concatenated .gz files

Type: Bug
Component: core-libs
Sub-Component: java.util.jar
Affected Version: 1.4.0,1.4.1,6

Priority: P4
Status: Closed
Resolution: Fixed
OS: generic,linux,solaris_8
CPU: generic,x86,sparc

Submitted: 2002-05-24
Updated: 2023-12-17
Resolved: 2011-03-08

JDK 6: 6u21-rev (Fixed)
JDK 7: 7 b97 (Fixed)

Name: nt126004			Date: 05/24/2002


FULL PRODUCT VERSION :
java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)


FULL OPERATING SYSTEM VERSION :Linux skipper 2.4.2-2 #1 Sun
Apr 8 20:41:30 EDT 2001 i686 unknown


ADDITIONAL OPERATING SYSTEMS : Win2000, WinXP



A DESCRIPTION OF THE PROBLEM :
The "read" method of GZIPInputStream returns -1
before the actual end of the file. Some .gz files work
just fine and others do not.

JRE 1.2 and 1.3 also do not work.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. java wc.utils.test.test
2. ls -l test.log
3. Results: 24166 test.log

Size of the test.log should be 6,124,669 bytes.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I expected the full decompressed file to be extracted from
the .gz file. Using the command line unix program "gzip"
the total size is 6,124,669 bytes and is decompressed just
fine.

Only 28,672 bytes were extracted before the
GZIPInputStream.read method returns a -1.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
GZIPInputStream.read simply returns -1 too soon.

This bug can be reproduced occasionally.

---------- BEGIN SOURCE ----------
package wc.utils.test;
import java.io.*;
import java.net.*;
import java.util.zip.*;

public class test
{
    final private static    int     BUFFER_SIZE     = 4096;

    public static void main(String args[] )
    {
        byte[]          byteBuffer          = new byte [ BUFFER_SIZE ];
        int             bytesRead           = -1;

        try
        {
            FileOutputStream fileOut = new FileOutputStream("output.txt");
            FileInputStream fileInput       = new FileInputStream
                                              ( "test.log" );
            GZIPInputStream fileGzipInput   = new GZIPInputStream ( fileInput,
                                                             BUFFER_SIZE );

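            bytesRead   = fileGzipInput.read ( byteBuffer, 0, BUFFER_SIZE );
            while ( bytesRead >= 0 )
            {
                fileOut.write(byteBuffer, 0, bytesRead);
                // Reuse the same buffer; only the first bytesRead bytes are written.
                bytesRead   = fileGzipInput.read ( byteBuffer, 0, BUFFER_SIZE );
            }

            // Release both files once the copy loop ends.
            fileGzipInput.close();
            fileOut.close();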
        }
        catch (Exception e)
        {   
            System.err.println ( e.toString() );
            e.printStackTrace ( System.err );
        }
    
    } // end of public static void main()

} // end of public class test


---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Absolutely none.
(Review ID: 146741) 
======================================================================

Name: nt126004			Date: 03/04/2003


FULL PRODUCT VERSION :
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :SunOS dev1 5.8 Generic_108528-15 sun4u sparc SUNW,Sun-Fire-480R


ADDITIONAL OPERATING SYSTEMS :

A DESCRIPTION OF THE PROBLEM :
When reading the contents of a concatenated gzip file (one made up of more than one gzip file), GZIPInputStream returns EOF when it encounters the first gzip trailer.  When using GZIPInputStream on other platforms this problem does not occur.  The man page for gzip specifies that gunzip accepts concatenated files, though I am not sure which behavior was intended for the Solaris JDK.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a concatenated gzip file, e.g. cat file1.gz file2.gz > file3.gz
or use NIO channels.
2. Read in the new file using FileInputStream > GZIPInputStream > InputStreamReader > LineNumberReader
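
The steps above can also be reproduced without any files on disk. The following is a minimal sketch, not part of the original report: the class name ConcatGzipRepro and its helper methods are hypothetical, and the in-memory concatenation stands in for "cat file1.gz file2.gz > file3.gz". On a JDK that contains the fix for this bug, the program should print both lines; on an affected JDK, only the first member's contents are expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical reproducer: builds two gzip members, concatenates them,
// and reads the result back through a single GZIPInputStream.
class ConcatGzipRepro {

    // Compresses the text into one complete gzip member (header, data, trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Reads until GZIPInputStream.read reports end of stream (-1).
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf, 0, buf.length)) >= 0) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Equivalent of: cat file1.gz file2.gz > file3.gz
        ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
        concatenated.write(gzip("contents of file1\n"));
        concatenated.write(gzip("contents of file2\n"));

        System.out.print(gunzip(concatenated.toByteArray()));
    }
}
```

Comparing the length of the decompressed output against the sum of the two inputs gives a quick check for whether a given runtime is affected.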

EXPECTED VERSUS ACTUAL BEHAVIOR :
- FileInputStream reports more data available, while GZIPInputStream has closed after hitting first file trailer.
- Trying to open a new GZIPInputStream on the FileInputStream results in a "not in GZIP format" error

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.io.IOException: Not in GZIP format
2003/02/21 16:32:34.945 NULL       ERROR   WileyWebLo +         at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:131)
2003/02/21 16:32:34.945 NULL       ERROR   WileyWebLo +         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
2003/02/21 16:32:34.945 NULL       ERROR   WileyWebLo +         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
/**
 * TestGZIPIS
 *
 * Description
 *
 * Date: Feb 24, 2003
 *
 * @author johnclay
 * @version $Id$
 *
 */

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.util.zip.GZIPInputStream;

public class TestGZIPIS {

    // The stream constructors and readLine() throw checked IOExceptions.
    public static void main (String[] args) throws IOException {
        //send in file path as args[0]
        FileInputStream fis = new FileInputStream(args[0]);
        GZIPInputStream gzis = new GZIPInputStream(fis);
        InputStreamReader isr = new InputStreamReader(gzis);
        LineNumberReader lnr = new LineNumberReader(isr);
        String line;
        while((line = lnr.readLine()) != null) {
            System.out.println(line);
        }
        lnr.close();
    }
}
---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Have tried numerous workarounds, including various tests for real end of file, opening new streams, marking, resetting, filtering, etc.
(Review ID: 181706)
======================================================================

EVALUATION We can write files that we cannot read back, even though, per RFC 1952, we should be able to read them. The SDN-provided fix works, though a more optimal fix might be possible.

16-10-2006

EVALUATION It is not obvious that accepting multiple .gz files concatenated together is actually an improvement. I recommend closing this as Not A Defect.

18-02-2006

EVALUATION There is no .gz file attached. We need to have the .gz file before we can reproduce this bug.
###@###.### 2002-05-29

The .gz file has been received and analyzed. This file was created using a compressor based on zlib-1.0.8. The generated gzip file uses multiple GZIP members, laid out as

<header1><compressed-data1><trailer1><header2><compressed-data2><trailer2>...<headerN><compressed-dataN><trailerN>

The current GZIPInputStream implementation clearly doesn't support such a multiple-member GZIP file format. Investigating the effort required to support it with no API changes or workarounds.
###@###.### 2002-10-03

Although it isn't documented within the javadocs for the java.util.zip.* classes, our implementation expects the standard single-header, single-trailer format for the compressed GZ stream. Changing this from a bug to an RFE. I will also open a documentation bug to track the addition of some clarification/updates to the java.util.zip.* docs to indicate this current limitation.
###@###.### 2002-10-15

15-10-2002
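
For illustration, the multi-member layout described in the 2002-10-03 evaluation can be walked by hand. The sketch below is not the JDK fix and not the SDN-provided fix. It assumes the whole file fits in a byte array and is well formed and untruncated, and it skips CRC32/ISIZE verification. The class name MultiMemberGunzip and its methods are hypothetical. It parses each member header per RFC 1952, inflates the raw deflate data with java.util.zip.Inflater in "nowrap" mode, skips the 8-byte trailer, and repeats until the input is exhausted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical illustration of decoding <header><data><trailer> repeated N times.
class MultiMemberGunzip {
    // FLG bits from RFC 1952.
    private static final int FHCRC = 2, FEXTRA = 4, FNAME = 8, FCOMMENT = 16;

    // Returns the offset of the first deflate byte of the member starting at pos.
    // Assumes the header is not truncated.
    static int skipHeader(byte[] d, int pos) throws IOException {
        if (pos + 10 > d.length || (d[pos] & 0xff) != 0x1f
                || (d[pos + 1] & 0xff) != 0x8b || d[pos + 2] != 8) {
            throw new IOException("Not in GZIP format at offset " + pos);
        }
        int flg = d[pos + 3] & 0xff;
        pos += 10;                                   // fixed-size part of the header
        if ((flg & FEXTRA) != 0) {                   // 2-byte little-endian length + payload
            int xlen = (d[pos] & 0xff) | ((d[pos + 1] & 0xff) << 8);
            pos += 2 + xlen;
        }
        if ((flg & FNAME) != 0) { while (d[pos++] != 0) { } }     // zero-terminated name
        if ((flg & FCOMMENT) != 0) { while (d[pos++] != 0) { } }  // zero-terminated comment
        if ((flg & FHCRC) != 0) pos += 2;            // header CRC16
        return pos;
    }

    // Decompresses every member in data and returns the concatenated output.
    static byte[] gunzipAllMembers(byte[] data) throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int pos = 0;
        while (pos < data.length) {
            pos = skipHeader(data, pos);
            Inflater inf = new Inflater(true);       // true = raw deflate, no zlib wrapper
            try {
                inf.setInput(data, pos, data.length - pos);
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && !inf.finished()) { // ran out of input or needs a dictionary
                        throw new IOException("Truncated or unsupported deflate data");
                    }
                    out.write(buf, 0, n);
                }
                // Bytes the inflater did not consume start at the trailer;
                // skip CRC32 (4 bytes) + ISIZE (4 bytes) to reach the next header.
                pos = data.length - inf.getRemaining() + 8;
            } finally {
                inf.end();
            }
        }
        return out.toByteArray();
    }

    // Helper for the demo: one complete gzip member.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(gzip("member one\n".getBytes("UTF-8")));
        file.write(gzip("member two\n".getBytes("UTF-8")));
        System.out.print(new String(gunzipAllMembers(file.toByteArray()), "UTF-8"));
    }
}
```

A production-quality reader would stream the input instead of buffering it, verify each trailer, and decide how to treat trailing bytes that are not a valid member header.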

Duplicate :	JDK-4763158 - Need to document current limitations of java.util.GZIPInputStream
Relates :	JDK-7021870 - GzipInputStream closes underlying stream during reading
Relates :	JDK-8322256 - Define and document GZIPInputStream concatenated stream semantics