FULL PRODUCT VERSION :
java version "1.5.0_11"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_11-b03, mixed mode)
and
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-b105, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
This behavior has been observed under:
Suse Linux 10.1 x86_64
Windows XP Business 32 bit
Ubuntu 7.04 x86_64
EXTRA RELEVANT SYSTEM CONFIGURATION :
No relevant hardware pattern
A DESCRIPTION OF THE PROBLEM :
When attempting to read a large file, GZIPInputStream behaves improperly. This behavior is observed when ingesting very large files, with compressed sizes of 10-40 GB and uncompressed sizes of 200-400 GB. The second symptom described below (partial read) occurs on all tested JVMs back to 1.4. The untrappable exception occurs only in versions 1.5 and above.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Open an InputStream to a large gzipped file.
2) Create a new GZIPInputStream on the above stream.
3) Attempt to read the full contents of the stream.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Ideally, GZIPInputStream would allow the entire file to be read without exception.
ACTUAL -
Two symptoms of this condition consistently occur:
1) An untrappable NumberFormatException is reported directly to STDERR.
2) Stream traversal completes before the real end of stream (EOS) is reached.
The exception cannot be caught by the calling code and does not interrupt execution. The string quoted in the exception varies from file to file but is always the same for a given file.
The GZIPInputStream can still be read as normal and appears to reach the end of the stream without further exceptions. However, read() reports end of stream after only a small portion of the data has been read. The number of bytes read before this premature end of stream varies from file to file but is always the same for a given file.
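One way to confirm the partial read is to count the decompressed bytes in a long and compare the result with the known uncompressed size. The following is a minimal sketch, not the code used in this report: the class GzipByteCounter and the helpers countUncompressedBytes and gzip are hypothetical names, and the demo uses a small in-memory stream rather than the multi-gigabyte files described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipByteCounter {

    // Hypothetical helper: decompresses the stream fully and returns
    // the number of uncompressed bytes that were read before EOS.
    static long countUncompressedBytes(InputStream compressed) throws IOException {
        GZIPInputStream gzipis = new GZIPInputStream(compressed);
        try {
            byte[] buf = new byte[8192];
            long total = 0L; // long: hundreds of GB do not fit in an int
            int len;
            while ((len = gzipis.read(buf)) != -1) {
                total += len;
            }
            return total;
        } finally {
            gzipis.close();
        }
    }

    // Hypothetical helper for the demo: gzips a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(bos);
        out.write(data);
        out.close(); // writes the gzip trailer
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[100000];
        long counted = countUncompressedBytes(new ByteArrayInputStream(gzip(original)));
        System.out.println(counted + " of " + original.length + " bytes read");
    }
}
```

If the count returned for a real file is smaller than its known uncompressed size, the stream ended prematurely as described in symptom 2.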
ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.NumberFormatException: For input string: "15983638838"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:459)
at java.lang.Integer.parseInt(Integer.java:497)
at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(FtpURLConnection.java:398)
at com.weather.logs.Parser.main(Parser.java:25)
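The string in the trace, "15983638838", is larger than Integer.MAX_VALUE (2147483647), which is consistent with Integer.parseInt rejecting it. That the string is the remote file size is an assumption, not something the trace states. The sketch below only illustrates the int overflow; the class FileSizeParseSketch and the helper fitsInInt are hypothetical names.

```java
public class FileSizeParseSketch {

    // Hypothetical helper: reports whether a decimal string parses as an int.
    static boolean fitsInInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "15983638838"; // string taken from the trace above
        System.out.println("fits in int: " + fitsInInt(value));
        System.out.println("as long:     " + Long.parseLong(value));
        System.out.println("int max:     " + Integer.MAX_VALUE);
    }
}
```

Parsing the same string with Long.parseLong succeeds, since the value is well within the long range.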
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
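// Imports required by this snippet.
import java.io.BufferedInputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

// The trace above goes through FtpURLConnection, so the URL uses the ftp scheme.
URL url = new URL("ftp://some.ftp.site/some/big/file.gz");
URLConnection urlc = url.openConnection();
BufferedInputStream bis = new BufferedInputStream(urlc.getInputStream());
GZIPInputStream gzipis = new GZIPInputStream(bis);
int len = 0;
long total = 0; // long, since the uncompressed size exceeds the int range
byte[] inBuff = new byte[256];
while ((len = gzipis.read(inBuff)) != -1) {
    total += len;
}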
---------- END SOURCE ----------