JDK-6192937 : Problem in using append mode in ZipOutputStream
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 1.4.2_01
  • Priority: P2
  • Status: Closed
  • Resolution: Not an Issue
  • OS: solaris_2.6
  • CPU: unknown
  • Submitted: 2004-11-09
  • Updated: 2010-04-05
  • Resolved: 2004-11-24
Description
Bugster (the Bugtraq 2.0 application) uses the java.util.zip package to compress and store
attachments.
The following code is used by Bugster to create the
zip file:


BEGIN CODE
----------
        BufferedInputStream istream = new BufferedInputStream(input);
        OutputStream ostream = null;

        if ( !Compress.isCompressed(path) )
        {
            // First chunk: create a new zip file.  Compress and Zip are
            // Bugster-internal classes, not part of the JDK.
            ostream = new BufferedOutputStream(new FileOutputStream(path + ".zip", append), 4096);
            zipWriter = new Zip(ostream, attachFile.getName());
            ostream = zipWriter.getOutputStream();
        }
        else
        {
            // Later chunks: reopen the existing file in append mode.
            ostream = new BufferedOutputStream(new FileOutputStream(path, append), 4096);
        }

END CODE
--------

Files larger than 10 MB are broken into 10 MB chunks: the first chunk is used to
create the compressed zip file, and the remaining chunks are appended to it
using a ZipOutputStream opened on the file in append mode.

The problem we have is that the final zip file (after all the 10 MB
chunks have been appended) cannot be opened by the Solaris unzip utility.
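
For reference, a minimal self-contained sketch of this pattern using only JDK
classes (the class name, file name, entry names, and chunk size below are
illustrative, not taken from the Bugster code):

BEGIN CODE
----------
import java.io.*;
import java.util.zip.*;

public class AppendZipDemo {
    public static void main(String[] args) throws IOException {
        // First chunk: creates a normal, valid zip file.
        writeChunk("attachment.zip", false, "chunk0");
        // Later chunks: the FileOutputStream is opened with append=true, so a
        // second complete zip stream (with its own end-of-central-directory
        // record) is written after the first one in the same file.
        writeChunk("attachment.zip", true, "chunk1");
    }

    static void writeChunk(String path, boolean append, String entryName)
            throws IOException {
        ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(new FileOutputStream(path, append), 4096));
        zos.putNextEntry(new ZipEntry(entryName));
        zos.write(new byte[1024]);   // stand-in for a 10 MB chunk of attachment data
        zos.closeEntry();
        zos.close();
    }
}
END CODE
--------

The second call writes a second complete zip stream after the first, so the
result is two concatenated zip files in one file, which is the behavior
described in the evaluation below.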

We have now modified the BT2.0 code so that it no longer uses the append feature of
ZipOutputStream. However, we have a significant number of zip files in Bugtraq 2.0
that were created using ZipOutputStream in append mode.

Is there any way of recovering the data from these zip files?



###@###.### 2004-11-09 19:53:38 GMT

Comments
EVALUATION

Using append mode on any kind of structured data like zip files or tar files is not something you can really expect to work. These file formats have an intrinsic "end of file" indication built into the data format. Still, perhaps the result is what you expect. One suspects that multiple logical zip files are written to the output, as if by

    cat foo.zip bar.zip > foobar.zip

One might be able to recover the data by looking for the special PK markers in the source file to find the end bytes, using a binary editor such as emacs. I don't know what the "Zip" class referenced in the sample code might be. We need an example using only JDK classes.

###@###.### 2004-11-16 04:07:44 GMT

The user's primary need is data recovery. Here is the world's smallest zip file reader, designed to do just that:

    perl -e 'undef $/; for (<> =~ /(PK\003\004.*?PK\005\006.{18})(?=PK\003\004|\z)/sg) {open Z, "> @{[$j++]}.zip"; print Z $_}' INPUTFILE

Run the above command on one line, with INPUTFILE replaced by the name of the file containing the concatenated zip files. It will split the input file into a number of separate valid zip files. It is not 100% reliable, but it should work on non-pathological input (zip files stored within zip files, for example, can confuse it). Let me know whether this allows recovery of the data.

As to whether the JDK's behavior is a bug, that is debatable. If you open a ZipOutputStream in append mode, it seems perfectly reasonable behavior to write output as usual, appending to the original contents of the file, which also happen to be the contents of a zip file. The fact that the resulting concatenated file is not a valid zip file is then only natural, in the same way that using cat to create a concatenated zip file "works" but creates a file not in a format expected by jar or unzip.

###@###.### 2004-11-17 17:30:17 GMT

If you open any output stream in append mode, it is perfectly reasonable to simply append to whatever data was there before, whether or not the combined data can be easily read using a standard utility. The append flag can be regarded like cat -- it is not a bug if `cat' combines two zip files into a new file that `unzip' cannot read.

###@###.### 2004-11-24 00:35:19 GMT
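
A rough JDK-only sketch of the same recovery idea is shown below: it scans for the
"PK\005\006" end-of-central-directory signature and cuts the file after the 18
bytes that follow it, exactly as the perl one-liner does. Like the one-liner, it
assumes no zip file comments and no nested zip data; the class name and output
file names are illustrative, not part of the original evaluation.

BEGIN CODE
----------
import java.io.*;

public class SplitConcatenatedZips {
    public static void main(String[] args) throws IOException {
        // Read the whole concatenated file into memory (fine for files of a few tens of MB).
        File input = new File(args[0]);
        byte[] data = new byte[(int) input.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(input));
        in.readFully(data);
        in.close();

        int start = 0, count = 0;
        // Scan for the end-of-central-directory signature "PK\005\006".
        for (int i = 0; i + 22 <= data.length; i++) {
            if (data[i] == 'P' && data[i + 1] == 'K'
                    && data[i + 2] == 5 && data[i + 3] == 6) {
                int end = i + 22;   // 4-byte signature + 18 fixed EOCD bytes (no comment)
                FileOutputStream out = new FileOutputStream(count + ".zip");
                out.write(data, start, end - start);
                out.close();
                count++;
                start = end;        // next embedded zip starts right after this record
                i = end - 1;        // resume scanning past the record just written
            }
        }
    }
}
END CODE
--------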