Bug ID: JDK-8005466 JAR file entry hash table uses too much memory (zip

Versions (Unresolved/Resolved/Fixed)

JDK 7	JDK 8
7u40 (Fixed)	8 b75 (Fixed)

See
jdk/src/share/native/java/util/zip/zip_util.c
jdk/src/share/native/java/util/zip/zip_util.h

This data structure is created once for each entry in each JAR file loaded by the JVM at runtime:
typedef struct jzcell {
    unsigned int hash;    /* 32 bit hashcode on name */
    jlong cenpos;         /* Offset of central directory file header */
    unsigned int next;    /* hash chain: index into jzfile->entries */
} jzcell;

This takes 16 bytes on a 32-bit VM. On a 64-bit VM, due to inefficient structure alignment, it takes 24 bytes.
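The padding can be inspected with a minimal sketch like the one below. It assumes that jlong is a 64-bit signed integer (modelled here with int64_t rather than the real JNI header) and that unsigned int is 4 bytes; the helper jzcell_padding is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the JNI jlong type (assumption: 64-bit signed integer). */
typedef int64_t jlong;

/* Same field order as the jzcell structure quoted above. */
typedef struct jzcell {
    unsigned int hash;    /* 32 bit hashcode on name */
    jlong cenpos;         /* Offset of central directory file header */
    unsigned int next;    /* hash chain: index into jzfile->entries */
} jzcell;

/* Hypothetical helper: padding = total size minus the sum of the field sizes. */
static size_t jzcell_padding(void) {
    return sizeof(jzcell) - (2 * sizeof(unsigned int) + sizeof(jlong));
}
```

On a typical 64-bit ABI, cenpos must start on an 8-byte boundary, so 4 bytes of padding follow hash and another 4 follow next. Printing sizeof(jzcell) and offsetof(jzcell, cenpos) on the target platform shows the actual layout.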

rt.jar on JDK8 has about 18000 entries, so the size of the entries hash table (stored in jzfile::entries) is about 280KB on a 32-bit VM and 420KB on a 64-bit VM. This table stays in memory for as long as the JAR file is in use. In the case of rt.jar, this table is never deallocated.
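The figures above follow from multiplying the entry count by the per-cell size. A small sketch of that arithmetic (the function name entries_table_bytes is hypothetical):

```c
#include <stddef.h>

/* Bytes used by the entries table: one cell per JAR entry. */
static size_t entries_table_bytes(size_t entry_count, size_t cell_size) {
    return entry_count * cell_size;
}
```

With 18000 entries this gives 288000 bytes for 16-byte cells and 432000 bytes for 24-byte cells, which matches the approximate 280KB and 420KB quoted.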

While the 64-bit usage can be easily reduced to the same as 32-bit (by rearranging the fields in the jzcell structure), we can further reduce the size of jzcell:

typedef struct jzcellsmall {
    unsigned short hash;  /* (truncated) 16 bit hashcode on name */
    jshort next;          /* hash chain: index into jzfile->entries */
    jint cenpos;          /* Offset of central directory file header */
} jzcellsmall;

This can reduce the memory usage to 8 bytes per JAR entry (for both 32-bit and 64-bit VMs). We can use this form as long as the JAR file is less than 2^30 bytes in size and has fewer than 32768 entries. This applies to rt.jar in all versions of the JDK (about 18000 entries and 65MB in size in JDK8).
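The eligibility condition for the compact form could be expressed as a simple predicate. This is only a sketch using the limits stated above; the function and macro names are hypothetical and do not come from zip_util.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the description: file size < 2^30 bytes, fewer than 32768 entries. */
#define SMALL_MAX_FILE_SIZE ((int64_t)1 << 30)
#define SMALL_MAX_ENTRIES   ((int64_t)32768)

/* Hypothetical check: true if a JAR file could use the 8-byte jzcellsmall form. */
static bool can_use_jzcellsmall(int64_t file_size, int64_t entry_count) {
    return file_size >= 0 && entry_count >= 0 &&
           file_size < SMALL_MAX_FILE_SIZE &&
           entry_count < SMALL_MAX_ENTRIES;
}
```

A JAR file that fails this check would have to fall back to the full-width jzcell, so any code using the compact form needs both representations.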

Note that truncating the stored hash value from 32 bits to 16 bits introduces no extra collisions in the case of rt.jar in JDK8. That is, for all pairs of entries A and B that belong to the same bucket, the lower 16 bits of the hash values of A and B are not equal. Therefore, using jzcellsmall will introduce no extra I/O access.
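A check of this kind could be sketched as follows. It counts pairs of entries that land in the same bucket and have different 32-bit hashes but identical low 16 bits, since those are the pairs that truncation would make indistinguishable. The bucket rule (hash modulo table length) and the function name are assumptions made for illustration, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Counts pairs (i, j) that share a bucket, differ in their full 32-bit hash,
 * but agree in the low 16 bits. A result of 0 means truncation to 16 bits
 * adds no collisions for this set of hashes.
 * Assumption: the bucket index is hash % table_len.
 */
static size_t count_extra_collisions(const uint32_t *hashes, size_t n,
                                     uint32_t table_len) {
    size_t extra = 0;
    if (table_len == 0) {
        return 0;  /* no buckets, nothing to compare */
    }
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int same_bucket = (hashes[i] % table_len) == (hashes[j] % table_len);
            int differ_32   = hashes[i] != hashes[j];
            int same_low16  = (uint16_t)hashes[i] == (uint16_t)hashes[j];
            if (same_bucket && differ_32 && same_low16) {
                extra++;
            }
        }
    }
    return extra;
}
```

The pairwise loop is quadratic, which is acceptable for a one-off measurement over roughly 18000 entries but would not be suitable for use at JAR-open time.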

Savings on 64-bit VM (patch compared to jdk1.8.0_ea_b68):

HelloWorld:
Before: 475564 bytes
After: 164618 bytes
Reduction: 310946 bytes

Eclipse IDE:
Before: 1693284 bytes
After: 586946 bytes
Reduction: 1106338 bytes

I will try to submit a simple re-alignment fix first. The savings are for the 64-bit VM only and are only half of those noted in the original bug description, but still significant: HelloWorld: about 150KB. Eclipse: about 540KB.
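A re-alignment of this kind might look like the sketch below, which groups the two 4-byte fields so that the 8-byte field needs no padding. The field order shown is one possible ordering and is not confirmed by this report; the struct name jzcell_reordered is hypothetical, and jlong is again modelled with int64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the JNI jlong type (assumption: 64-bit signed integer). */
typedef int64_t jlong;

/* Hypothetical reordering: both 4-byte fields first, then the 8-byte offset. */
typedef struct jzcell_reordered {
    unsigned int hash;    /* 32 bit hashcode on name */
    unsigned int next;    /* hash chain: index into jzfile->entries */
    jlong cenpos;         /* Offset of central directory file header */
} jzcell_reordered;

/* Size of one reordered cell in bytes. */
static size_t reordered_cell_size(void) {
    return sizeof(jzcell_reordered);
}
```

With 4-byte unsigned int this layout occupies 16 bytes on both 32-bit and 64-bit VMs. That saves 8 bytes per entry on 64-bit, compared with 16 bytes per entry for the jzcellsmall proposal, which is why the savings are halved.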
08-01-2013
Re-alignment is fine. Changing "cenpos" from jlong to jint and "next" from unsigned int to jshort is risky (doable, but you will probably have to update "lots" of places here and there). We need to support ZIP64 (on 32-bit as well), which allows zip file sizes > 2^32 and entry counts > 2^16.
02-01-2013
The saving is not for rt.jar alone. It applies to all JAR files that are opened by the JVM. In the case of the "Eclipse IDE" data quoted above, a few dozen JAR files are opened (because their corresponding ClassLoaders are still active). The hash table usage of the Eclipse application JAR files far exceeds that of rt.jar. So unless applications themselves magically give up the use of JAR files, the proposed fix should apply to JDK9 and beyond.
26-12-2012
rt.jar will go away when we move to modules in jdk9, although the space savings here may still be useful if, in the short term, the classes for modules are stored in zip files. See also JEP-161 (http://openjdk.java.net/jeps/161) for the proposed subsets of Java SE planned for jdk8; this will significantly reduce the size of rt.jar for small installations. Even so, we should evaluate whether it's worth looking at this for jdk8, although we have to take account of the risk, and also of the fact that jdk8 is close to feature complete.
26-12-2012