See
    jdk/src/share/native/java/util/zip/zip_util.c
    jdk/src/share/native/java/util/zip/zip_util.h

This data structure is created once for each entry in each JAR file loaded by the JVM at runtime:

    typedef struct jzcell {
        unsigned int hash;    /* 32 bit hashcode on name */
        jlong cenpos;         /* Offset of central directory file header */
        unsigned int next;    /* hash chain: index into jzfile->entries */
    } jzcell;

This takes 16 bytes on a 32-bit VM. On a 64-bit VM it takes 24 bytes, because the 8-byte cenpos field sits between two 4-byte fields and the compiler pads the structure for alignment.

rt.jar on JDK8 has about 18000 entries, so the entries hash table (stored in jzfile::entries) is about 280KB on a 32-bit VM and 420KB on a 64-bit VM. This table stays in memory for as long as the JAR file is in use. In the case of rt.jar, the table is never deallocated.

The 64-bit usage can easily be reduced to the 32-bit figure by rearranging the fields in the jzcell structure, but we can reduce the size of jzcell further:

    typedef struct jzcellsmall {
        unsigned short hash;  /* (truncated) 16 bit hashcode on name */
        jshort next;          /* hash chain: index into jzfile->entries */
        jint cenpos;          /* Offset of central directory file header */
    } jzcellsmall;

This reduces the memory usage to 8 bytes per JAR entry (for both 32-bit and 64-bit VMs). We can use this form as long as the JAR file is less than 2^30 bytes in size and has fewer than 32768 entries. This applies to rt.jar in all versions of the JDK (about 18000 entries, 65MB size in JDK8).

Note that truncating the stored hash value from 32 bits to 16 bits introduces no extra collisions in the case of rt.jar in JDK8. That is, for every pair of entries A and B that belong to the same bucket, the lower 16 bits of the hash values of A and B are not equal. Therefore, using jzcellsmall will introduce no extra I/O access.
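
The layouts and the eligibility rule described above can be sketched in standalone C as follows. This is only an illustration, not the actual patch: the jlong/jint/jshort typedefs are stand-ins so the code compiles without jni.h, and jzcell_reordered, can_use_small_cell and truncate_hash are hypothetical names that do not appear in zip_util.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the JNI typedefs (assumption: no jni.h available here). */
typedef int64_t jlong;
typedef int32_t jint;
typedef int16_t jshort;

/* Original layout, as quoted in the description. On typical 64-bit ABIs
 * the 8-byte cenpos forces 4 bytes of padding after hash and 4 more after
 * next, giving 24 bytes. */
typedef struct jzcell {
    unsigned int hash;
    jlong cenpos;
    unsigned int next;
} jzcell;

/* Hypothetical reordering: putting the 8-byte field first removes the
 * padding, so the struct is 16 bytes on mainstream ABIs. */
typedef struct jzcell_reordered {
    jlong cenpos;
    unsigned int hash;
    unsigned int next;
} jzcell_reordered;

/* Compact layout proposed in the description: 2 + 2 + 4 = 8 bytes. */
typedef struct jzcellsmall {
    unsigned short hash;
    jshort next;
    jint cenpos;
} jzcellsmall;

static_assert(sizeof(jzcell_reordered) == 16, "reordered cell should be 16 bytes");
static_assert(sizeof(jzcellsmall) == 8, "small cell should be 8 bytes");

/* Hypothetical eligibility check using the limits stated in the text:
 * JAR smaller than 2^30 bytes and fewer than 32768 entries. */
static int can_use_small_cell(jlong jar_size, jint total_entries) {
    return jar_size < ((jlong)1 << 30) && total_entries < 32768;
}

/* Keep only the lower 16 bits of the 32-bit name hash. */
static unsigned short truncate_hash(unsigned int hash) {
    return (unsigned short)(hash & 0xFFFFu);
}
```

For an rt.jar-sized file (about 65MB, about 18000 entries) can_use_small_cell returns 1, so such a JAR would qualify for the 8-byte form. A lookup that compares truncated hashes can see more false matches in general, which is why the per-bucket check on the lower 16 bits described above matters.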

Savings on 64-bit VM (patch compared to jdk1.8.0_ea_b68):

HelloWorld:
    Before:     475564 bytes
    After:      164618 bytes
    Reduction:  310946 bytes

Eclipse IDE:
    Before:    1693284 bytes
    After:      586946 bytes
    Reduction: 1106338 bytes