JDK-8170831 : ZipFile implementation no longer caches the last accessed entry/pos
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2016-12-06
  • Updated: 2020-02-18
  • Resolved: 2016-12-07
JDK 9: 9 b149, Fixed
Description
In JDK 9, ZipFile's ZIP format support was moved from native C code up to the Java level. This removes the expensive back-and-forth JNI calls and the expensive native memory allocation previously needed for every zip entry lookup/access. On the assumption that an entry lookup is now a simple hash table lookup, the new implementation also removed a "tricky" cache mechanism that existed in the old C implementation. That cache always held the last accessed native entry (with its name and LOC position info), on the assumption that the typical use pattern for a zip entry is something like

ZipEntry e = zipfile.getEntry(name);
InputStream is = zipfile.getInputStream(e);
...

With the cache in place, the implementation can avoid a second expensive lookup when the caller comes back to read the bytes of the entry it was just handed.
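As a self-contained illustration of the use pattern above, the following sketch writes a one-entry archive to a temporary file and then performs the lookup followed by the read. The class name `LookupThenRead`, its helper methods, and the sample data are assumptions made for this example; only the `java.util.zip` calls come from the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the JDK.
final class LookupThenRead {

    // Writes a one-entry archive to a temporary file (illustrative data only).
    static Path writeSample(String name, String text) throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry(name));
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    // The pattern from the description: look the entry up, then read its bytes.
    static String read(Path zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry e = zf.getEntry(name);                // lookup by name
            if (e == null) {
                throw new IOException("no such entry: " + name);
            }
            try (InputStream is = zf.getInputStream(e)) {  // come back with the same entry
                return new String(is.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSample("hello.txt", "hello");
        try {
            System.out.println(read(zip, "hello.txt"));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

The `getInputStream(e)` call is the point at which an implementation without the cache has to locate the entry again.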

Recent analysis of certain use scenarios suggests that such a cache mechanism is still desirable, because it removes unnecessary lookup cost in the use pattern above, for example the cost of encoding the name from String to byte[] for the name table lookup. The cache also helps with the corner case of "entries with the same name" in a zip/jar file.

While "entries with the same name" in a zip/jar file are not encouraged (our ZipOutputStream throws an exception if this is attempted), the ZIP format spec does not really say anything about them. The old ZipFile implementation actually works correctly and gives you the corresponding bytes for each entry during iteration with a use pattern such as
zf.stream().forEach( ze -> zf.getInputStream(ze).readAllBytes() ...)
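To make the "not encouraged" part concrete, the sketch below shows the JDK's own writer, `ZipOutputStream`, rejecting a second entry with the same name. The class name `DuplicateEntryCheck` and its helper method are assumptions made for this example. It does not build an archive that actually contains duplicate names, since that would need a different tool or hand-written bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the JDK.
final class DuplicateEntryCheck {

    // Returns true if writing a second entry with the same name is rejected.
    static boolean rejectsDuplicate(String name) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new ByteArrayOutputStream())) {
            out.putNextEntry(new ZipEntry(name));
            out.closeEntry();
            try {
                out.putNextEntry(new ZipEntry(name)); // same name again
                return false;
            } catch (ZipException expected) {
                return true;                           // the writer refused the duplicate
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("duplicate rejected: " + rejectsDuplicate("a.txt"));
    }
}
```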

So the proposal here is to add this cache mechanism back.
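A minimal sketch of the idea, assuming a simplified name table keyed by the encoded name, is shown below. It is not the actual JDK fix: the class `LastEntryCacheSketch`, its fields, and the `fullLookups` counter are all hypothetical and exist only to show how remembering the last handed-out entry lets the follow-up call skip the name encoding and table lookup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a "last accessed entry" cache; not JDK code.
final class LastEntryCacheSketch {
    // Simplified name table: encoded name -> entry position.
    private final Map<ByteBuffer, Long> table = new HashMap<>();

    private String lastName;    // name of the most recently handed-out entry
    private long lastPos = -1;  // its position

    int fullLookups;            // counts lookups that had to encode the name

    private static ByteBuffer encode(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    void add(String name, long pos) {
        table.put(encode(name), pos);
    }

    // Returns the entry position, or -1 if there is no such entry.
    long position(String name) {
        if (name.equals(lastName)) {
            return lastPos;                  // cache hit: no encoding, no table lookup
        }
        fullLookups++;
        Long pos = table.get(encode(name));  // String -> byte[] encoding plus table lookup
        if (pos == null) {
            return -1;
        }
        lastName = name;
        lastPos = pos;
        return pos;
    }

    public static void main(String[] args) {
        LastEntryCacheSketch zip = new LastEntryCacheSketch();
        zip.add("a.txt", 10L);
        zip.position("a.txt"); // as in getEntry(name)
        zip.position("a.txt"); // as in getInputStream(e): served from the cache
        System.out.println("full lookups: " + zip.fullLookups);
    }
}
```

In this sketch the second call for the same name is answered from the cached name and position, so only one full lookup is counted.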