JDK-7009069 : ZipFile.getEntry(String name) returns null because it does NOT respect the "language encoding flag"
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 7,8,11
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2010-12-27
  • Updated: 2024-07-30
Release in which this issue will be addressed: Other (tbd), Unresolved
Description
JDK version: 7-pro-b123

The following code shows that ZipFile.getEntry(String name) does NOT respect the "language encoding flag".

=====Begin of Code=====
import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.*;

import static java.lang.System.out;

public class EFSZipFS {

	private static final String name = "zipEntryName\u7b80";
	private static final String comment = "zipEntryComment\u7b80";

	public static void main(String[] args) throws Exception {
		byte[] bb = "This is the content of the zipfile".getBytes("ISO-8859-1");
		Charset cs = Charset.forName("utf8");
		ByteArrayOutputStream baos = new ByteArrayOutputStream();
		ZipOutputStream zos = new ZipOutputStream(baos, cs);

		ZipEntry e = new ZipEntry(name);
		e.setComment(comment);
		zos.putNextEntry(e);
		zos.write(bb, 0, bb.length);
		zos.closeEntry();
		zos.close();

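		// Path.deleteIfExists() is not part of the released API; use Files.deleteIfExists(Path)
		java.nio.file.Files.deleteIfExists(Paths.get("test.zip"));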
		File f = new File("test.zip");
		FileOutputStream fos = new FileOutputStream(f);
		baos.writeTo(fos);
		fos.close();
		
		// Read back with ZipFile-utf8
		out.println("Read back with ZipFile-utf8");
		ZipFile zf = new ZipFile("test.zip", Charset.forName("utf8"));
		System.out.println(zf.getEntry(name).getComment());
		zf.close();
		
		// Read back with ZipFile-gb2312
		out.println("Read back with ZipFile-gb2312");
		zf = new ZipFile("test.zip", Charset.forName("gb2312"));
		System.out.println(zf.getEntry(name).getComment());
		zf.close();
	}
}
=====End of Code=====

=====Begin of Output=====
Read back with ZipFile-utf8
zipEntryComment¿
Read back with ZipFile-gb2312
Exception in thread "main" java.lang.NullPointerException
	at EFSZipFS.main(EFSZipFS.java:50)
=====End of Output=====

The source code of ZipFile shows the root cause:
    public ZipEntry getEntry(String name) {
        if (name == null) {
            throw new NullPointerException("name");
        }
        long jzentry = 0;
        synchronized (this) {
            ensureOpen();
            jzentry = getEntry(jzfile, zc.getBytes(name), true);  // !!!Here we should also check "language encoding flag"!!!
            if (jzentry != 0) {
                ZipEntry ze = getZipEntry(name, jzentry);
                freeEntry(jzfile, jzentry);
                return ze;
            }
        }
        return null;
    }

    private static native long getEntry(long jzfile, byte[] name,
                                        boolean addSlash);
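
For readers unfamiliar with the "language encoding flag": in the ZIP format it is bit 11 (0x0800) of the general purpose bit flag, which sits at offset 6 of each local file header. The following is a minimal sketch, not JDK code, that builds a one-entry archive in memory and inspects that bit. The class name EfsFlagCheck and its helper methods are made up for illustration, and the sketch assumes that ZipOutputStream sets the flag when it is constructed with a UTF-8 charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of the JDK or of the original report.
public class EfsFlagCheck {

    // Bit 11 of the general purpose bit flag: names and comments are UTF-8.
    static final int EFS = 0x0800;

    // Builds a one-entry ZIP archive in memory, encoding the name with cs.
    static byte[] zipBytes(String entryName, Charset cs) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("content".getBytes(StandardCharsets.ISO_8859_1));
            zos.closeEntry();
        }
        return baos.toByteArray();
    }

    // Reads the general purpose bit flag of the first local file header.
    static boolean firstEntryHasEfsFlag(byte[] zip) {
        // Local file header layout: signature "PK\3\4" (4 bytes),
        // version needed (2 bytes), then the little-endian flag at offset 6.
        if (zip.length < 8 || zip[0] != 'P' || zip[1] != 'K'
                || zip[2] != 3 || zip[3] != 4) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int flag = (zip[6] & 0xff) | ((zip[7] & 0xff) << 8);
        return (flag & EFS) != 0;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("UTF-8 archive has flag:      "
                + firstEntryHasEfsFlag(zipBytes("zipEntryName\u7b80", StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 archive has flag: "
                + firstEntryHasEfsFlag(zipBytes("zipEntryName", StandardCharsets.ISO_8859_1)));
    }
}
```

Because the flag lives in the entry's own header, the lookup shown above has to encode the name with some charset before it knows which entry, and therefore which flag, applies.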

Comments
Re-opening since the NullPointerException reported in the bug description (using that reproducer) continues to happen on Java 8 and Java 11. It doesn't reproduce on Java 17 or later. The essence of this issue is that ZipFile.getEntry(name) returns null (even when the entry is present) on Java versions 8 and 11 due to that method not honouring the language encoding flag of the entry.
02-01-2024

Closing this as fixed, probably by JDK-8243469. I filed JDK-8322802 to track adding a test for getEntry respecting the "language encoding" flag, as this is not currently tested.
01-01-2024

JDK-8243469 might have "accidentally" fixed this issue.
31-12-2023

The sample code provided in the issue description no longer fails, but instead succeeds as follows:

Read back with ZipFile-utf8
zipEntryComment\u7b80
Read back with ZipFile-gb2312
zipEntryComment\u7b80

It seems this issue might have been "accidentally" fixed by JDK-8243469.
31-12-2023

EVALUATION The "language encoding flag" is an attribute stored in the LOC/CEN entry that ZipFile.getEntry(String name) is trying to locate, so the flag cannot be consulted before the entry itself has been found. A possible alternative is to retry with UTF-8 when the specified non-UTF-8 charset fails to locate a matching entry, although that comes at a performance cost. Not a P3 bug for now.
12-01-2011
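
On affected releases, application code can approximate the fallback idea from the evaluation without touching the JDK. The sketch below is one possible workaround, not the fix that went into the JDK: it tries getEntry first and, if that returns null, scans the enumerated entries and compares decoded names. It assumes that ZipFile.entries() decodes each name according to that entry's own flag. The class EntryLookupFallback and its methods are hypothetical names, and the demo uses ISO-8859-1 with an accented Latin character instead of gb2312 so that it depends only on charsets every Java platform must provide.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical workaround class; not part of the JDK or of the original report.
public class EntryLookupFallback {

    // Direct lookup first; if it misses, fall back to a linear scan by name.
    // Unlike getEntry, the scan does not try the "name/" directory form.
    static ZipEntry findEntry(ZipFile zf, String name) {
        ZipEntry direct = zf.getEntry(name);
        if (direct != null) {
            return direct;
        }
        Enumeration<? extends ZipEntry> entries = zf.entries();
        while (entries.hasMoreElements()) {
            ZipEntry e = entries.nextElement();
            if (e.getName().equals(name)) {
                return e;
            }
        }
        return null;
    }

    // Writes a one-entry archive to a temporary file, encoding the name with cs.
    static Path writeZip(String entryName, Charset cs) throws IOException {
        Path zip = Files.createTempFile("efs-demo", ".zip");
        try (OutputStream os = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(os, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("content".getBytes(StandardCharsets.ISO_8859_1));
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        String name = "zipEntryName\u00e9";
        Path zip = writeZip(name, StandardCharsets.UTF_8);
        // Open with a charset that differs from the one used for writing.
        try (ZipFile zf = new ZipFile(zip.toFile(), StandardCharsets.ISO_8859_1)) {
            ZipEntry e = findEntry(zf, name);
            System.out.println(e == null ? "entry not found" : "entry found");
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

The scan is linear in the number of entries, which is the same kind of performance price the evaluation mentions; it is only paid when the direct lookup misses.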