JDK-8286998 : system encoding used by java command line argument files instead of UTF-8
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 18
  • Priority: P3
  • Status: Resolved
  • Resolution: Incomplete
  • OS: generic
  • CPU: generic
  • Submitted: 2022-05-11
  • Updated: 2022-05-20
  • Resolved: 2022-05-19
Description
A DESCRIPTION OF THE PROBLEM :
JEP 400 (https://openjdk.java.net/jeps/400) was seen as a promise that only UTF-8 would be needed from now on,
but the documentation states that java command-line argument files still use the system encoding.
See https://docs.oracle.com/en/java/javase/18/docs/specs/man/java.html#java-command-line-argument-files
"characters in system default encoding" 

In the Eclipse IDE we have the issue https://github.com/eclipse-jdt/eclipse.jdt.debug/issues/45: the IDE supports both Java 11 and 18, which do not provide a consistent method to get the system encoding (previously Charset#defaultCharset).




Comments
First of all, JEP 400, which introduced `native.encoding`, does not retroactively address issues on JDKs prior to 18. Yes, there are encoding issues in those prior releases, and that's exactly what JEP 400 intended to address. As to 1), which refers to JDK 18, file `path`s still depend on the system encoding, which is (unfortunately) outside of the JEP 400 scope. However, most platforms default to UTF-8, so I don't expect many issues with it (at least it is at the same level as JDK 17). Even on Windows, a fix has been made to work with a UTF-8 system default encoding (cf. https://bugs.openjdk.java.net/browse/JDK-8272352).
20-05-2022

Response from the submitter:
"native.encoding" does not solve our issue. JEP-400 states that we should check for System.getProperty("native.encoding")!=null and use Charset.defaultCharset() otherwise (jdk<17). But that does not work for several reasons:
1. On JDK 18 that does not allow transferring all characters, but only the subset that is supported in the codepage of the OS (see example). I kindly ask to add an option that allows transferring all Unicode characters as command line arguments in a file (preferably always UTF-8 encoded).
2. On JDK <18 native.encoding is not guaranteed to be null. The user can specify -Dnative.encoding=SOMETHING and crash(!) the application with it (see example). I kindly request you update the documentation on how to get the System charset in JEP-400 for JDK <18.
3. On JDK <18 Charset.defaultCharset() does not return the System Charset; the user can specify -Dfile.encoding=UTF-8, which will lead to a wrong charset on Windows (see example). I kindly request you update the documentation on how to get the System charset in JEP-400 for JDK <18.
4. On JDK <18 I found Charset.forName(System.getProperty("sun.jnu.encoding")) does work (see https://github.com/openjdk/jdk/pull/8378/files#diff-f9ae2535b3ccc7376de825e468d3e5758c0f476436d6f326361eb57d103f7e5aR155). Please document when it is appropriate to use sun.jnu.encoding over native.encoding.
A solution might be to specify that sun.jnu.encoding is guaranteed to be null or the same as native.encoding for JDK>18, so that an appropriate algorithm to get the System charset would be like
systemcharset = System.getProperty("sun.jnu.encoding") != null ? System.getProperty("sun.jnu.encoding") : System.getProperty("native.encoding")
20-05-2022
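The fallback order proposed at the end of the submitter's response above could be sketched roughly as follows. This is only an illustration of that proposal, not a documented or endorsed way to obtain the system charset: `SystemCharsetSketch`, `firstSupported` and `systemCharset` are hypothetical names, `sun.jnu.encoding` is an internal property, and the final fallback to `Charset.defaultCharset()` is an assumption added here so the method never returns null.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not a JDK API: illustrates the fallback order proposed by the submitter.
class SystemCharsetSketch {

    /** Returns the charset for the first usable name, or null if no name is usable. */
    static Charset firstSupported(String... names) {
        for (String name : names) {
            if (name == null || name.isEmpty()) {
                continue; // property not set on this JDK
            }
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name (for example a bogus -D override):
                // ignore it and try the next candidate.
            }
        }
        return null;
    }

    static Charset systemCharset() {
        Charset cs = firstSupported(
                System.getProperty("sun.jnu.encoding"),  // internal, undocumented property
                System.getProperty("native.encoding"));  // property documented in JEP 400
        // Assumption: use the default charset as a last resort, although it can be
        // changed with -Dfile.encoding on older JDKs, as the submitter points out.
        return cs != null ? cs : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("system charset (best guess): " + systemCharset());
    }
}
```

Catching `IllegalArgumentException` covers both illegal and unsupported charset names, so an overridden property with a nonsense value falls through to the next candidate instead of failing.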

Additional Information from submitter:
============================
I wrote a test program. It's not only a problem of JDK 18:
1. I can't pass characters that are not part of my codepage on Windows (all Java versions)
2. System.getProperty("native.encoding") can be overwritten on JDK <18 => not reliable
3. Charset.defaultCharset() can be overwritten on JDK <18 => not reliable
4. Charset.forName(System.getProperty("sun.jnu.encoding")) OK but internal only!

package charset;

import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;

public class Test {
    // starts new JVM and calls itself 2 times
    public static void main(String[] args) throws InterruptedException, IOException, URISyntaxException {
        final int OK = 2;
        final int ERROR = 3;
        final String payload = "\u00f6\u00fc\u20ac"; // nonAscii but OK on my computer
        // final String payload = "\u0080\u041e"; // => ERROR (my codepage does not contain that characters)
        System.out.println("payload:" + payload + " "
                + payload.chars().mapToObj(c -> String.format("\\u%04x", c)).collect(Collectors.joining()));
        Class<Test> clazz = Test.class;
        System.out.println("java.version:" + System.getProperty("java.version"));
        Boolean wired = Boolean.getBoolean("wired.properties");
        String payloadRead = System.getProperty("payload");
        if (wired ^ payloadRead != null) {
            System.out.println("payloadRead:" + payloadRead + " "
                    + payload.chars().mapToObj(c -> String.format("\\u%04x", c)).collect(Collectors.joining()));
            System.exit(payload.equals(payloadRead) ? OK : ERROR);
            return;
        }
        URL url = clazz.getClassLoader().getResource("charset/Test.class");
        Path argFile = Files.createTempFile("prefix", "suffix");
        Charset wiredCharset = java.nio.charset.StandardCharsets.UTF_16LE;
        Files.writeString(argFile, "-Dpayload=" + payload, getSystemCharset());
        String path = Path.of(url.toURI()).getParent().toString()
                + "/..".repeat(1 + (int) clazz.getPackage().getName().chars().filter(ch -> ch == '.').count());
        ProcessBuilder processBuilder = new ProcessBuilder("java", //
                "-cp", //
                path, //
                "@" + argFile, //
                "-Dfile.encoding=" + wiredCharset, //
                "-Dnative.encoding=" + wiredCharset, // ERROR will crash Java <18 "Unrecognized option: -"
                "-Dsun.jnu.encoding=" + wiredCharset, //
                "-Dwired.properties=" + !wired, //
                clazz.getName());
        System.out.println(processBuilder.command().stream().collect(Collectors.joining(" ")));
        processBuilder.redirectOutput(Redirect.INHERIT);
        processBuilder.redirectError(Redirect.INHERIT);
        processBuilder.redirectInput(Redirect.INHERIT);
        Process process = processBuilder.start();
        int exitValue = process.waitFor();
        System.out.println(exitValue == OK ? "OK" : "ERROR");
        System.exit(exitValue);
    }

    private static Charset getSystemCharset() {
        // tell me a stable Charset name, please, that works on all Java Versions:
        String nativeEncoding = System.getProperty("native.encoding"); // ERROR for JDK<18
        if (nativeEncoding == null)
            return Charset.forName(System.getProperty("sun.jnu.encoding")); // OK but internal!!
        // return Charset.defaultCharset(); // =>ERROR (uses file.encoding)
        return Charset.forName(nativeEncoding); // OK
    }
}
20-05-2022

Advised the submitter of the above comments.
20-05-2022

As Alan mentioned, I believe reading the command file using `native.encoding` will suffice. Asking the submitter whether that's the case.
19-05-2022
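From the launching side, the suggestion above to use `native.encoding` for the command file might look like the following minimal sketch. `ArgFileSketch`, `argFileCharset` and `writeArgFile` are hypothetical names; the fallback to `Charset.defaultCharset()` when the property is absent follows the JEP 400 advice quoted by the submitter, and the sketch does not handle quoting of options that contain whitespace.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not a JDK API: writes a java "@argfile" in a chosen charset.
class ArgFileSketch {

    /** Charset assumed for argument files: native.encoding if set, else the default charset. */
    static Charset argFileCharset() {
        String name = System.getProperty("native.encoding");
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    /** Writes one option per line and refuses text the charset cannot represent. */
    static Path writeArgFile(List<String> options, Charset charset) throws IOException {
        for (String option : options) {
            // Characters outside the charset (e.g. outside the OS code page) cannot be transferred.
            if (!charset.newEncoder().canEncode(option)) {
                throw new IllegalArgumentException(
                        "Option cannot be encoded in " + charset + ": " + option);
            }
        }
        Path file = Files.createTempFile("java-args", ".txt");
        Files.write(file, options, charset);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeArgFile(List.of("-Dpayload=abc"), argFileCharset());
        System.out.println("pass to the launcher as: @" + file);
    }
}
```

The explicit `canEncode` check makes the limitation from the submitter's first point visible early: a payload containing characters that the native charset lacks is rejected before any child JVM is started.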

[~tongwan] I think the submitter is looking for the system property "native.encoding", documented in JEP-400 and in the table of standard system properties in java.lang.System. See also JDK-8283620 where Eclipse may also need to be aware of the system property for the standard streams.
19-05-2022

From JDK-8260265, it looks like a regression introduced in JDK 18b13.
19-05-2022