JDK-8283620 : System.out does not use the encoding/charset specified in the Javadoc
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 18,19
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2022-03-22
  • Updated: 2022-10-07
  • Resolved: 2022-04-26
JDK 19: 19 b20 (Fixed)
Related Reports
CSR :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8285492 :  
Description
ADDITIONAL SYSTEM INFORMATION :
Windows 11 10.0.22000

openjdk version "18" 2022-03-22
OpenJDK Runtime Environment (build 18+36-2087)
OpenJDK 64-Bit Server VM (build 18+36-2087, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
System.out's Javadoc states the following:
The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.

When there is a Console, this is correct. However, when there isn't a Console, e.g. when redirecting output to a file, System.out in JDK 18 uses `native.encoding` rather than the result of Charset.defaultCharset(), which is affected by `file.encoding`. In prior JDKs, the semantics stated by the Javadoc were correct, so you could control a program's output encoding using `file.encoding`. Now, `native.encoding` cannot be set and `sun.stdout.encoding` is an undocumented feature, so there is no longer an officially supported way to change the output encoding.

In my opinion, the correct fix is to use `native.encoding` only when `file.encoding` is not specified, and to update the Javadoc to reflect this. This retains the output behavior of JDK 17 and below regardless of whether `file.encoding` is specified.
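
The precedence proposed above can be sketched as a small helper. This is only an illustration of the proposal, not the JDK's implementation: the class `StdoutCharsetSketch`, the method `resolve`, and the UTF-8 last-resort fallback are hypothetical, and `fileEncoding` is assumed to be the value given on the command line (null when it was not specified).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the proposed precedence; not JDK code.
public class StdoutCharsetSketch {

    // consoleCharset: Console.charset(), or null when there is no Console.
    // fileEncoding:   value of -Dfile.encoding on the command line, or null (assumption).
    // nativeEncoding: value of native.encoding, or null.
    static Charset resolve(Charset consoleCharset, String fileEncoding, String nativeEncoding) {
        if (consoleCharset != null) {
            return consoleCharset; // a Console exists: its charset wins
        }
        // No Console: prefer an explicit file.encoding, else native.encoding.
        String name = (fileEncoding != null) ? fileEncoding : nativeEncoding;
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Malformed charset name: fall through to the fallback below.
        }
        return StandardCharsets.UTF_8; // assumed last-resort fallback
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "Cp1252", "UTF-8"));
        System.out.println(resolve(null, null, "UTF-8"));
        System.out.println(resolve(StandardCharsets.ISO_8859_1, "Cp1252", "UTF-8"));
    }
}
```

With this ordering, an explicit `file.encoding` keeps controlling redirected output as it did in JDK 17 and below, while an unspecified one falls back to `native.encoding`.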

I am willing to make a PR to fix this whichever way is preferred.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Reproduction steps made on Linux, but can be adapted to other OSes:
1. Compile the source code attached.
2. Run `java --add-opens=java.base/java.io=ALL-UNNAMED Test >test.txt`
3. Inspect test.txt to see it states the following (Windows shows a different System.out):
console: null
'default' charset: UTF-8
System.out: UTF-8
4. Change the 'default' charset, which according to the Javadoc should also change the charset used by System.out. Run `java -Dfile.encoding=Cp1252 --add-opens=java.base/java.io=ALL-UNNAMED Test >test.txt`:
console: null
'default' charset: windows-1252
System.out: UTF-8

5. Notice that System.out's charset does not change, despite the change to the default charset.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
System.out should change with file.encoding when a Console is not present, as documented by the Javadoc.
ACTUAL -
See reproduction steps, especially #5.

---------- BEGIN SOURCE ----------
public class Test {
  public static void main(String[] args) throws Throwable {
    System.out.println("console: " + System.console()); // Show if the console was present
    if (System.console() != null) System.out.println("console charset: " + System.console().charset()); // Show the console's charset
    System.out.println("'default' charset: " + java.nio.charset.Charset.defaultCharset()); // Show the "default" charset
    var charsetField = System.out.getClass().getDeclaredField("charset");
    charsetField.setAccessible(true);
    System.out.println("System.out: " + charsetField.get(System.out)); // Show the charset used by System.out
  }
}
---------- END SOURCE ----------
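
A later comment on this issue notes that JDK 18 added `PrintStream.charset()`, so the same information can be gathered without reflection or `--add-opens`. The following is a minimal sketch of such a variant, assuming JDK 18 or later; the class name `TestNoReflection` and the helper `report` are illustrative and not part of the original report.

```java
import java.nio.charset.Charset;

// Reflection-free variant of the test case; requires JDK 18+ for PrintStream.charset().
public class TestNoReflection {

    // Builds the same report as the original test, one item per line.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("console: ").append(System.console()).append('\n');
        if (System.console() != null) {
            sb.append("console charset: ").append(System.console().charset()).append('\n');
        }
        sb.append("'default' charset: ").append(Charset.defaultCharset()).append('\n');
        // Public accessor added in JDK 18; replaces the reflective field read.
        sb.append("System.out: ").append(System.out.charset());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

It can be run as `java TestNoReflection >test.txt`, with or without `-Dfile.encoding=Cp1252`, to compare the reported charsets.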

CUSTOMER SUBMITTED WORKAROUND :
Use sun.stdout.encoding, an undocumented and unsupported property.
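
Where the program itself can be modified, another option is to replace System.out with a PrintStream that has an explicit charset. This is a sketch of that in-program approach, not a substitute for a command-line switch; the class `ExplicitStdoutEncoding` and its helpers are hypothetical names, and it uses only the `PrintStream(OutputStream, boolean, Charset)` constructor and `System.setOut`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class showing an explicitly encoded standard output stream.
public class ExplicitStdoutEncoding {

    // Wraps any OutputStream in an auto-flushing PrintStream with the given charset.
    static PrintStream withCharset(OutputStream target, Charset cs) {
        return new PrintStream(target, true, cs);
    }

    // Encodes a string through such a PrintStream and returns the bytes written.
    static byte[] encode(String s, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (PrintStream ps = withCharset(buf, cs)) {
            ps.print(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Replace System.out so that redirected output is UTF-8 regardless of system properties.
        System.setOut(withCharset(new FileOutputStream(FileDescriptor.out), StandardCharsets.UTF_8));
        System.out.println("caf\u00e9");
    }
}
```

The `encode` helper only makes the effect observable: the same character is written as two bytes in UTF-8 and as one byte in windows-1252.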

FREQUENCY : always



Comments
Changeset: 03bcf7b6
Author: Naoto Sato <naoto@openjdk.org>
Date: 2022-04-26 16:05:20 +0000
URL: https://git.openjdk.java.net/jdk/commit/03bcf7b6d196f6c5d851059cb6f580767eee4e94
26-04-2022

A pull request was submitted for review.
URL: https://git.openjdk.java.net/jdk/pull/8270
Date: 2022-04-15 20:26:55 +0000
21-04-2022

Yes, that's what I am proposing here. Please take a look at the CSR linked in this issue. With the current prototype, you get the following. I believe that's what you are requesting:
---
C:\Users\nsato>d:\projects\jdk\git\jdk\build\windows-x64\jdk\bin\jshell JAVASE
| Welcome to JShell -- Version 19-internal
| For an introduction type: /help intro
jshell> System.out.charset()
$177 ==> windows-1252
jshell> /exit
| Goodbye
C:\Users\nsato>d:\projects\jdk\git\jdk\build\windows-x64\jdk\bin\jshell -R-Dstdout.encoding=UTF-8 JAVASE
| Welcome to JShell -- Version 19-internal
| For an introduction type: /help intro
jshell> System.out.charset()
$177 ==> UTF-8
jshell>
13-04-2022

Additional information from the submitter: I've been tracking this issue, and I realize I may have caused some confusion. The main behavior issue we have is with Windows 10 + `file.encoding=UTF-8`. This previously allowed us to capture UTF-8 output on Windows 10, e.g. for test suites, but it now produces cp1252 output, which isn't usable for our purposes. I would much appreciate it if sun.stdout/err.encoding were promoted to public properties, as a proper way to control this behavior. Otherwise, there doesn't seem to be a supported way to get UTF-8 output on Windows 10.
13-04-2022

In fact, the spec for `System.out/err` falling back to `native.encoding` was intentional (L2124-2126): https://github.com/openjdk/jdk/pull/4733/files#diff-bd92d760986b9249dd3c02cc147db4f7e9dbbef90afef4971ee497a50e48c740 The spec for `System.out/err` should be corrected to reflect the JEP 400 change. Having said that, promoting `sun.stdout/err.encoding` to public properties may mitigate this situation.
07-04-2022

I second Alan's comment. As the submitter points out, `sun.stdout.encoding` can affect the System.out encoding, but that is just as unsupported as `file.encoding` with a value other than UTF-8/COMPAT. Unrelated, but in JDK 18 we provided `PrintStream.charset()`, so the test case does not have to reflectively get the value of `System.out.charset` with the `--add-opens` option. It can simply call `System.out.charset()`.
29-03-2022

The file.encoding property was a "read-only" property in older releases; running with -Dfile.encoding=Cp1252 was never supported. In JDK 18, the spec is clear that setting file.encoding to a value other than "UTF-8" or "COMPAT" leads to unspecified behavior. We may need to look at these scenarios again, or at least see how to get some of these usages onto a supported footing.
29-03-2022

The observations on Windows 10:
JDK 17: NoSuchFieldException: charset
JDK 18ea+29: Failed, System.out: x-windows-950 for default and file.encoding=Cp1252
JDK 19ea+3: Failed.
24-03-2022

Additional information from the submitter: Due to the nature of the test case (it reflectively reads the `charset` field of System.out), it doesn't work on JDKs earlier than 18.
24-03-2022

Observed the following exception on Windows 10:
Exception in thread "main" java.lang.NoSuchFieldException: charset
    at java.base/java.lang.Class.getDeclaredField(Class.java:2610)
    at Test.main(Test.java:6)
Requested more information from the submitter about the above output.
22-03-2022