JDK-8284778 : System.out does not use the encoding/charset specified in the Javadoc
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 19
  • Submitted: 2022-04-12
  • Updated: 2022-04-25
  • Resolved: 2022-04-25
Description
Summary
-------

Add new system properties to set or get the encoding of the standard streams (`System.out` and `System.err`).

Problem
-------

After JEP 400 was implemented, two issues remain with the encoding of `System.out` and `System.err`:

- The javadoc states that they fall back to `Charset.defaultCharset()` if there is no console, which is incorrect. It should state that they fall back to `native.encoding`.
- The encoding could previously be overridden on the command line by setting `sun.stdout.encoding` or `sun.stderr.encoding`, but these properties were never documented.

Because the default encoding and the default console encoding may differ after JEP 400, there should be a supported way to override the encoding used by `System.out` and `System.err`.
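The mismatch described above can be observed with a small probe. This is only an illustrative sketch: the class name `EncodingReport` and the helper `line` are hypothetical, and `native.encoding` is assumed to be absent on older runtimes, in which case it is reported as not set.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK.
class EncodingReport {
    /** Formats one report line; a missing property is shown as "(not set)". */
    static String line(String label, String value) {
        return label + " = " + (value != null ? value : "(not set)");
    }

    public static void main(String[] args) {
        // After JEP 400 the default charset is UTF-8 unless file.encoding overrides it.
        System.out.println(line("Charset.defaultCharset()", Charset.defaultCharset().name()));
        // The platform's native encoding, which may differ from the default charset.
        System.out.println(line("native.encoding", System.getProperty("native.encoding")));
        // Whether the runtime reports a console for this process.
        System.out.println(line("System.console() != null", String.valueOf(System.console() != null)));
    }
}
```

Running it once in a terminal and once with output redirected to a file should show whether the two encodings differ on a given platform.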

Solution
--------

Add new system properties to set or get the encoding of the standard streams. This is equivalent to promoting the existing `sun.stdout.encoding` and `sun.stderr.encoding` system properties to standard properties with new names (`stdout.encoding` and `stderr.encoding`).

The default values of the properties are derived in a platform-dependent way, or from `native.encoding` if the platform does not provide streams for the console. The properties can be set on the launcher's command line with the `-D` option. The only supported value of the system properties is "UTF-8".
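As a rough illustration of how an application might read the new property, the sketch below looks up `stdout.encoding` and, if it is missing or unusable, tries `native.encoding` before falling back to the default charset. The class `StdoutEncodingProbe` and the helper `firstSupported` are hypothetical names, and the fallback order is an assumption made for this example, not the runtime's actual initialization logic.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, not part of the JDK.
class StdoutEncodingProbe {
    /** Returns the charset for the first usable name, or the fallback if none is usable. */
    static Charset firstSupported(Charset fallback, String... names) {
        for (String name : names) {
            if (name == null || name.isEmpty()) {
                continue; // property not set on this runtime
            }
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Unusable name: try the next candidate.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset resolved = firstSupported(
                Charset.defaultCharset(),
                System.getProperty("stdout.encoding"),   // absent on runtimes without this change
                System.getProperty("native.encoding"));
        System.out.println("stdout encoding resolves to " + resolved.name());
    }
}
```

To exercise the override, the property can be passed with `-D` on the launcher's command line, for example `java -Dstdout.encoding=UTF-8 StdoutEncodingProbe.java`.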

Specification
-------------

Change the field description of `java.lang.System#out` as:

           * specified by the host environment or user. The encoding used
           * in the conversion from characters to bytes is equivalent to
           * {@link Console#charset()} if the {@code Console} exists,
    -      * {@link Charset#defaultCharset()} otherwise.
    +      * <a href="#stdout.encoding">stdout.encoding</a> otherwise.
           * <p>
           * For simple stand-alone Java applications, a typical way to write
           * a line of output data is:
           * <blockquote><pre>
           *     System.out.println(data)
    @@ -153,11 +153,11 @@
           * @see     java.io.PrintStream#println(int)
           * @see     java.io.PrintStream#println(long)
           * @see     java.io.PrintStream#println(java.lang.Object)
           * @see     java.io.PrintStream#println(java.lang.String)
           * @see     Console#charset()
    -      * @see     Charset#defaultCharset()
    +      * @see     <a href="#stdout.encoding">stdout.encoding</a>
           */

Change the field description of `java.lang.System#err` as:

           * The encoding used in the conversion from characters to bytes is
           * equivalent to {@link Console#charset()} if the {@code Console}
    -      * exists, {@link Charset#defaultCharset()} otherwise.
    +      * exists, <a href="#stderr.encoding">stderr.encoding</a> otherwise.
           *
           * @see     Console#charset()
    -      * @see     Charset#defaultCharset()
    +      * @see     <a href="#stderr.encoding">stderr.encoding</a>
           */

Append the following two rows to the standard properties table in the description of the `System#getProperties()` method:

    +      * <tr><th scope="row">{@systemProperty stdout.encoding}</th>
    +      *     <td>Character encoding name for {@link System#out System.out}.
    +      *     The Java runtime can be started with the system property set to {@code UTF-8},
    +      *     starting it with the property set to another value leads to undefined behavior.
    +      * <tr><th scope="row">{@systemProperty stderr.encoding}</th>
    +      *     <td>Character encoding name for {@link System#err System.err}.
    +      *     The Java runtime can be started with the system property set to {@code UTF-8},
    +      *     starting it with the property set to another value leads to undefined behavior.
Comments
[~naoto], thank you for the additional information; move to Approved.
25-04-2022

Command-line arguments have always been converted to Java `String`s using `sun.jnu.encoding`, which remains undocumented. Thus there is no need to provide mitigation for this input side in terms of the JEP 400 changes.
25-04-2022

Moving to Provisional, not Approved. What context determines how the contents of the command line arguments themselves are converted into characters to set the system properties?
25-04-2022