JDK-8355357 : Add standard system property stdin.encoding
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 25
  • Submitted: 2025-04-23
  • Updated: 2025-04-25
  • Resolved: 2025-04-25
Description
Summary
-------

Add and specify a new `stdin.encoding` system property that
recommends an encoding that applications should use when
reading character data from the standard input.

Problem
-------
The JDK is missing a means to recommend a character encoding
that applications and libraries should use when reading from the
standard input. The existing encoding properties are insufficient
for this. The `file.encoding` property is the default charset and is
usually UTF-8 per [JEP 400][1]. The `native.encoding` property
specifies the user's or system's preferred encoding. However, when
standard input is connected to a console, it is possible that the
console has an encoding that differs from both of these. The
current recommendation (introduced in JDK 17 by [JDK-8264209][2])
is for applications to use `Console.charset()`. However, this is
correct only when the standard input is connected to a console
and a `Console` object is available. It is unclear what applications
are expected to do in other cases.
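
To illustrate the gap, here is a minimal sketch of the pre-existing `Console.charset()` recommendation. The class name `ConsoleCharsetDemo`, the helper `pick`, and the choice of `Charset.defaultCharset()` as a fallback are assumptions made for illustration; the fallback is exactly the part that no specification previously settled.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class ConsoleCharsetDemo {

    // Hypothetical helper: use the console's charset when a Console exists,
    // otherwise fall back to a charset chosen by the caller.
    static Charset pick(Console console, Charset fallback) {
        return console != null ? console.charset() : fallback;
    }

    public static void main(String[] args) {
        // System.console() may return null, for example when input is redirected.
        Console console = System.console();
        // Assumption: the default charset is used as the fallback here; nothing
        // guarantees it matches what standard input actually delivers.
        Charset cs = pick(console, Charset.defaultCharset());
        System.out.println("Charset chosen for standard input: " + cs);
    }
}
```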

A new property is thus warranted that provides an encoding
recommendation for whatever the standard input is connected
to. In addition, the forthcoming [JEP 512][3] proposes a new
method `IO.readln` which reads from standard input. This method
needs a standardized way to establish what encoding it should use.

Solution
--------

Create and specify a new system property `stdin.encoding`. It is broadly
similar to the existing `stdout.encoding` and `stderr.encoding` properties.
It is set by platform-specific code to a value that indicates the encoding
that is appropriate to use for whatever the standard input is connected to.
The exact behavior is platform-specific.

It is possible to set the property to UTF-8 on the command line to override
the system's chosen value. The result of using other values is unspecified.
The wording of the other property specifications is adjusted to say "unspecified"
instead of "undefined" so that all of them use the same term.
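
One way to observe the effect of an override is to print the encoding-related properties at startup. The following is a sketch only; the class name `EncodingProps` and the helper `snapshot` are hypothetical, and on runtimes that predate a given property its value is simply reported as `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingProps {

    // The encoding-related system properties discussed in this CSR.
    static final String[] NAMES = {
        "file.encoding", "native.encoding",
        "stdin.encoding", "stdout.encoding", "stderr.encoding"
    };

    // Collects the current value of each property; a value is null when the
    // runtime does not define that property.
    static Map<String, String> snapshot() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : NAMES) {
            values.put(name, System.getProperty(name));
        }
        return values;
    }

    public static void main(String[] args) {
        // Example invocation with the only override value whose behavior is specified:
        //   java -Dstdin.encoding=UTF-8 EncodingProps.java
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```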

The specification for `System.in` is modified to include a recommendation
for applications to use the `stdin.encoding` property to determine the encoding
to use. Additional specification text warns against mixing byte input with
character input.
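
The recommendation can be sketched as follows. This is an illustration under stated assumptions, not the JDK's own implementation: the class name `StdinEncodingDemo`, the helpers `resolve`, `stdinCharset`, and `reader`, and the fallback to `Charset.defaultCharset()` when the property is absent or names an unsupported charset are all hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class StdinEncodingDemo {

    // Hypothetical helper: turn a charset name into a Charset, returning the
    // fallback when the name is missing, malformed, or unsupported.
    static Charset resolve(String name, Charset fallback) {
        if (name == null || name.isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    // Reads the stdin.encoding property; the default-charset fallback is an
    // assumption for runtimes that do not set the property.
    static Charset stdinCharset() {
        return resolve(System.getProperty("stdin.encoding"), Charset.defaultCharset());
    }

    // Wraps a byte stream in a character reader that decodes with the given charset.
    static BufferedReader reader(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    public static void main(String[] args) throws IOException {
        // After wrapping, read only through the wrapper, never directly from System.in.
        BufferedReader in = reader(System.in, stdinCharset());
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```

Keeping a single wrapper for the lifetime of the program matters because a reader may buffer bytes ahead of what it has returned, so a later direct read from `System.in` can miss input.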

The specification is also clarified to state that the `native.encoding` system
property cannot be overridden on the command line.

Specification
-------------

The specification for `System.in` is modified as follows:

         /**
          * The "standard" input stream. This stream is already
    -     * open and ready to supply input data. Typically this stream
    +     * open and ready to supply input data. This stream
          * corresponds to keyboard input or another input source specified by
    -     * the host environment or user. In case this stream is wrapped
    -     * in a {@link java.io.InputStreamReader}, {@link Console#charset()}
    -     * should be used for the charset, or consider using
    -     * {@link Console#reader()}.
    +     * the host environment or user. Applications should use the encoding
    +     * specified by the {@link ##stdin.encoding stdin.encoding} property
    +     * to convert input bytes to character data.
          *
    -     * @see Console#charset()
    -     * @see Console#reader()
    +     * @apiNote
    +     * The typical approach to read character data is to wrap {@code System.in}
    +     * within an {@link java.io.InputStreamReader InputStreamReader} or other object
    +     * that handles character encoding. After this is done, subsequent reading should
    +     * use only the wrapper object; operating directly on {@code System.in} results
    +     * in unspecified behavior.
    +     * <p>
    +     * For handling interactive input, consider using {@link Console}.
    +     *
    +     * @see Console
    +     * @see ##stdin.encoding stdin.encoding
          */


The table of system properties in the `System.getProperties` method specification is modified
as follows:

          * <tr><th scope="row">{@systemProperty user.dir}</th>
          *     <td>User's current working directory</td></tr>
          * <tr><th scope="row">{@systemProperty native.encoding}</th>
    -     *     <td>Character encoding name derived from the host environment and/or
    -     *     the user's settings. Setting this system property has no effect.</td></tr>
    +     *     <td>Character encoding name derived from the host environment and
    +     *     the user's settings. Setting this system property on the command line
    +     *     has no effect.</td></tr>
    +     * <tr><th scope="row">{@systemProperty stdin.encoding}</th>
    +     *     <td>Character encoding name for {@link System#in System.in}.
    +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
    +     *     Starting it with the property set to another value results in unspecified behavior.
          * <tr><th scope="row">{@systemProperty stdout.encoding}</th>
          *     <td>Character encoding name for {@link System#out System.out} and
          *     {@link System#console() System.console()}.
    -     *     The Java runtime can be started with the system property set to {@code UTF-8},
    -     *     starting it with the property set to another value leads to undefined behavior.
    +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
    +     *     Starting it with the property set to another value results in unspecified behavior.
          * <tr><th scope="row">{@systemProperty stderr.encoding}</th>
          *     <td>Character encoding name for {@link System#err System.err}.
    -     *     The Java runtime can be started with the system property set to {@code UTF-8},
    -     *     starting it with the property set to another value leads to undefined behavior.
    +     *     The Java runtime can be started with the system property set to {@code UTF-8}.
    +     *     Starting it with the property set to another value results in unspecified behavior.
          * </tbody>
          * </table>
          * <p>

And further down in the same table:

          * <tr><th scope="row">{@systemProperty file.encoding}</th>
          *     <td>The name of the default charset, defaults to {@code UTF-8}.
          *     The property may be set on the command line to the value
          *     {@code UTF-8} or {@code COMPAT}. If set on the command line to
          *     the value {@code COMPAT} then the value is replaced with the
          *     value of the {@code native.encoding} property during startup.
          *     Setting the property to a value other than {@code UTF-8} or
    -     *     {@code COMPAT} leads to unspecified behavior.
    +     *     {@code COMPAT} results in unspecified behavior.
          *     </td></tr>
          * </tbody>
          * </table>


[1]: https://openjdk.org/jeps/400

[2]: https://bugs.openjdk.org/browse/JDK-8264209

[3]: https://openjdk.org/jeps/512
Comments
Moving to Approved.
25-04-2025