JDK-8356245 : stdin.encoding and stdout.encoding in jshell don't respect console code pages
  • Type: Bug
  • Component: tools
  • Sub-Component: jshell
  • Affected Version: 18,25
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_10
  • CPU: generic
  • Submitted: 2025-05-03
  • Updated: 2025-05-29
  • Resolved: 2025-05-27
JDK 25
25 b25 Fixed
Description
ADDITIONAL SYSTEM INFORMATION :
openjdk 25-ea 2025-09-16
OpenJDK Runtime Environment (build 25-ea+21-2530)
OpenJDK 64-Bit Server VM (build 25-ea+21-2530, mixed mode, sharing)

Windows 11 24H2

(Note: the stdin.encoding property requires EA build 21 or later.)

A DESCRIPTION OF THE PROBLEM :
On Windows, a console can switch its input and output code pages away from the OEM code page. In CMD this is done with chcp. In PowerShell/C# it is done by setting the Console.InputEncoding or Console.OutputEncoding properties.
This lets the console send and receive arbitrary Unicode characters to and from external programs.
The regular java launcher detects these code pages and sets stdin.encoding and stdout.encoding accordingly, but JShell does not.
JShell does not notice the change: both properties stay fixed to the name of the OEM code page.
This causes character corruption in statements and expressions that use System.in, System.out, or IO (Java 25+).

The current code pages can be obtained with the GetConsoleCP and GetConsoleOutputCP functions of the Windows API.

https://learn.microsoft.com/en-us/windows/console/getconsoleoutputcp
https://learn.microsoft.com/en-us/windows/console/getconsolecp
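These functions return a numeric code page, such as 65001 or 932. The following sketch shows one way to relate that number to a Java `Charset`. It is hypothetical: the helper name `charsetForCodePage` and the alias spellings it tries are assumptions, not the JDK's own lookup logic. Only the mapping of 65001 to UTF-8 is standard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class CodePages {
    // Hypothetical helper: maps a Windows code page number, as returned by
    // GetConsoleCP / GetConsoleOutputCP, to a Java Charset if one is available.
    // The alias spellings tried below are an assumption, not the JDK's own logic.
    static Optional<Charset> charsetForCodePage(int codePage) {
        if (codePage == 65001) {
            // 65001 is the Windows code page number for UTF-8 (as in "chcp 65001").
            return Optional.of(StandardCharsets.UTF_8);
        }
        String[] candidates = {"windows-" + codePage, "MS" + codePage, "cp" + codePage};
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                return Optional.of(Charset.forName(name));
            }
        }
        return Optional.empty(); // no matching charset in this JDK
    }

    public static void main(String[] args) {
        for (int cp : new int[] {65001, 932, 437, 1252}) {
            String name = charsetForCodePage(cp).map(Charset::name).orElse("unsupported");
            System.out.println(cp + " -> " + name);
        }
    }
}
```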

TL;DR: The test case code does not cause corruption with the java command. It does cause corruption in JShell once the console code pages are changed.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Run `chcp 65001` (CMD) or `[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF8` (PowerShell)
2. Run `jshell` (e.g. `jdk-25\bin\jshell`)
3. Evaluate the following expressions there:

```java
System.getProperty("stdin.encoding")
System.getProperty("stdout.encoding")
```

4. Run the following commands there:

```java
System.out.println("👍");
System.out.println("√2 ÷ 2 = 1 / √2");
```

πŸ‘: Unicode dedicated character
√ & Γ·:  non-ASCII characters in OEM code page but not automatically converted to ASCII one (e.g. Β₯ in Japanese)

5. Run the following expression and type a non-ASCII string:

```java
(new Scanner(System.in, System.getProperty("stdin.encoding"))).nextLine()
```
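Step 4 distinguishes between 👍 and √/÷. This distinction depends on whether the OEM code page can represent a character at all, and `CharsetEncoder.canEncode` can check that directly. The sketch below is minimal and assumes that the JDK's `MS932` charset (code page 932) stands in for the Japanese OEM code page. The class name `Encodability` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Encodability {
    // True if every character in text can be encoded in the given charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Assumption: MS932 (code page 932) stands in for the Japanese OEM code page.
        Charset ms932 = Charset.forName("MS932");
        String[] samples = {"\u221A", "\u00F7", "\uD83D\uDC4D"}; // √, ÷, 👍
        for (String s : samples) {
            // Print code points rather than glyphs so the output does not depend on the console.
            System.out.printf("U+%04X MS932=%b UTF-8=%b%n",
                    s.codePointAt(0), canEncode(s, ms932), canEncode(s, StandardCharsets.UTF_8));
        }
    }
}
```

√ and ÷ should report `true` for both charsets. 👍 should report `false` for MS932, which is why it can only ever show up as `?` when the agent VM encodes with the OEM code page.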

-----

The test case code can be run with `java path\to\TestCaseCode.java` to show that this bug does *not* occur in normal programs. The input string can be, e.g., `√2 ÷ 2 = 1 / √2👍` in a Shift_JIS / CP437 environment.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
jshell> System.getProperty("stdin.encoding")
$1 ==> "UTF-8"

jshell> System.getProperty("stdout.encoding")
$2 ==> "UTF-8"

jshell> System.out.println("👍")
👍

jshell> System.out.println("√2 ÷ 2 = 1 / √2");
√2 ÷ 2 = 1 / √2

jshell> (new Scanner(System.in, System.getProperty("stdin.encoding"))).nextLine()
√2 ÷ 2 = 1 / √2
$5 ==> "√2 ÷ 2 = 1 / √2"
ACTUAL -
Note: example from a Japanese environment

jshell> System.getProperty("stdin.encoding")
$1 ==> "MS932"

jshell> System.getProperty("stdout.encoding")
$2 ==> "MS932"

jshell> System.out.println("👍")
?

jshell> System.out.println("√2 ÷ 2 = 1 / √2");
��2 �� 2 = 1 / ��2

jshell> (new Scanner(System.in, System.getProperty("stdin.encoding"))).nextLine()
√2 ÷ 2 = 1 / √2
$5 ==> "竏�2 ﾃｷ 2 = 1 / 竏�2"
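Both corruptions can be reproduced without a console by round-tripping the string through mismatched charsets. This assumes, as described above, that the agent VM uses MS932 while the console uses UTF-8. The sketch below is minimal, and the names `garbledOutput` and `garbledInput` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Mojibake {
    // Assumption: the agent VM uses MS932 while the console runs with chcp 65001 (UTF-8).
    static final Charset MS932 = Charset.forName("MS932");

    // Output direction: the agent VM encodes with MS932, the console decodes as UTF-8.
    static String garbledOutput(String text) {
        return new String(text.getBytes(MS932), StandardCharsets.UTF_8);
    }

    // Input direction: the console sends UTF-8 bytes, the Scanner decodes them as MS932.
    static String garbledInput(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), MS932);
    }

    public static void main(String[] args) {
        String text = "\u221A2 \u00F7 2 = 1 / \u221A2"; // "√2 ÷ 2 = 1 / √2"
        System.out.println(garbledOutput(text));
        System.out.println(garbledInput(text));
    }
}
```

For this input, `garbledOutput` returns `��2 �� 2 = 1 / ��2`, which matches the ACTUAL output above. In MS932, √ is encoded as 0x81 0xE3 and ÷ as 0x81 0x80, and neither pair is valid UTF-8. In the other direction, the UTF-8 bytes of ÷ (0xC3 0xB7) decode in MS932 as the two half-width katakana `ﾃｷ`.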

---------- BEGIN SOURCE ----------
// Produces the expected output, unlike JShell, even after the console code pages are changed to UTF-8:
//
// stdin: UTF-8
// stdout: UTF-8
// 👍
// √2 ÷ 2 = 1 / √2
// Input: √2 ÷ 2 = 1 / √2👍
// √2 ÷ 2 = 1 / √2👍
import java.util.Scanner;

class Test {
    public static void main(String[] args) {
        System.out.println("stdin: " + System.getProperty("stdin.encoding"));
        System.out.println("stdout: " + System.getProperty("stdout.encoding"));
        System.out.println("πŸ‘");
        System.out.println("√2 ÷ 2 = 1 / √2");
        System.err.print("Input: ");
        System.out.println((new Scanner(System.in, System.getProperty("stdin.encoding"))).nextLine());
    }
}
---------- END SOURCE ----------
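The test case reads input through `Scanner`. An alternative is to read one line with a `BufferedReader` over an `InputStreamReader`. The sketch below does that, decoding with `stdin.encoding` when present and falling back to `native.encoding` otherwise. The fallback order and the class name `StdinLine` are assumptions. This approach does not by itself work around the JShell issue, because the agent VM still reports MS932 for `stdin.encoding`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StdinLine {
    // Assumed fallback order: stdin.encoding (JDK 25+), then native.encoding (JDK 17+),
    // then the default charset.
    static Charset stdinCharset() {
        String name = System.getProperty("stdin.encoding");
        if (name == null) {
            name = System.getProperty("native.encoding");
        }
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    // Reads a single line from in, decoding bytes with the given charset.
    static String readLine(InputStream in, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stdin charset: " + stdinCharset());
        // Simulated input instead of System.in, so the example does not block.
        byte[] utf8 = "\u221A2 \u00F7 2\nnext".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    }
}
```

A `BufferedReader` reads ahead. For a real `System.in`, the reader should therefore be created once and reused, rather than created per call as in this small helper.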


Comments
Changeset: 9c191cc0 Branch: master Author: Jan Lahoda <jlahoda@openjdk.org> Date: 2025-05-27 09:49:26 +0000 URL: https://git.openjdk.org/jdk/commit/9c191cc0fad4e2cd8ac021082acc494dc7503745
27-05-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/25328 Date: 2025-05-20 13:28:27 +0000
20-05-2025

I think the cause of this issue is that jshell spawns an agent VM for its REPL process. Since the agent VM is not attached to the actual terminal, the charset for its standard I/O falls back to `native.encoding` (MS932 in this case). This can be worked around by passing `stdout.encoding` to the remote process, i.e.:

```
jshell -R-Dstdout.encoding=UTF-8
```

This should print those non-ASCII characters correctly (if the console is set to `chcp 65001`). So I think jshell could be modified to pass `-Dstdout.encoding=(the value in jshell's own process)` when spawning the agent VM, which would solve the issue.
09-05-2025
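The change proposed in the comment above could look roughly like the following sketch, which collects the launcher process's own encoding properties as `-D` options for the agent VM. It is hypothetical. The class and method names are illustrative, and forwarding `stdin.encoding` and `stderr.encoding` alongside `stdout.encoding` is an assumption. It is not the code from the changeset above.

```java
import java.util.ArrayList;
import java.util.List;

class ForwardEncodings {
    // Hypothetical: the console encodings to forward. Only stdout.encoding is named
    // in the comment above; stdin.encoding and stderr.encoding are assumptions.
    static final String[] KEYS = {"stdin.encoding", "stdout.encoding", "stderr.encoding"};

    // Builds "-Dkey=value" options from this (launcher) process's own properties.
    static List<String> encodingOptions() {
        List<String> options = new ArrayList<>();
        for (String key : KEYS) {
            String value = System.getProperty(key);
            if (value != null) { // e.g. stdin.encoding is absent before JDK 25
                options.add("-D" + key + "=" + value);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // When embedding JShell, these could be handed to the agent VM via
        // jdk.jshell.JShell.builder().remoteVMOptions(...).
        System.out.println(encodingOptions());
    }
}
```

When JShell is embedded, `JShell.Builder.remoteVMOptions(String...)` is roughly the programmatic counterpart of the `-R` flag used in the workaround.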

Seems to be related to https://bugs.openjdk.org/browse/JDK-8274784, in which `Charset.defaultCharset()` was replaced with `System.out.charset()`.
08-05-2025
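The two charsets mentioned in the comment above can differ. Printing them side by side with the related properties shows which one a given VM actually uses. The diagnostic sketch below is minimal; `CharsetReport` is a hypothetical name, and `PrintStream.charset()` requires JDK 18 or later.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetReport {
    // Collects the charsets and properties relevant to console I/O in this VM.
    static Map<String, String> report() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        m.put("System.out.charset()", System.out.charset().name()); // JDK 18+
        m.put("native.encoding", System.getProperty("native.encoding"));
        m.put("stdout.encoding", System.getProperty("stdout.encoding")); // may be null on older JDKs
        return m;
    }

    public static void main(String[] args) {
        report().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Since JDK 18 (JEP 400), `Charset.defaultCharset()` is UTF-8 by default and does not track the console code page. On recent JDKs, `System.out` is instead created with the charset named by `stdout.encoding`.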

```
>chcp 65001
Active code page: 65001

>java Test.java
stdin: UTF-8
stdout: UTF-8
👍
√2 ÷ 2 = 1 / √2
Input: ��☕✅������▶️➡️
👍☕✅🧩🎶😈▶️➡️
```

The `java`-based test uses a rather simple `Scanner` line reader. It seems to have trouble displaying all glyphs as expected, but only while reading them in. The output of the final `System.out::println` is correct.
08-05-2025

Impact -> H (Regression)
Likelihood -> L (Uncommon uses)
Workaround -> M (Somewhere in-between the extremes)
Priority -> P3
06-05-2025

Observations on Windows 11:
JDK 18ea+25: Passed.
JDK 18ea+26: Failed, incorrect output observed.
JDK 25ea+6: Failed.
06-05-2025