JDK-8356149 : InputStreamReader doesn't use stdin.encoding for System.in
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: windows
  • CPU: generic
  • Submitted: 2025-05-03
  • Updated: 2025-05-06
Version (Other): tbd, Unresolved
Related Reports
Causes :  
Description
ADDITIONAL SYSTEM INFORMATION :
openjdk 25-ea 2025-09-16
OpenJDK Runtime Environment (build 25-ea+21-2530)
OpenJDK 64-Bit Server VM (build 25-ea+21-2530, mixed mode, sharing)

Windows 11 24H2

Note: stdin.encoding is available only in the latest EA build.

-----

Java 17:

openjdk 17.0.15 2025-04-15
OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6)
OpenJDK 64-Bit Server VM Temurin-17.0.15+6 (build 17.0.15+6, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
Before JEP 400 (Java 18), InputStreamReader (and Scanner) could read characters from stdin without corruption, even without the second (encoding) argument, in a Japanese environment (and in some other language environments where the ANSI and OEM encodings are the same), e.g. `new InputStreamReader(System.in)`.
However, due to JEP 400, all Windows users have to pass the second argument to InputStreamReader or Scanner, e.g. `new InputStreamReader(System.in, System.getProperty("stdout.encoding"))`, or change the console code page to UTF-8, e.g. with `chcp 65001`, to prevent input character corruption.
We can now start to fix this problem because stdin.encoding is available today thanks to JDK-8350703 / JDK-8355357.
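
As a minimal sketch of the workaround described above, the reader below is given its charset explicitly. The class name `ExplicitStdinCharset` and its helper methods are hypothetical, and the lookup order (stdin.encoding, then stdout.encoding, then the default charset) is an assumption made for illustration, not something the JDK prescribes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitStdinCharset {
    // Hypothetical helper: pick a charset for stdin from system properties.
    // Assumed order: stdin.encoding, then stdout.encoding, then the default charset.
    static Charset stdinCharset() {
        String name = System.getProperty("stdin.encoding");
        if (name == null) {
            name = System.getProperty("stdout.encoding");
        }
        if (name == null) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: fall back to the default.
            return Charset.defaultCharset();
        }
    }

    // Reads one line, decoding the bytes with the charset passed as the 2nd argument.
    static String readLine(InputStream in, Charset cs) {
        try {
            return new BufferedReader(new InputStreamReader(in, cs)).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for console input: "1 ÷ 2" encoded as ISO-8859-1.
        byte[] bytes = {'1', ' ', (byte) 0xF7, ' ', '2', '\n'};
        String line = readLine(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(line.equals("1 \u00F7 2")); // prints "true"
    }
}
```

With a real console, the call would be `readLine(System.in, stdinCharset())`.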

I set Java 17 as "the prior release in which it worked for you" because some languages, including Japanese, have a single code page shared by both the ANSI and OEM code pages. In a US English environment the echoed strings will probably still be corrupted, because the ANSI and OEM code pages differ there.

The main description of `new InputStreamReader(InputStream)` only says "Creates an InputStreamReader that uses the default charset.". Although `Charset.defaultCharset()` is listed under "See Also", "default charset" can still be interpreted as the charset preferred by each InputStream instance. This is why this report was filed as a "Bug".

The fixing strategy candidate:

- Detect whether InputStream is System.in in the constructor `InputStreamReader(InputStream)`; e.g. https://github.com/tats-u/jdk/commit/c08efafbfb3f0056c600ec2763f583f99f7dba28
- Add an instance method to InputStream that provides its default preferred charset; the base one is something like `Charset getPreferredCharset() { return Charset.defaultCharset(); }` and the one for System.in is `Charset getPreferredCharset() { return Charset.forName(System.getProperty("stdin.encoding", ""), sun.nio.cs.UTF_8.INSTANCE); }`.
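
The first candidate can be approximated outside the JDK as an identity check on the stream. The sketch below is not the linked commit; `StdinAwareReaders` and its methods are hypothetical names, and falling back to the default charset when stdin.encoding is missing or unusable is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class StdinAwareReaders {
    // Charset named by stdin.encoding, or the default charset if unavailable (assumption).
    static Charset stdinCharset() {
        String name = System.getProperty("stdin.encoding");
        if (name == null) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            return Charset.defaultCharset();
        }
    }

    // Identity check: only the exact System.in object gets the stdin charset.
    static Charset charsetFor(InputStream in) {
        return in == System.in ? stdinCharset() : Charset.defaultCharset();
    }

    // Stand-in for what a patched InputStreamReader(InputStream) constructor might do.
    static InputStreamReader newReader(InputStream in) {
        return new InputStreamReader(in, charsetFor(in));
    }

    public static void main(String[] args) {
        System.out.println("System.in         -> " + charsetFor(System.in));
        System.out.println("wrapped System.in -> " + charsetFor(new BufferedInputStream(System.in)));
        System.out.println("other stream      -> " + charsetFor(new ByteArrayInputStream(new byte[0])));
    }
}
```

Note that the identity check does not see through wrappers: a `BufferedInputStream` around System.in is treated like any other stream in this sketch.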

REGRESSION : Last worked in version 17

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. If you have previously changed the console encoding to UTF-8 (e.g. with `chcp 65001`), change it back: run `chcp 932` (CMD) or `[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(932)` (PowerShell). 932 is the example for Japanese; use the OEM code page of your Windows language instead.

437 instead of 932 can probably reproduce this bug too, but the number of `?` characters and the encoding name will differ.
The Expected Result and Actual Result are for a Japanese environment where 932 is passed (positive reproduction for Java 18+).

2. Run `java path\to\TestCaseCode.java`
3. Input a string including non-ASCII characters and press Enter
4. Do step 3 once more
5. Compare the strings entered in steps 3 and 4 with the output

Note: before you confirm this bug is not reproduced in Java 17 or prior, you need to switch the Windows language to e.g. Japanese, Chinese, Korean (or Thai) in advance. The "Expected Result" is taken in Java 17 in Japanese environment with code page 932.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Scanner: √2 ÷ 2 = 1 / √2
√2 ÷ 2 = 1 / √2
BufferedReader + InputStreamReader: √2 ÷ 2 = 1 / √2
√2 ÷ 2 = 1 / √2
ACTUAL -
Scanner: √2 ÷ 2 = 1 / √2
??2 ?? 2 = 1 / ??2
BufferedReader + InputStreamReader: √2 ÷ 2 = 1 / √2
??2 ?? 2 = 1 / ??2
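
The `??` pairs in the ACTUAL output can be reproduced without a console. The sketch below hard-codes the code page 932 bytes for "√2 ÷ 2" (√ is 0x81 0xE3 and ÷ is 0x81 0x80 in that encoding) and decodes them as UTF-8, which is what the one-argument constructor does since JEP 400. The class name `Cp932AsUtf8Demo` is hypothetical, and mapping U+FFFD to `?` is a stand-in for what happens when the replacement character is printed to a stream whose charset cannot encode it.

```java
import java.nio.charset.StandardCharsets;

class Cp932AsUtf8Demo {
    // "√2 ÷ 2" as it arrives from a code page 932 console.
    static final byte[] CP932_BYTES = {
        (byte) 0x81, (byte) 0xE3, '2', ' ', (byte) 0x81, (byte) 0x80, ' ', '2'
    };

    // Decodes with UTF-8; each malformed byte becomes U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for printing to a charset that cannot encode U+FFFD.
    static String asPrinted(String s) {
        return s.replace('\uFFFD', '?');
    }

    public static void main(String[] args) {
        System.out.println(asPrinted(decodeAsUtf8(CP932_BYTES))); // prints "??2 ?? 2"
    }
}
```

Each two-byte code page 932 character yields two malformed UTF-8 units, which is why every non-ASCII character turns into exactly two question marks in the Japanese environment.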

---------- BEGIN SOURCE ----------
import java.util.Scanner;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

class Test {
    public static void main(String[] args) {
        System.err.print("Scanner: ");
        System.out.println((new Scanner(System.in)).nextLine());
        System.err.print("BufferedReader + InputStreamReader: ");
        try {
            System.out.println((new BufferedReader(new InputStreamReader(System.in))).readLine());
        } catch (IOException e) {
            e.printStackTrace(System.err);
        }
    }
}
---------- END SOURCE ----------


Comments
The change in JEP 400 was intended. We deliberately kept System.in using the console encoding for (other) compatibility reasons, as switching it to UTF-8 was too disruptive. On the Windows side, they seem to be switching the default system encoding to UTF-8 (there is a beta option in the settings), which would set the console code page to 65001 (UTF-8). This was also one of the factors in our decision to keep the System.in encoding. Windows still seems to have that option as "beta", though. BTW, I am not quite sure what "preferred" means in this context. Who prefers what?
05-05-2025

I've changed this issue to be an Enhancement because it's not a bug.
05-05-2025

Impact -> H (Regression) Likelihood -> L (Uncommon uses) Workaround -> M (Somewhere in-between the extremes) Priority -> P3
05-05-2025

The observations on Windows 11: JDK 17: Passed. The outputs are correct. JDK 18ea+12: Passed. JDK 18ea+13: Failed, The outputs are incorrect. JDK 25ea+6: Failed.
05-05-2025

The advice in the API docs is to use Console.charset(), so the above code would be `new InputStreamReader(System.in, console.charset())` rather than `new InputStreamReader(System.in)`. Further work on stdin.encoding will update the API docs to recommend using stdin.encoding. Having InputStreamReader special case System.in, which I think is what is being proposed by the submitter, will lead to surprising behavior once the input stream is wrapped.
04-05-2025
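
The Console.charset() advice in the last comment could look roughly like the sketch below. `ConsoleCharsetReader` and its methods are hypothetical names; falling back to the default charset when `System.console()` returns null (for example, when no console is attached) is an assumption. `Console.charset()` requires Java 17 or later.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class ConsoleCharsetReader {
    // Console charset if a console is attached, otherwise the default charset (assumption).
    static Charset consoleCharsetOrDefault() {
        Console console = System.console();
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    // Reader over System.in with the charset passed explicitly as the 2nd argument.
    static BufferedReader stdinReader() {
        return new BufferedReader(new InputStreamReader(System.in, consoleCharsetOrDefault()));
    }

    public static void main(String[] args) {
        System.out.println("Charset used for System.in: " + consoleCharsetOrDefault());
    }
}
```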