JDK-8272352 : Java launcher can not parse Chinese character when system locale is set to UTF-8
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:i18n
  • Affected Version: 17,18
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • CPU: generic
  • Submitted: 2021-08-12
  • Updated: 2022-09-09
  • Resolved: 2022-05-05
Release   Fix version   Status
JDK 11    11.0.17       Fixed
JDK 17    17.0.5        Fixed
JDK 19    19 b22        Fixed
Description
Created on behalf of Glavo <zjx001202@gmail.com>
----

When the "Use Unicode UTF-8 for worldwide language support" option is turned on (see https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do), the default java launcher cannot parse arguments containing Chinese characters:

java Foo 你好世界

The String[] args elements arrive as garbled Chinese characters.
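
A minimal sketch of a test class for the command above, assuming nothing about the reporter's actual Foo (its source is not part of this report). It prints each argument as a list of code points, so garbling is visible even when the console itself cannot render the text. The helper name describe is hypothetical.

```java
// Hypothetical stand-in for the reporter's Foo class; not taken from the bug report.
final class Foo {
    // Render every code point as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print one line per command-line argument.
        for (String arg : args) {
            System.out.println(describe(arg));
        }
    }
}
```

If the arguments are parsed correctly, `java Foo 你好世界` should print U+4F60 U+597D U+4E16 U+754C; any other sequence indicates that the launcher decoded the argument with the wrong charset.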

Comments
Backport of JDK-8272352: Java launcher can not parse Chinese character when system locale is set to UTF-8. It does not apply cleanly to 11u because of a minor change for UTF-8 support that is otherwise included in JDK-8264208 (Console charset API), which has not been backported to 11u. (I added `case 65001:` for UTF-8 on line 83.) Tier1 tests pass (GitHub Actions). The fix was confirmed locally (as it was in 17u), as there isn't a specific test.
18-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1234 Date: 2022-07-16 00:38:20 +0000
16-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1228 Date: 2022-07-14 19:27:47 +0000
14-07-2022

Fix request [posted on behalf of Stephanie Crater] Backport to allow java to correctly parse Chinese characters in file paths and string arguments passed to java.exe. The Java runtime has been detecting the Windows system locale encoding using GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...), but that call returns the legacy ANSI code page value, e.g., 1252 for US-English. To detect whether the user has selected UTF-8 as the default, the code page has to be queried with GetACP(). Also, the fallback used when the call to GetLocaleInfo fails was changed from Cp1252 to UTF-8. Clean backport, low risk; confirmed locally that the fix works (note that there is no jtreg test, as per the original commit, because testing requires Windows configuration changes and a reboot).
06-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/530 Date: 2022-07-05 17:44:08 +0000
05-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/522 Date: 2022-07-01 19:11:42 +0000
01-07-2022

Changeset: 22934485 Author: Naoto Sato <naoto@openjdk.org> Date: 2022-05-05 19:59:58 +0000 URL: https://git.openjdk.java.net/jdk/commit/229344853126692d38ff7cb164dd5d17c5bf7fbb
05-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8434 Date: 2022-04-27 20:23:32 +0000
27-04-2022

Relevant information on MS site (https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page)
27-04-2022

To be precise, GetACP() returns 65001 if the checkbox is checked, but GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...) returns 1252. The call should be replaced with GetACP().
27-04-2022

The issue here is that there is no public documentation from MS on the "Beta: Use Unicode UTF-8 for worldwide language support" check box. The JDK uses the ANSI code page (CP_ACP) to translate the path/argument strings into Java Strings, but GetACP() returns 1252 even if the check box is checked. It seems that the check box only affects the OEM code page (setting it to 65001), hence this discrepancy. I would wait for a clearer definition of the functionality from MS (maybe Windows 11 will provide one?).
25-08-2021

The following piece will fix it, but it would also mean that all JNI-related platform strings would be affected:
---
$ git diff
diff --git a/src/java.base/share/classes/sun/launcher/LauncherHelper.java b/src/java.base/share/classes/sun/launcher/LauncherHelper.java
index 82b73d01c6b..985f33ce3c7 100644
--- a/src/java.base/share/classes/sun/launcher/LauncherHelper.java
+++ b/src/java.base/share/classes/sun/launcher/LauncherHelper.java
@@ -877,6 +877,7 @@ public final class LauncherHelper {
     }
 
     private static final String encprop = "sun.jnu.encoding";
+    private static final String stdoutprop = "sun.stdout.encoding";
     private static String encoding = null;
     private static boolean isCharsetSupported = false;
 
@@ -887,7 +888,7 @@ public final class LauncherHelper {
     static String makePlatformString(boolean printToStderr, byte[] inArray) {
         initOutput(printToStderr);
         if (encoding == null) {
-            encoding = System.getProperty(encprop);
+            encoding = System.getProperty(stdoutprop, System.getProperty(encprop));
             isCharsetSupported = Charset.isSupported(encoding);
         }
         try {
---
24-08-2021
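
For experimenting with the property fallback proposed in the patch above outside the launcher, here is a minimal standalone sketch. The class name EncodingFallback and the method resolve are hypothetical. Only the two property names and the two-argument getProperty lookup are taken from the diff.

```java
import java.nio.charset.Charset;
import java.util.Properties;

// Hypothetical helper; it only mirrors the lookup order used in the patch above.
final class EncodingFallback {
    // Prefer sun.stdout.encoding and fall back to sun.jnu.encoding when it is absent.
    static String resolve(Properties props) {
        return props.getProperty("sun.stdout.encoding",
                props.getProperty("sun.jnu.encoding"));
    }

    public static void main(String[] args) {
        String name = resolve(System.getProperties());
        // Charset.isSupported rejects null, so guard against both properties being unset.
        boolean supported = name != null && Charset.isSupported(name);
        System.out.println("resolved encoding: " + name + ", supported: " + supported);
    }
}
```

Running it in a UTF-8 command prompt and in a non-UTF-8 one shows which of the two properties the patched lookup would pick on a given build.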

Looking into this issue, simply reverting the change above is not enough. The issue here is that the launcher uses the encoding from `sun.jnu.encoding` (= windows-1252) while the code tries to read the result as `UTF-8`. Before the above fix, it happened to work because System.out's encoding was set to `windows-1252` (note that setting `cp65001` throws an exception in `setOut0()`, which falls back to `windows-1252`), and that encoding just passes the UTF-8 bytes for Ni-Hao through.
19-08-2021
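
The pass-through effect described above can be reproduced in isolation. This is a minimal sketch, not launcher code: the class and method names are hypothetical, and it assumes the JDK in use provides the windows-1252 charset. UTF-8 bytes decoded as windows-1252 produce a garbled String, but encoding that String back with windows-1252 restores the original bytes, which a UTF-8 console then displays correctly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the byte pass-through; not taken from the JDK sources.
final class PassThroughDemo {
    // Assumption: windows-1252 is available in this JDK build.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode UTF-8 bytes with the wrong charset, as in the failing scenario.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // Encode the garbled String with windows-1252 again and read the bytes as UTF-8.
    static String reinterpret(String garbled) {
        byte[] raw = garbled.getBytes(CP1252);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D\u4E16\u754C"; // the argument from the description
        String garbled = misdecode(original);
        System.out.println("garbled length: " + garbled.length());
        System.out.println("round trip intact: " + original.equals(reinterpret(garbled)));
    }
}
```

The round trip only works when every byte has a windows-1252 mapping. Five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have none, so text whose UTF-8 form contains them would not survive.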

Looks like this is a regression caused by the fix to JDK-8266774. The following backout reverts the regression:
---
diff --git a/src/java.base/windows/native/libjava/java_props_md.c b/src/java.base/windows/native/libjava/java_props_md.c
index b3c16a453d7..754725264eb 100644
--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -147,8 +147,8 @@ static char* getConsoleEncoding()
     cp = GetConsoleCP();
     if (cp >= 874 && cp <= 950)
         sprintf(buf, "ms%d", cp);
-    else if (cp == 65001)
-        sprintf(buf, "UTF-8");
+//    else if (cp == 65001)
+//        sprintf(buf, "UTF-8");
     else
         sprintf(buf, "cp%d", cp);
     return buf;
---
16-08-2021
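
To make the effect of the backout above easier to see, here is a Java transcription of the branch logic in getConsoleEncoding(). It is a sketch for illustration only: the class, the method, and the mapUtf8 flag are hypothetical, and the real code is the C function shown in the diff.

```java
// Hypothetical Java rendering of the C branch logic shown in the diff above.
final class ConsoleEncodingName {
    // mapUtf8 = true models the code before the backout, false models it after.
    static String fromCodePage(int cp, boolean mapUtf8) {
        if (cp >= 874 && cp <= 950) {
            return "ms" + cp;
        } else if (mapUtf8 && cp == 65001) {
            return "UTF-8";
        } else {
            return "cp" + cp;
        }
    }

    public static void main(String[] args) {
        int[] codePages = {437, 936, 65001};
        for (int cp : codePages) {
            System.out.println(cp + " -> " + fromCodePage(cp, true)
                    + " (after backout: " + fromCodePage(cp, false) + ")");
        }
    }
}
```

Only code page 65001 is affected by the backout: it yields "UTF-8" before and "cp65001" after, while every other code page maps to the same name in both variants.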

I tried it again with my latest build, and it looks garbled. Will look into it.
16-08-2021

Can the submitter provide more information? I tried to reproduce the issue, but I got exactly the opposite result (which is what I expected): the Chinese string cannot be displayed in a non-UTF-8 command prompt (cp437, pic1), where it appears as four '?'s, while in a UTF-8 command prompt it is displayed correctly (pic2).
15-08-2021

Past report of the same problem: JDK-6584897 (Will Not Fix). Open issue related to this: JDK-8124977 (cmdline encoding challenges on Windows).
13-08-2021

Moved to tools -> launcher.
13-08-2021