Bug ID: JDK-8029584 Allow \uxxxx unicode-escaping on the jvm command-line arguments

Type: Enhancement
Component: tools
Sub-Component: launcher
Affected Version: 7u6,15,16

Priority: P3
Status: Open
Resolution: Unresolved

Submitted: 2013-12-05
Updated: 2020-11-11

Other: tbd, Unresolved

Enhancement request from a customer. 

When launching a Java application under Windows using the Tanuki Wrapper, it
is impossible to reliably pass Unicode characters on the command line,
perhaps at all, or perhaps only by tightly restricting the system encoding
configuration. It would really help if we could Unicode-escape characters as
\uXXXX on the command line and then add a JVM argument to indicate that this
was done. This would allow passing any Unicode character, even if only ASCII
is available on the command line.

It appears that the Tanuki codebase uses the proper Win32 Unicode magic.

It appears that the JVM command-line arguments under Windows are parsed in
hotspot.src.os.windows.launcher.java_md.c and that it would be simple to
pre-parse the command line to handle Unicode in this way.
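
The pre-parsing step described above might look roughly like the following C sketch. It is not taken from the launcher or from the IBM patch: the names hex_value, parse_escape and unescape_utf8_in_place are hypothetical, the output encoding is assumed to be UTF-8, and only BMP escapes are decoded. Surrogate halves, \u0000 and malformed escapes are copied through unchanged.

```c
#include <stddef.h>

/* Returns the value of a hex digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "\uXXXX" at p. Returns the 16-bit value, or -1 if p does not
 * start with a well-formed escape. Never reads past the terminating NUL. */
static long parse_escape(const char *p) {
    long v = 0;
    int i;
    if (p[0] != '\\' || p[1] != 'u') return -1;
    for (i = 2; i < 6; i++) {
        int d = hex_value(p[i]);
        if (d < 0) return -1;
        v = (v << 4) | d;
    }
    return v;
}

/* Hypothetical helper: rewrites s in place, replacing each \uXXXX escape
 * with the UTF-8 encoding of that code point. */
void unescape_utf8_in_place(char *s) {
    char *out = s;
    while (*s != '\0') {
        long cp = parse_escape(s);
        /* Not an escape, \u0000, or a surrogate half: copy one byte as-is. */
        if (cp <= 0 || (cp >= 0xD800 && cp <= 0xDFFF)) {
            *out++ = *s++;
            continue;
        }
        if (cp < 0x80) {
            *out++ = (char)cp;
        } else if (cp < 0x800) {
            *out++ = (char)(0xC0 | (cp >> 6));
            *out++ = (char)(0x80 | (cp & 0x3F));
        } else {
            *out++ = (char)(0xE0 | (cp >> 12));
            *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *out++ = (char)(0x80 | (cp & 0x3F));
        }
        s += 6; /* skip the six bytes of the escape */
    }
    *out = '\0';
}
```

Decoding in place is safe in this sketch because a six-byte escape never expands to more than three UTF-8 bytes, so the write position never overtakes the read position.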

The IBM JVM has this feature.  See

http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=%2Fcom.ibm.java.doc.user.aix64.50%2Fuser%2Fglobalization.html

for an example.

We have received a patch from IBM for the launcher. The patch was created in 6/2009, so it has been in their code base for a while now. Porting it to the JDK9DEV modular file structure.
27-08-2014
As an FYI, JDK-4858889 was closed as WNF.
20-02-2014
The original filer of this bug requested to support the same feature as IBM's -Xargencoding. This can be done completely inside the launcher and does not involve hotspot. In fact, hotspot is not involved in the processing of mainClass and appArgs. Transfer (hotspot,runtime) => (tools,launcher).
18-02-2014
A compromise could be to unescape \uxxxx in ParseArguments in java.c, as long as the default platform encoding supports the character; it would become '?' if the default platform encoding does not support the character. On most platforms today, the default platform encoding is some sort of UTF8, so almost everyone would be happy. Unfortunately, the only people who would be unhappy (and arguably the people who would otherwise benefit the most from this feature) would be those who use some sort of non-unicode-compatible encoding (like the original filer of this bug) ....
18-02-2014
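
The compromise in the comment above (unescape, and substitute '?' when the platform encoding cannot represent the character) can be sketched in C as follows. This is only an illustration under stated assumptions: the platform encoding is hard-coded as ISO-8859-1, and escape_value and unescape_latin1_in_place are hypothetical names, not functions from java.c. A real launcher would presumably go through the platform's own conversion routines instead.

```c
#include <ctype.h>
#include <stdlib.h>

/* Returns the value of a "\uXXXX" escape at p, or -1 if p does not start
 * with a well-formed escape. */
static long escape_value(const char *p) {
    char hex[5];
    int i;
    if (p[0] != '\\' || p[1] != 'u') return -1;
    for (i = 0; i < 4; i++) {
        /* isxdigit rejects the terminating NUL, so we never read past it. */
        if (!isxdigit((unsigned char)p[2 + i])) return -1;
        hex[i] = p[2 + i];
    }
    hex[4] = '\0';
    return strtol(hex, NULL, 16);
}

/* Hypothetical helper: rewrites s in place for an assumed ISO-8859-1
 * platform encoding. Code points above U+00FF become '?'. */
void unescape_latin1_in_place(char *s) {
    char *out = s;
    while (*s != '\0') {
        long cp = escape_value(s);
        if (cp <= 0) {            /* not an escape (or \u0000): copy as-is */
            *out++ = *s++;
            continue;
        }
        *out++ = (cp <= 0xFF) ? (char)cp : '?';
        s += 6;                   /* skip the six bytes of the escape */
    }
    *out = '\0';
}
```

With these assumptions, `caf\u00e9` survives as a single Latin-1 byte, while `\u1234` degrades to `?`, which is exactly the loss the comment warns about for users of non-Unicode encodings.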
Implementing the unescaping for mainClass and appArgs is simpler (see below), but there are two major problems with implementing it for vmArgs:

[1] The specification of JNI_CreateJavaVM(JavaVM **pvm, void **penv, void *args)
http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/invocation.html
says args is of type JavaVMInitArgs:

    typedef struct JavaVMInitArgs {
        jint version;
        jint nOptions;
        JavaVMOption *options;
        jboolean ignoreUnrecognized;
    } JavaVMInitArgs;

The version field must be set to JNI_VERSION_1_2. (In contrast, the version field in JDK1_1InitArgs must be set to JNI_VERSION_1_1.) The options field is an array of the following type:

    typedef struct JavaVMOption {
        char *optionString;  /* the option as a string in the default platform encoding */
        void *extraInfo;
    } JavaVMOption;

Because optionString is in the "default platform encoding", it may not be able to carry certain Unicode characters (e.g., if the platform encoding is iso8859-1). We need to update the spec to change the encoding of optionString to UTF8.

[2] Currently, the conversion of "default platform encoding" -> java.lang.String is done via a JNI upcall (java.c NewPlatformString() -> sun/launcher/LauncherHelper.makePlatformString). However, if we want to support unescaping of vmArgs, then the conversion must be done BEFORE the JVM is launched. This means we need to have TWO SETS of encoding conversion code in the VM :-(

BTW, I am not sure if the "default platform encoding" can be overridden by the command line. Is it controlled via -Dfile.encoding? E.g., what happens if you do:

    export LANG=en_US.ascii
    java -Dmy.prop='\u1234' -Dfile.encoding=UTF8 -Dmy.other.prop='\u5678' ....

Should my.prop become "?" or "\u1234"?

------

So, if we allow unescaping only for mainClass and appArgs, we can do it with the JVM already running, at which point calling NewPlatformString is possible. But then the inability to specify -Dmy.prop='\u1234' leaves something to be desired.
18-02-2014
We need to better specify what we want. For the command line

    java <vmArgs ...> <mainClass> <appArgs ...>

we probably want to be able to use \uxxxx encoding in all places, including vmArgs. This way, you can do

    java -Dsome.property="ABC\u1234def" ...

or even

    java -X\u0058+Verbose -v\u0065rsion
    => java -XX+Verbose -version

Allowing escaping at arbitrary places seems weird, but the uniformity should simplify the implementation (and specification).
18-02-2014
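
To illustrate the uniform treatment proposed in the comment above, here is a small C sketch that applies the same rewrite to every argument, regardless of whether it is a vmArg, the mainClass or an appArg. It is deliberately restricted to escapes in the printable ASCII range (\u0020 to \u007e) so that no encoding conversion is involved; all other escapes are left untouched. The function names are hypothetical, and the gating JVM argument requested in the description is omitted.

```c
#include <ctype.h>
#include <stdlib.h>

/* Rewrites one argument in place, decoding only \u00XX escapes whose value
 * is printable ASCII. Everything else is copied through unchanged. */
static void unescape_ascii(char *s) {
    char *out = s;
    while (*s != '\0') {
        /* Short-circuit evaluation keeps every read inside the string. */
        if (s[0] == '\\' && s[1] == 'u' && s[2] == '0' && s[3] == '0'
                && isxdigit((unsigned char)s[4])
                && isxdigit((unsigned char)s[5])) {
            char hex[3];
            long v;
            hex[0] = s[4];
            hex[1] = s[5];
            hex[2] = '\0';
            v = strtol(hex, NULL, 16);
            if (v >= 0x20 && v <= 0x7E) {
                *out++ = (char)v;
                s += 6;
                continue;
            }
        }
        *out++ = *s++;
    }
    *out = '\0';
}

/* Hypothetical entry point: applies the same rule to every argument. */
void unescape_ascii_args(int argc, char **argv) {
    int i;
    for (i = 0; i < argc; i++) {
        unescape_ascii(argv[i]);
    }
}
```

Because the rule does not depend on an argument's position or meaning, `-X\u0058+Verbose` and `-v\u0065rsion` are rewritten by the same code path as a property value would be.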
Command-line parsing is handled in part by the launcher:

    jdk/src/windows/bin/java_md.c

and in part by hotspot:

    hotspot/src/share/vm/runtime/arguments.cpp

I'm not certain where this escaping would need to be implemented, but I suspect in the launcher.
18-02-2014

Blocks :	JDK-8205991 - Cannot start application (WinLauncher) if path contains non-ascii character
Duplicate :	JDK-8221508 - Cannot run java.exe from folder if path contains non-ASCII character
Relates :	JDK-8124977 - cmdline encoding challenges on Windows
Relates :	JDK-8221508 - Cannot run java.exe from folder if path contains non-ASCII character
Relates :	JDK-4858889 - JNI Invocation API strings not UTF-8