JDK-6727466 : java.exe/JRE1.6.0_10-b25/b27 doesn't seem to handle international characters
  • Type: Bug
  • Component: tools
  • Sub-Component: launcher
  • Affected Version: 6u10
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2008-07-18
  • Updated: 2015-06-17
  • Resolved: 2010-08-26
Description
FULL PRODUCT VERSION :
java version "1.6.0_10-rc"
Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b27)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b14, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 5.2.3790]
(Windows XP x64 Edition, all service packs and updates applied)

Microsoft Windows XP [Version 5.1.2600]


EXTRA RELEVANT SYSTEM CONFIGURATION :
Intel Q6600 CPU

A DESCRIPTION OF THE PROBLEM :
When I pass java.exe or javaw.exe in JDK/JRE 1.6.0_10-b25 or b27 an argument containing international characters, such as Chinese, the characters show up as question marks when the arguments reach my program's main method.
I have written a small program to demonstrate, which can be downloaded from http://hem.bredband.net/unsound/temp/TestInternationalArgs.java .

For instance, when I execute this command from the Windows "Run" dialog (Windows-R on the keyboard):

java -cp "C:\Temp" TestInternationalArgs "C:\Temp\<some international character>\file.txt"

(assuming TestInternationalArgs.class is present in C:\Temp)
I get the following output in my demonstration program:

-----
Arguments passed to program:
  args[0]: "C:\Temp\??\file.txt"

Argument strings as hexadecimal representations of UTF-16 values:
  args[0]: 'C' (0x43) ':' (0x3a) '\' (0x5c) 'T' (0x54) 'e' (0x65) 'm' (0x6d) 'p' (0x70) '\' (0x5c) '?' (0x3f) '?' (0x3f) '\' (0x5c) 'f' (0x66) 'i' (0x69) 'l' (0x6c) 'e' (0x65) '.' (0x2e) 't' (0x74) 'x' (0x78) 't' (0x74)
-----

The limitation seems to be within the java.exe executable. Creating a Java VM manually with JNI and invoking the main class with arguments containing international characters works fine (obviously, since you have a lot more control that way).
I'm running Windows XP x64 when testing this, and tested both the x64 version and the x86 version of JDK 1.6.0_10-b25 (they behaved the same way). I also tested on another computer with regular 32-bit Windows XP installed and with b25, and the result was the same.

According to the OpenJDK source code (openjdk-6-src-b10_30_may_2008), java.c, in the functions main and JavaMain, appears to treat argc and argv as multibyte character strings encoded in the system ANSI code page (which is usually Cp1252 on Western systems). This is not the way to do it in Windows nowadays. Instead of using argc and argv, you get the command line with GetCommandLineW ( http://msdn.microsoft.com/en-us/library/ms683156(VS.85).aspx ) and then convert it into argv-style wchar_t strings using CommandLineToArgvW ( http://msdn.microsoft.com/en-us/library/bb776391.aspx ) in order to properly get a Unicode (UTF-16) argument array. This is how I do it in my custom launcher.
These functions are present in Windows 2000 and later. I'm not sure whether you currently support NT4 / 9x; if you do, you would need to make some conditional function calls.
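
The symptom is consistent with the arguments passing through a single-byte encoding such as Cp1252 on their way to main. The following is only a minimal sketch of that effect in pure Java, not the launcher's actual code path: the class name LossyArgDemo and the helper roundTrip are hypothetical, the two Chinese characters are an arbitrary example, and the sketch assumes the JRE provides the windows-1252 charset.

```java
import java.nio.charset.Charset;

/** Hypothetical demo: shows how a single-byte code page turns unmappable characters into '?'. */
public class LossyArgDemo {

    /** Encodes s into bytes using cs, then decodes those bytes again with the same charset. */
    static String roundTrip(String s, Charset cs) {
        // String.getBytes(Charset) substitutes the charset's replacement
        // byte for every character the charset cannot represent.
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Example path containing two Chinese characters (U+4E2D, U+6587).
        String original = "C:\\Temp\\" + "\u4e2d\u6587" + "\\file.txt";

        // Assumption: the windows-1252 charset is available in this JRE.
        Charset cp1252 = Charset.forName("windows-1252");
        Charset utf16 = Charset.forName("UTF-16LE");

        // The Chinese characters have no Cp1252 mapping, so each becomes '?'.
        System.out.println(roundTrip(original, cp1252)); // prints C:\Temp\??\file.txt

        // UTF-16 can represent every character, so nothing is lost.
        System.out.println(roundTrip(original, utf16).equals(original)); // prints true
    }
}
```

The first line of output matches the args[0] value shown above, which suggests the characters are already lost before the JVM builds the String array.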

Either way, may I suggest that you add a function such as GetCommandLineAsUTF8(int argc, char** argv) to java_md.h and implement it for each platform? This is a very platform-dependent operation, so I think it belongs there.



REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import javax.swing.*;

/** Small program which pops up a JTextArea showing the argument list passed to main in detail. */
public class TestInternationalArgs {
    public static void main(final String[] args) {
	JTextArea jta = new JTextArea(50, 80);
	jta.setLineWrap(true);
	JScrollPane jtaScroller = new JScrollPane(jta);
	JFrame jf = new JFrame("TestInternationalArgs");
	jf.add(jtaScroller);
	jf.pack();
	jf.setLocationRelativeTo(null);
	jf.setVisible(true);
	jf.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
	
	/* Print args to the text area to ensure international characters
	 * are displayed correctly. */
	jta.append("Arguments passed to program:\n");
	for(int i = 0; i < args.length; ++i) {
	    String cur = args[i];
	    jta.append("  args[" + i + "]: \"" + cur + "\"\n");
	}
	jta.append("\n");
	jta.append("Argument strings as hexadecimal representations of UTF-16 values:\n");
	for(int i = 0; i < args.length; ++i) {
	    char[] cur = args[i].toCharArray();
	    jta.append("  args[" + i + "]:");
	    for(char c : cur) {
		jta.append(" '" + c + "' (0x" + Integer.toHexString(c) + ")");
	    }
	    jta.append("\n");
	}
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Write your own Java launcher that locates jvm.dll, creates the VM, finds the main class, and invokes it.