JDK-4838072 : Yen sign is not converted properly when using String.getBytes("Shift_JIS")
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.1
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2003-03-26
  • Updated: 2003-03-27
  • Resolved: 2003-03-27
Description
Name: nt126004			Date: 03/26/2003


FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
The methods String.getBytes(String charsetName) and new String(byte[] bytes, String charsetName) should be complementary (unless characters in the string are not defined in the specified charset).  For any String, you should be able to create a byte array with getBytes and then create a new String from that byte array such that the new String is equal to the original String.
 
When the charsetName is "Shift_JIS" and the String contains a Yen character, the method String.getBytes returns a value of 0x5C.  This is correct behavior.  However, the method new String(bytes, charsetName) converts the byte back to a String containing the Reverse Solidus character instead of the Yen character.  This is not correct behavior.
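The round-trip property described above can be checked with a small helper. This is only a minimal sketch: the class name RoundTripCheck and the method roundTrips are hypothetical names chosen for illustration, and the result for Shift_JIS may differ between Java releases and depends on whether the charset is installed in the runtime.

```java
import java.nio.charset.Charset;

public class RoundTripCheck {
    // Hypothetical helper: returns true when encoding the text with the
    // given charset and decoding the bytes again yields an equal String.
    public static boolean roundTrips(String text, Charset cs) {
        byte[] encoded = text.getBytes(cs);
        String decoded = new String(encoded, cs);
        return text.equals(decoded);
    }

    public static void main(String[] args) {
        String name = "Shift_JIS";
        // Extended charsets are optional, so check availability first.
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        Charset cs = Charset.forName(name);
        // HIRAGANA LETTER HI, YEN SIGN, REVERSE SOLIDUS
        String[] samples = { "\u3072", "\u00a5", "\\" };
        for (String s : samples) {
            System.out.printf("U+%04X round-trips: %b%n",
                    (int) s.charAt(0), roundTrips(s, cs));
        }
    }
}
```

Checking each character separately, as main does, shows which individual character is altered, rather than only reporting that the whole string changed.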

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the example program RoundTrip.java

EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: when the charset is Shift_JIS, a byte with value 0x5C should be converted to a character with Unicode value 0xA5.
Actual: the byte value 0x5C is converted to Unicode value 0x5C.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class RoundTrip{
    public static void main (String args[]){
        String csName = "Shift_JIS";
        String testString = "\u3072\u00a5";	//HIRAGANA LETTER HI, YEN SIGN
        roundTrip(csName, testString);
    }
    
    //do a round-trip conversion from String to byte[] and back to String
    private static void roundTrip(String csName, String testString){
    	try{
    		/*	Depending on your configuration, the Unicode characters
    			may not display correctly.  This is not relevant to the
    			issue at hand, though.	*/
    		
    		// display the arguments passed in
        	System.out.println("encode and decode '" + testString + "' using " + csName);
        	System.out.print("Unicode values: ");
        	int len = testString.length();
        	
        	//display the numeric value of each character before encoding
        	for (int n = 0; n < len; n++){
            	int val = testString.charAt(n);
            	System.out.print(val);
            	System.out.print(" ");
        	}
        	System.out.println();
        	System.out.println();
	        
        	//encode to bytes using the specified charsetName
        	byte[] b = testString.getBytes(csName);
        	System.out.print("Encoded bytes: ");
        	
        	//display the encoded values
        	for (int n = 0; n < b.length; n++){
        		System.out.print(b[n]);
        		System.out.print(" ");
        	}
        	System.out.println();
        	System.out.println();
	        
        	//convert the bytes back to a String
        	String decode = new String(b, csName);
        	System.out.println("Decoded String " + decode);
        	System.out.print("Unicode values: ");
        	len = decode.length();
        	
        	//display the numeric value of each character again
        	for (int n = 0; n < len; n++){
            	int val = decode.charAt(n);
            	System.out.print(val);
            	System.out.print(" ");
        	}
        	System.out.println();
        }
        catch (Throwable t){
        	System.out.println(t);
        }
        	
    }
}
---------- END SOURCE ----------
(Review ID: 183058) 
======================================================================

Comments
EVALUATION Java's treatment of 0x5c (JIS X 0201) within Shift_JIS and EUC-JP is intentional and not a bug. The same applies to 0x7e. This is already covered by previous bug IDs such as 4361845. While it is accepted that there is a round-trip issue with the provided implementation, changing the current behaviour would break many Java applications running in Japanese locales, for example when accessing file paths. Note that another bug, 4486307, requests documentation clarification (possibly within the JLS) of the specifics and rationale of this and some other deviations from Japanese published standards (as opposed to well-established industry practices). That bug is still open and should be addressed ahead of the next J2SE feature release. ###@###.### 2003-03-27
27-03-2003
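Since the evaluation leaves the mapping unchanged, an application that depends on exact round trips could scan its text for affected characters before encoding. The following is a rough sketch under that assumption, not an endorsed workaround: the class name LossyCharFinder and the method findLossy are hypothetical, and which code points are reported for Shift_JIS depends on the Java release in use.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LossyCharFinder {
    // Hypothetical helper: returns the code points of the text that do not
    // come back unchanged after an encode/decode cycle in the given charset.
    public static List<Integer> findLossy(String text, Charset cs) {
        List<Integer> lossy = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String one = new String(Character.toChars(cp));
            String back = new String(one.getBytes(cs), cs);
            if (!one.equals(back)) {
                lossy.add(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return lossy;
    }

    public static void main(String[] args) {
        String name = "Shift_JIS";
        // Extended charsets are optional, so check availability first.
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        // HIRAGANA LETTER HI followed by YEN SIGN, as in the report.
        String testString = "\u3072\u00a5";
        for (int cp : findLossy(testString, Charset.forName(name))) {
            System.out.printf("U+%04X does not round-trip%n", cp);
        }
    }
}
```

Testing one code point at a time keeps the check independent of neighbouring characters, at the cost of one small encode and decode per code point.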