JDK-6407730 : UnicodeLittle is BIG-endian
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 6
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2006-04-03
  • Updated: 2011-09-06
  • Resolved: 2006-04-14
Fix Versions:
  • Other: 5.0u33 (Fixed)
  • JDK 6: 6 b81 (Fixed)
Related Reports
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b78)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b78, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows 2000 SP 4

A DESCRIPTION OF THE PROBLEM :
I found that the JExcel API sometimes returns incorrect content, and I seem to have traced the problem to the following test case:

		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		String s = new String(data,"UnicodeLittle");
		System.out.println(s);

This prints "Arial" with JDK 1.5 but garbage with JDK 1.6.
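To make the symptom concrete, here is a small sketch that decodes the reporter's bytes with the standard UTF-16LE and UTF-16BE charsets rather than with "UnicodeLittle" itself. The class name and the `escape` helper are illustrative only, and `StandardCharsets` assumes Java 7 or later. Reading each byte pair little-endian gives "Arial"; reading the same pairs big-endian gives five CJK ideographs (U+4100, U+7200, U+6900, U+6100, U+6C00), which matches the \u4100 the b27 evaluation reports for the first pair.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: shows the correct (little-endian) and the regressed
// (big-endian) interpretation of the same bytes using standard charsets.
public class EndiannessDemo {
    // "Arial" as UTF-16 little-endian with no byte-order mark (from the report).
    static final byte[] DATA = {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};

    // Keeps printable ASCII as-is and renders every other char as a
    // Java-style escape (backslash, 'u', four uppercase hex digits).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Low byte first: (65, 0) -> 'A', (114, 0) -> 'r', ...
        String le = new String(DATA, StandardCharsets.UTF_16LE);
        // High byte first: (65, 0) -> U+4100, (114, 0) -> U+7200, ...
        String be = new String(DATA, StandardCharsets.UTF_16BE);
        System.out.println("UTF-16LE: " + escape(le));
        System.out.println("UTF-16BE: " + escape(be));
    }
}
```

Running it prints `UTF-16LE: Arial` followed by `UTF-16BE: \u4100\u7200\u6900\u6100\u6C00`. When the data is known to be little-endian without a BOM, naming UTF-16LE explicitly is one way to avoid depending on the legacy alias.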


REPRODUCIBILITY :
This bug is always reproducible.

---------- BEGIN SOURCE ----------
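public class UnicodeLittleTest
{
	public static void main(String[] args) throws Exception
	{
		// "Arial" encoded as UTF-16 little-endian, with no byte-order mark
		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		// Expected to print "Arial"; affected JDK 6 builds print garbage
		String s = new String(data, "UnicodeLittle");
		System.out.println(s);
	}
}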
---------- END SOURCE ----------

Release Regression From : 5.0u6
The release above is the last one in which this functionality was known to work; it has regressed since then.

Comments
EVALUATION The sun.io UnicodeLittle converter has these semantics: when encoding, it always writes a BOM; when decoding, it accepts either a little-endian BOM or no BOM (in which case the stream is treated as little-endian). The newly added UTF_16LE_BOM charset requires a BOM for both decoding and encoding. There are two possible solutions: (1) give UTF_16LE_BOM the same semantics as the sun.io UnicodeLittle converter, or (2) add a new "UnicodeLittle" charset implementation. I consider (1) more consistent with previous releases.
03-04-2006
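A minimal sketch of the sun.io UnicodeLittle semantics described in the evaluation above, written over byte arrays rather than as a real `java.nio.charset.Charset` subclass. It is not the JDK's fix or its UTF_16LE_BOM code: the class name is hypothetical, and rejecting a big-endian BOM by throwing `IllegalArgumentException` is an assumption (the evaluation only says that a little-endian BOM or no BOM is accepted).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the legacy "UnicodeLittle" behavior; NOT the JDK's
// actual charset implementation.
public class UnicodeLittleSketch {
    // Encoding always writes the little-endian BOM (FF FE), then UTF-16LE data.
    static byte[] encode(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Decoding accepts a little-endian BOM (skipped) or no BOM (the data is
    // then treated as little-endian). A big-endian BOM (FE FF) is rejected;
    // the exception type is an assumption made for this sketch.
    static String decode(byte[] in) {
        int start = 0;
        if (in.length >= 2) {
            int b0 = in[0] & 0xFF;
            int b1 = in[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                start = 2; // little-endian BOM: skip it
            } else if (b0 == 0xFE && b1 == 0xFF) {
                throw new IllegalArgumentException("big-endian BOM is not accepted");
            }
        }
        return new String(in, start, in.length - start, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] noBom = {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
        System.out.println(decode(noBom));           // no BOM: read as little-endian
        System.out.println(decode(encode("Arial"))); // round trip through the BOM
    }
}
```

Both lines print `Arial`. In these terms, option (1) amounts to letting the decoder fall back to little-endian when no BOM is present, as `decode` does here, instead of requiring one.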

EVALUATION Running my standard test program shows that this was introduced in mustang b27.

    (mb29450@suttles) ~/src/toy $ jver 6-b27 jr -source 1.5 Decode UnicodeLittle A 0x0
    ==> javac -source 1.5 -Xlint:all Decode.java
    ==> java -esa -ea Decode UnicodeLittle A 0x0
    \u4100
    (mb29450@suttles) ~/src/toy $ jver 6-b26 jr -source 1.5 Decode UnicodeLittle A 0x0
    ==> javac -source 1.5 -Xlint:all Decode.java
    ==> java -esa -ea Decode UnicodeLittle A 0x0
    A

Here are the char_encoding bugs integrated into that build:
  • 5005426: Buffered stream data is discarded by IllegalStateException in 1.4.2 and Tiger
  • 6230124: Incorrect entries in Charset.contains() in different UTF-xyz charsets
  • 6230129: Need a UTF_16LE_Marked charset
  • 6230719: UTF-8 maxCharsPerByte should be 1.0, not 2.0
  • 6233303: nio ISO2022JP.Encoder is broken if unmappableCharacterAction is CodingErrorAction.REPLACE
  • 6233550: Port sun.io regtest to nio.charset

6230129 is the most likely suspect.
03-04-2006
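The `Decode` program used in the transcript is not included in the report. The following is a hypothetical reconstruction that mimics its command line (`Decode <charset> <byte>...`) and prints non-ASCII output as Java-style escapes. The token format (a single character standing for its ASCII code, or a `0x`-prefixed hex byte), the escaping rule, and the name `DecodeSketch` are all guesses inferred from the transcript, not the evaluator's actual code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical reconstruction of a "Decode <charset> <byte>..." harness.
public class DecodeSketch {
    // A token is either "0x"-prefixed hex or one character used as its own code.
    static byte parseByte(String token) {
        if (token.startsWith("0x") || token.startsWith("0X")) {
            return (byte) Integer.parseInt(token.substring(2), 16);
        }
        if (token.length() != 1) {
            throw new IllegalArgumentException("expected one character or 0x..: " + token);
        }
        return (byte) token.charAt(0);
    }

    // Printable ASCII is kept; any other char becomes backslash, 'u', 4 hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Builds the byte array from the tokens and decodes it with the named charset.
    static String run(String charsetName, String... tokens) {
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = parseByte(tokens[i]);
        }
        return escape(new String(bytes, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java DecodeSketch <charset> <byte>...");
            System.exit(2);
        }
        System.out.println(run(args[0], Arrays.copyOfRange(args, 1, args.length)));
    }
}
```

With this sketch, `java DecodeSketch UTF-16BE A 0x0` prints `\u4100`, the same output b27 produced for UnicodeLittle, while `java DecodeSketch UTF-16LE A 0x0` prints `A`, the b26 result.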