Bug ID: JDK-6407730 UnicodeLittle is BIG-endian

Versions (Unresolved/Resolved/Fixed)

Other	JDK 6
5.0u33 (Fixed)	6 b81 (Fixed)

FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b78)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b78, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows 2000 SP 4

A DESCRIPTION OF THE PROBLEM :
I found that JExcel API sometimes returns wrong contents and I seem to have tracked it down to the following test case:

		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		String s = new String(data,"UnicodeLittle");
		System.out.println(s);

This prints out "Arial" with JDK1.5 and some garbage with JDK1.6.


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class UnicodeLitteTest
{
	public static void main(String[] args) throws Exception
	{
		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		String s = new String(data,"UnicodeLittle");
		System.out.println(s);
	}
}
---------- END SOURCE ----------

Release Regression From : 5.0u6
The release above is the last one in which this code was known to
work correctly. Later releases regressed.

EVALUATION

The sun.io UnicodeLittle converter has the semantics of "forcing a BOM when encoding, and accepting either a LITTLE-endian BOM or no BOM (treating the stream as LITTLE endian) when decoding". The newly added UTF_16LE_BOM forces a BOM in both decoding and encoding. There are two possible solutions: (1) give UTF_16LE_BOM the same semantics as the sun.io UnicodeLittle, or (2) add a new "UnicodeLittle" charset implementation. I consider (1) more consistent with previous releases.

03-04-2006
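
To make the semantics quoted in the evaluation above concrete, here is a minimal sketch built only on the standard UTF-16LE charset. It is not the sun.io converter or the UTF_16LE_BOM source. The class and method names (UnicodeLittleSketch, decodeLenient, encodeWithBom) are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeLittleSketch {

    // Decoding: accept an optional little-endian BOM (bytes FF FE) and skip it;
    // with no BOM, still treat the input as little endian.
    static String decodeLenient(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFF
                && (bytes[1] & 0xFF) == 0xFE) {
            offset = 2;
        }
        return new String(bytes, offset, bytes.length - offset,
                StandardCharsets.UTF_16LE);
    }

    // Encoding: always emit the little-endian BOM before the payload.
    static byte[] encodeWithBom(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        // Same bytes as the reporter's test case: no BOM present.
        byte[] data = {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
        System.out.println(decodeLenient(data));                 // Arial
        System.out.println(decodeLenient(encodeWithBom("Arial"))); // Arial
    }
}
```

Under these assumptions the reporter's BOM-less byte array and a BOM-prefixed one decode to the same string, which is the behaviour option (1) would restore.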

EVALUATION

Running my standard test program shows that this was introduced in mustang b27.

(mb29450@suttles) ~/src/toy $ jver 6-b27 jr -source 1.5 Decode UnicodeLittle A 0x0
==> javac -source 1.5 -Xlint:all Decode.java
==> java -esa -ea Decode UnicodeLittle A 0x0
\u4100
(mb29450@suttles) ~/src/toy $ jver 6-b26 jr -source 1.5 Decode UnicodeLittle A 0x0
==> javac -source 1.5 -Xlint:all Decode.java
==> java -esa -ea Decode UnicodeLittle A 0x0
A

Here are the char_encoding bugs integrated into that build.

5005426: Buffered stream data is discarded by IllegalStateException in 1.4.2 and Tiger
6230124: Incorrect entries in Charset.contains() in different UTF-xyz charsets
6230129: Need a UTF_16LE_Marked charset
6230719: UTF-8 maxCharsPerByte should be 1.0, not 2.0
6233303: nio ISO2022JP.Encoder is broken if unmappableCharacterAction is CodingErrorAction.REPLACE
6233550: Port sun.io regtest to nio.charset

6230129 is the most likely suspect.

03-04-2006
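
The \u4100 seen in the b27 run is what the two bytes 'A', 0x0 produce when read as one big-endian 16-bit unit instead of a little-endian one. The sketch below shows this with the standard UTF-16LE and UTF-16BE charsets. It is not the evaluator's Decode program, and the EndianDemo class and describe helper are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class EndianDemo {

    // Render each char as U+XXXX so unprintable results stay visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {65, 0}; // 'A' followed by 0x0

        // Little endian: low byte first, so 0x41 0x00 is U+0041 ('A').
        System.out.println(describe(new String(data, StandardCharsets.UTF_16LE)));

        // Big endian: high byte first, so 0x41 0x00 is U+4100.
        System.out.println(describe(new String(data, StandardCharsets.UTF_16BE)));
    }
}
```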

Relates :	JDK-6230129 - Need a UTF_16LE_Marked charset
Relates :	JDK-6448787 - Regression: UnicodeLittle broken on 1.4.2_12