Bug ID: JDK-6407730 UnicodeLittle is BIG-endian

Details
Type: Bug
Submit Date: 2006-04-03
Status: Resolved
Updated Date: 2011-09-06
Project Name: JDK
Resolved Date: 2006-04-14
Component: core-libs
OS: windows_2000
Sub-Component: java.nio.charsets
CPU: x86
Priority: P2
Resolution: Fixed
Affected Versions: 6
Fixed Versions:

Related Reports
Backport:
Relates:
Relates:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b78)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b78, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows 2000 SP 4

A DESCRIPTION OF THE PROBLEM :
I found that the JExcel API sometimes returns incorrect content, and I seem to have tracked it down to the following test case:

		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		String s = new String(data,"UnicodeLittle");
		System.out.println(s);

This prints "Arial" with JDK 1.5 but garbage with JDK 1.6.
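If the bytes are known to be little-endian with no BOM, one way to sidestep the regression is to name the byte order explicitly instead of relying on how "UnicodeLittle" treats a missing BOM. A minimal sketch, assuming the input never starts with a BOM (the standard "UTF-16LE" charset decodes a leading FF FE as U+FEFF rather than stripping it); the class and method names are hypothetical:

```java
import java.nio.charset.Charset;

public class ExplicitLittleEndianDecode {
    // Decodes UTF-16 bytes that are known to be little-endian and carry no BOM.
    // "UTF-16LE" fixes the byte order up front, so the result does not depend
    // on the BOM handling of a charset such as "UnicodeLittle".
    static String decodeLittleEndian(byte[] data) {
        return new String(data, Charset.forName("UTF-16LE"));
    }

    public static void main(String[] args) {
        // Same bytes as the test case above: "Arial", low byte first, no BOM.
        byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
        System.out.println(decodeLittleEndian(data)); // prints Arial
    }
}
```

Using the `String(byte[], Charset)` constructor also avoids the checked `UnsupportedEncodingException` that the charset-name overload declares.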


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class UnicodeLittleTest
{
	public static void main(String[] args) throws Exception
	{
		// "Arial" encoded as UTF-16 little-endian, with no byte-order mark (BOM)
		byte[] data = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
		String s = new String(data,"UnicodeLittle");
		System.out.println(s);
	}
}
---------- END SOURCE ----------

Release Regression From: 5.0u6
5.0u6 is the last release in which this code was known to work correctly;
it has regressed since then.

                                    

Comments
EVALUATION

Running my standard test program shows that this was introduced
in mustang b27.

(mb29450@suttles) ~/src/toy $ jver 6-b27 jr -source 1.5 Decode UnicodeLittle A 0x0
==> javac -source 1.5 -Xlint:all Decode.java
==> java -esa -ea Decode UnicodeLittle A 0x0
\u4100
(mb29450@suttles) ~/src/toy $ jver 6-b26 jr -source 1.5 Decode UnicodeLittle A 0x0
==> javac -source 1.5 -Xlint:all Decode.java
==> java -esa -ea Decode UnicodeLittle A 0x0
A
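The two outputs above differ only in byte order: the bytes 0x41 0x00 read low byte first give U+0041 ('A'), while read high byte first they give U+4100, which is what b27 prints. A small sketch that reproduces both readings with the standard "UTF-16LE" and "UTF-16BE" charsets; the class and method names are hypothetical and unrelated to the Decode program used above:

```java
import java.nio.charset.Charset;

public class ByteOrderDemo {
    // Decodes data with the named charset and renders every resulting char
    // as a backslash-u escape, so an unexpected code point is easy to spot.
    static String decodeEscaped(byte[] data, String charsetName) {
        String s = new String(data, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (0x41) followed by 0x00, the same input as the runs above.
        byte[] data = new byte[] {0x41, 0x00};
        // Little-endian: 0x41 is the low byte, giving U+0041.
        System.out.println(decodeEscaped(data, "UTF-16LE"));
        // Big-endian: 0x41 is the high byte, giving U+4100.
        System.out.println(decodeEscaped(data, "UTF-16BE"));
    }
}
```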

Here are the char_encoding bugs integrated into that build.

5005426: Buffered stream data is discarded by IllegalStateException in 1.4.2 and Tiger
6230124: Incorrect entries in Charset.contains() in different UTF-xyz charsets
6230129: Need a UTF_16LE_Marked charset
6230719: UTF-8 maxCharsPerByte should be 1.0, not 2.0
6233303: nio ISO2022JP.Encoder is broken if unmappableCharacterAction is CodingErrorAction.REPLACE
6233550: Port sun.io regtest to nio.charset

6230129 is the most likely suspect.
                                     
2006-04-03
EVALUATION

sun.io UnicodeLittle has the semantics of "forcing a BOM when encoding, and accepting
either a LITTLE-endian BOM or no BOM (treating the stream as LITTLE-endian) when
decoding". The newly added UTF_16LE_BOM forces a BOM in both decoding and encoding.
There are two possible solutions: (1) give UTF_16LE_BOM the same semantics as sun.io
UnicodeLittle, or (2) add a new "UnicodeLittle" charset implementation.
I consider (1) more consistent with previous releases.
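The sun.io semantics described in this evaluation can be sketched in plain Java on top of the standard "UTF-16LE" charset. This is not the JDK's actual fix or the sun.io source; the class and method names are hypothetical, and rejecting a big-endian BOM (FE FF) is an assumption drawn from the phrase "accepting a LITTLE endian BOM or no BOM":

```java
import java.nio.charset.Charset;

public class UnicodeLittleSketch {
    private static final Charset LE = Charset.forName("UTF-16LE");

    // Encoding: always emit the little-endian BOM (FF FE), then the text as UTF-16LE.
    static byte[] encode(String s) {
        byte[] body = s.getBytes(LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Decoding: skip a leading little-endian BOM if present; with no BOM,
    // treat the data as little-endian anyway. Throwing on a big-endian BOM
    // is an assumption of this sketch, not documented sun.io behavior.
    static String decode(byte[] data) {
        int start = 0;
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                start = 2;
            } else if (b0 == 0xFE && b1 == 0xFF) {
                throw new IllegalArgumentException("big-endian BOM not accepted");
            }
        }
        return new String(data, start, data.length - start, LE);
    }

    public static void main(String[] args) {
        byte[] noBom = new byte[] {65, 0, 114, 0, 105, 0, 97, 0, 108, 0};
        System.out.println(decode(noBom));           // Arial: no BOM, read as little-endian
        System.out.println(decode(encode("Arial"))); // Arial: round trip through the BOM
        System.out.println(encode("A").length);      // 4: FF FE 41 00
    }
}
```

Under option (1), the bug report's test case behaves like the `noBom` call above: the missing BOM defaults to little-endian instead of big-endian.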
                                     
2006-04-03


