Duplicate :
|
Name: gm110360
Date: 07/08/2002

FULL PRODUCT VERSION :
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)

FULL OPERATING SYSTEM VERSION : Windows 2000

ADDITIONAL OPERATING SYSTEMS : Linux, Solaris

A DESCRIPTION OF THE PROBLEM :
The JDK lacks Unicode 3.1 support. Unicode 3.1 needs to be supported; it assigns characters to code points outside the Unicode BMP (Basic Multilingual Plane).

Java has so far gotten away with assuming that every character has a 16-bit representation and that no code points are assigned outside Plane 0. That has been theoretically false for years, but in practice it held true until now. The assumption simply does not work with Unicode 3.1, and it will be necessary to add methods that query for surrogates and return appropriate values (maybe as int) for the higher planes.

Java can still use UTF-16, of course, but that means two Java chars will sometimes be needed to encode *one* Unicode code point. The java.lang.Character class needs to be updated, and so do various stream and buffer classes (which currently simply ignore and/or discard surrogate codings).

Unicode 3.1 is here *today*. Java has to add support, and the sooner the better.

REPRODUCIBILITY :
This bug can be reproduced always.

(Review ID: 158701)
======================================================================
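
As a rough illustration of the kind of surrogate queries requested above, the sketch below detects UTF-16 surrogate pairs by hand and combines them into an int code point. The class name SurrogateSketch and all of its method names are hypothetical and chosen only for this example; they are not part of java.lang.Character in 1.4.0_01. The surrogate ranges and the combining arithmetic follow the UTF-16 definition in the Unicode Standard.

```java
// Sketch only: SurrogateSketch and its methods are hypothetical helpers,
// not an existing JDK 1.4 API.
final class SurrogateSketch {
    // UTF-16 surrogate ranges as defined by the Unicode Standard.
    static final char HIGH_MIN = '\uD800';
    static final char HIGH_MAX = '\uDBFF';
    static final char LOW_MIN = '\uDC00';
    static final char LOW_MAX = '\uDFFF';

    static boolean isHighSurrogate(char c) {
        return c >= HIGH_MIN && c <= HIGH_MAX;
    }

    static boolean isLowSurrogate(char c) {
        return c >= LOW_MIN && c <= LOW_MAX;
    }

    // Combines a high/low surrogate pair into one code point above U+FFFF.
    // The caller is assumed to have validated both halves already.
    static int toCodePoint(char high, char low) {
        return ((high - HIGH_MIN) << 10) + (low - LOW_MIN) + 0x10000;
    }

    // Counts code points rather than chars: a well-formed surrogate pair
    // counts once; an unpaired surrogate is counted as a single unit.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHighSurrogate(c)
                    && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "A\uD800\uDC00"; // 'A' followed by U+10000
        System.out.println(s.length());          // prints 3 (chars)
        System.out.println(countCodePoints(s));  // prints 2 (code points)
        System.out.println(
            Integer.toHexString(toCodePoint('\uD800', '\uDC00'))); // prints 10000
    }
}
```

The example returns code points as int because a char cannot hold a value above U+FFFF, which is the same reason the description suggests int for the higher planes.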