JDK-4328816 : Unicode 2.0 surrogate support
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.2.0,1.2.2,1.4.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,windows_nt
  • CPU: generic,x86
  • Submitted: 2000-04-07
  • Updated: 2000-12-20
  • Resolved: 2000-12-20

Other
1.4.0 beta: Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
Provide support for Unicode 2.0 surrogates. 

graham.hamilton@Eng 2000-04-07


Surrogate support for this specific feature is defined as supporting the current specification of UTF-8 conversion for surrogate pairs so that the converted pair requires only 4 octets instead of possibly 6.
john.oconner@Eng 2000-06-22
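
To illustrate the 4-octet versus 6-octet difference described above, here is a minimal sketch. The class and method names are hypothetical, and it uses `Character.toCodePoint` and `StandardCharsets`, which were added in releases later than those covered by this report. It is not the code that was integrated for this fix; it only contrasts encoding each surrogate separately (3 octets each) with encoding the combined code point (4 octets).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
class SurrogateUtf8Demo {

    // Older behaviour: treat a single UTF-16 code unit as if it were a
    // BMP character and emit the 3-octet UTF-8 form for it.
    static byte[] encodeUnitAs3Octets(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // Updated behaviour: combine the surrogate pair into one code point
    // and emit the 4-octet UTF-8 form.
    static byte[] encodePairAs4Octets(char high, char low) {
        int cp = Character.toCodePoint(high, low);
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        char high = '\uD800';
        char low = '\uDC00'; // together: U+10000

        // Two separately encoded surrogates: 3 + 3 = 6 octets.
        System.out.println(hex(encodeUnitAs3Octets(high)) + " "
                + hex(encodeUnitAs3Octets(low)));

        // One combined code point: 4 octets.
        byte[] four = encodePairAs4Octets(high, low);
        System.out.println(hex(four));

        // A current JDK's UTF-8 charset produces the same 4 octets.
        byte[] jdk = new String(new char[] {high, low})
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(four, jdk));
    }
}
```

For U+10000 the separately encoded form is `ED A0 80 ED B0 80`, while the combined form is `F0 90 80 80`.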

This mostly comes down to fixing the following bugs:
4251997 - UTF-8 Surrogate Decoding is Broken
4344267 - Broken UTF-8 conversion of split surrogate-pair
norbert.lindenberg@Eng 2000-07-19
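
The split surrogate-pair case from 4344267 can be sketched with the `java.nio.charset` API as follows. This is an illustration under assumptions, not the actual fix: the class and method names are hypothetical, and the example assumes the input arrives in two chunks with the high surrogate at the end of the first chunk. The key point is that the caller passes `endOfInput = false` for the first chunk and carries the unconsumed high surrogate over to the next call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class SplitSurrogateDemo {

    // Encodes two chunks of text as one UTF-8 stream, allowing a
    // surrogate pair to straddle the chunk boundary.
    static byte[] encodeInChunks(String first, String second) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        int totalChars = first.length() + second.length();
        CharBuffer in = CharBuffer.allocate(totalChars);
        // A UTF-16 code unit never needs more than 3 UTF-8 octets.
        ByteBuffer out = ByteBuffer.allocate(totalChars * 3);

        // First chunk: more input will follow, so endOfInput is false.
        in.put(first);
        in.flip();
        CoderResult result = encoder.encode(in, out, false);
        if (!result.isUnderflow()) {
            throw new IllegalStateException(result.toString());
        }

        // A trailing high surrogate is left unconsumed in 'in';
        // compact() keeps it so the next chunk is appended after it.
        in.compact();
        in.put(second);
        in.flip();

        // Last chunk: endOfInput is true.
        result = encoder.encode(in, out, true);
        if (!result.isUnderflow()) {
            throw new IllegalStateException(result.toString());
        }
        encoder.flush(out);

        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // 'A', high surrogate | low surrogate, 'B'  (pair is U+10000)
        byte[] bytes = encodeInChunks("A\uD800", "\uDC00B");
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim());
    }
}
```

With the chunks `"A\uD800"` and `"\uDC00B"`, the pair is emitted as a single 4-octet sequence between the octets for `A` and `B`, the same result as encoding the unsplit string in one call.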

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: merlin merlin-beta
FIXED IN: merlin-beta
INTEGRATED IN: merlin-beta
14-06-2004

EVALUATION
Determined to be a character conversion request that UTF-8 conversion use updated algorithms for converting surrogate pairs. The updated conversion algorithm is more efficient in storage requirements since a surrogate pair can be represented as 4 octets instead of possibly 6.
john.oconner@Eng 2000-06-22

The reported issues concerning the handling of UTF-8 surrogates will be largely addressed within the UTF-8 charset encoder/decoder planned for Merlin as part of the delivery of a pluggable charset SPI/API (detailed in JSR-051).
Ian.Little@Ireland 11/20/2000
20-11-2000

PUBLIC COMMENTS
Determined to be a character conversion request that UTF-8 conversion use updated algorithms for converting surrogate pairs. The updated conversion algorithm is more efficient in storage requirements since a surrogate pair can be represented as 4 octets instead of possibly 6.
john.oconner@Eng 2000-06-22
22-06-2000