Bug ID: JDK-4915107 Clarify supplementary character handling in modified UTF-8

Versions (Unresolved/Resolved/Fixed)

Other: 5.0 b54 (Fixed)


Name: nl37777			Date: 08/29/2003

The Java VM and the various interfaces attached to it 
(such as the Java Native Interface) have always used a modified form of 
the standard UTF-8 encoding. The same encoding is used by the 
java.io.DataInput and DataOutput interfaces, where it has long been 
documented as "Java modified UTF-8". Since Java modified UTF-8 and 
standard UTF-8 are incompatible, it is necessary to clarify throughout 
the Java platform specifications which interfaces use which encoding. 
Also, the descriptions in the Java Virtual Machine Specification and 
some other documentation make it sound as if Java modified UTF-8 could 
not encode supplementary characters. In fact, it appears that all parts 
of the J2SDK that deal with Java modified UTF-8 handle supplementary 
characters just fine: they simply represent each of the two surrogates 
in the character's UTF-16 representation as a three-byte sequence, six 
bytes in total.
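
The difference can be seen with a small sketch that is not part of the original report. It assumes DataOutputStream.writeUTF as the modified UTF-8 encoder and U+10400 as an arbitrary supplementary character; the class and helper names (ModifiedUtf8Demo, modifiedUtf8, hex) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from the J2SDK or the bug report.
class ModifiedUtf8Demo {

    // Encodes s with DataOutputStream.writeUTF and drops the 2-byte
    // length prefix that writeUTF puts in front of the encoded bytes.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 is stored in a String as the surrogate pair D801 DC00.
        String s = new String(Character.toChars(0x10400));

        // Modified UTF-8: each surrogate becomes its own three-byte sequence.
        System.out.println(hex(modifiedUtf8(s)));                    // ED A0 81 ED B0 80

        // Standard UTF-8: one four-byte sequence for the code point.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // F0 90 90 80
    }
}
```

The six-byte and four-byte forms are not interchangeable, which is why each specification needs to state which encoding it uses.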

This needs to be better documented at least in the following 
specifications:
- Java Virtual Machine Specification
- Java Native Interface Specification
- Object Serialization Specification
- Java Platform Debugger Architecture
- Java Virtual Machine Profiler Interface
- Java Virtual Machine Tool Interface
This is part of Tiger release driver 4533872.
======================================================================

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: tiger-beta2 tiger-rc
FIXED IN: tiger-beta2 tiger-rc
INTEGRATED IN: tiger-b54 tiger-rc

08-07-2004

EVALUATION

Norbert Lindenberg has completed a review of our documentation and is working on the appropriate changes.
###@###.### 2003-09-08

Name: nl37777			Date: 05/10/2004

The CCC has approved updates for the following specifications:
- Java Virtual Machine Specification
- Java Native Interface Specification
- Java Virtual Machine Tool Interface Specification
- Serialization Specification
- Class and method descriptions in javadoc form for DataInput, DataOutput, and related classes in java.io, as well as ImageInputStream and ImageOutputStream in javax.imageio.stream.

The changes to class and method descriptions in javadoc form will be integrated under this bug id. The changes to the JVMTI specification will be integrated under bug id 5044673. The other specifications will be updated in parallel.
======================================================================

08-07-2004

Duplicate :	JDK-4873956 - RandomAccessFile.writeUTF(...) doesn't say "modified"
Relates :	JDK-4533872 - Unicode supplementary character support (JSR-204)
Relates :	JDK-5049313 - Implement all JVMTI strings as modified UTF-8
Relates :	JDK-5044673 - JVMTI Doc: Clarify supplementary character handling is modified UTF-8