Bug ID: JDK-4508058 UTF-8 encoding does not recognize initial BOM

Details
Type: Enhancement
Submit Date: 2001-09-27
Status: Closed
Updated Date: 2006-02-18
Project Name: JDK
Resolved Date: 2006-02-18
Component: core-libs
OS: windows_nt,generic
Sub-Component: java.nio.charsets
CPU: other,generic
Priority: P3
Resolution: Won't Fix
Affected Versions: 1.4.0,1.4.2_05
Fixed Versions:

Description
A UTF-8 stream can optionally begin with a byte order mark (see, for example, http://www.unicode.org.unicode/faq/utf_bom.html).  This is the character U+FEFF, which is represented as EF BB BF in UTF-8. Java's UTF-8 encoding does not recognize this character as a BOM, though; the result of reading such a stream is a sequence of characters beginning with U+FEFF.
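
The behavior can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original report: the class name Utf8BomDemo and the method decodeUtf8 are hypothetical, and it assumes a JDK recent enough to provide java.nio.charset.StandardCharsets (Java 7 or later) rather than the 1.4 releases listed above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that the UTF-8 decoder keeps a leading BOM.
final class Utf8BomDemo {
    /** Decodes the bytes as UTF-8 using the standard String constructor. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF; it is followed by ASCII "hi".
        byte[] input = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String text = decodeUtf8(input);

        System.out.println(text.length());                // prints 3, not 2
        System.out.println(text.charAt(0) == '\uFEFF');   // prints true
    }
}
```

The decoded string has three characters because the BOM is delivered to the application as an ordinary U+FEFF character.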

                                    

Comments
PUBLIC COMMENTS

Java does not recognize the optional BOM which can begin a UTF-8 stream.  It treats the BOM as if it were the initial character of the stream.
                                     
2004-09-29
SUGGESTED FIX

Recognize the BOM, just as UTF-16 does.
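
For comparison, the sketch below illustrates the UTF-16 behavior the suggestion refers to: the "UTF-16" charset uses a leading BOM to select the byte order and does not pass it on to the application. The class name Utf16BomDemo and the method decodeUtf16 are hypothetical, and StandardCharsets again assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the UTF-16 decoder consumes a leading BOM.
final class Utf16BomDemo {
    /** Decodes the bytes with the BOM-aware "UTF-16" charset. */
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // FE FF marks big-endian; 00 68 is the letter 'h'.
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x68};
        // FF FE marks little-endian; 68 00 is the letter 'h'.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00};

        System.out.println(decodeUtf16(bigEndian));      // prints h
        System.out.println(decodeUtf16(littleEndian));   // prints h
    }
}
```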
                                     
2004-09-29
WORK AROUND

Application code must recognize and skip the BOM itself.
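
One way an application might do this is sketched below. This is an illustration under stated assumptions, not code from the report: the class BomSkippingReader and its methods open and decode are hypothetical names. It peeks at the first three bytes through a java.io.PushbackInputStream, drops them if they are EF BB BF, and otherwise pushes them back before decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: skips an optional UTF-8 BOM before decoding.
final class BomSkippingReader {
    private static final int BOM_LENGTH = 3; // EF BB BF

    /** Returns a UTF-8 Reader positioned after the BOM, if one is present. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];
        int n = 0;
        // Read up to three bytes, tolerating short reads and early end of stream.
        while (n < BOM_LENGTH) {
            int r = pushback.read(head, n, BOM_LENGTH - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper for in-memory data; wraps IOException as unchecked. */
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = open(new ByteArrayInputStream(bytes))) {
            char[] buf = new char[1024];
            int r;
            while ((r = reader.read(buf)) != -1) {
                sb.append(buf, 0, r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With this helper, decode applied to the bytes EF BB BF 68 69 returns "hi", and input without a BOM is decoded unchanged. Checking bytes before decoding is a design choice; a simpler alternative is to decode first and drop a leading U+FEFF character from the result.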
                                     
2004-09-29
EVALUATION

Targeted for Mustang (Java SE 6).
                                     
2005-09-27
EVALUATION

We implemented this RFE on the assumption that the change would not
break existing real-world applications. That assumption is obviously not true;
see #6378911. We decided to back out the change and close this
RFE as "will not fix" for compatibility reasons.
                                     
2006-02-18


