JDK-4508058 : UTF-8 encoding does not recognize initial BOM
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.0,1.4.2_05
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic,windows_nt
  • CPU: generic,other
  • Submitted: 2001-09-27
  • Updated: 2017-05-16
  • Resolved: 2006-02-18
Description
A UTF-8 stream can optionally begin with a byte order mark (see, for example, http://www.unicode.org.unicode/faq/utf_bom.html). This is the character U+FEFF, which is represented as EF BB BF in UTF-8. Java's UTF-8 encoding does not recognize this character as a BOM, though; the result of reading such a stream is a sequence of characters beginning with U+FEFF.
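
The behavior described above can be reproduced with a few lines. This is only a minimal sketch: the class name Utf8BomDemo and the helper decodeUtf8 are made up for illustration, and it uses java.nio.charset.StandardCharsets, which was added in Java 7 and so postdates the affected versions listed in this report.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is not part of any JDK API.
final class Utf8BomDemo {

    // Decode bytes with the standard UTF-8 charset, with no BOM handling added.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF, followed by "hi".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String s = decodeUtf8(data);

        // The BOM survives decoding as an ordinary first character.
        System.out.println(s.length());                        // prints 3
        System.out.println(Integer.toHexString(s.charAt(0)));  // prints feff
    }
}
```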

Comments
EVALUATION We implemented this RFE on the assumption that the change would not break existing real-world applications. That assumption is obviously not true; see #6378911. We decided to back out the change and close this RFE as "will not fix" for compatibility reasons.
18-02-2006

EVALUATION for mustang
27-09-2005

WORK AROUND Application code must recognize and skip the BOM itself.
29-09-2004
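
One possible shape for the workaround, as a sketch rather than a recommended implementation: check the first three bytes for EF BB BF and start decoding after them. The class and method names (Utf8BomWorkaround, hasUtf8Bom, decodeSkippingBom) are hypothetical, and the sketch assumes the whole input is already in a byte array; a stream-based variant would need to peek at the first bytes instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the JDK.
final class Utf8BomWorkaround {

    // True if the array starts with the UTF-8 encoding of U+FEFF (EF BB BF).
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    // Decode as UTF-8, skipping a leading BOM if one is present.
    static String decodeSkippingBom(byte[] bytes) {
        int offset = hasUtf8Bom(bytes) ? 3 : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(decodeSkippingBom(withBom));     // prints hi
        System.out.println(decodeSkippingBom(withoutBom));  // prints hi
    }
}
```

Only a BOM at the very start of the input is skipped; a U+FEFF that appears later in the data is left in place.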

SUGGESTED FIX Recognize the BOM, just as UTF-16 does.
29-09-2004
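
For comparison, the sketch below decodes BOM-prefixed input with the "UTF-16" and "UTF-8" charsets side by side. The class name Utf16BomComparison and its decode helper are illustrative only; the charset constants come from java.nio.charset.StandardCharsets (Java 7 and later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison class; the name is not part of any JDK API.
final class Utf16BomComparison {

    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Big-endian BOM (FE FF) followed by "hi" in UTF-16.
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x00, 'h', 0x00, 'i'};
        // UTF-8 BOM (EF BB BF) followed by "hi" in UTF-8.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};

        // The UTF-16 decoder consumes the BOM; the UTF-8 decoder keeps it.
        System.out.println(decode(utf16, StandardCharsets.UTF_16).length()); // prints 2
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length());   // prints 3
    }
}
```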

PUBLIC COMMENTS Java does not recognize the optional BOM which can begin a UTF-8 stream. It treats the BOM as if it were the initial character of the stream.
29-09-2004