JDK-4508058 : UTF-8 encoding does not recognize initial BOM
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.0,1.4.2_05
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic,windows_nt
  • CPU: generic,other
  • Submitted: 2001-09-27
  • Updated: 2017-05-16
  • Resolved: 2006-02-18
A UTF-8 stream can optionally begin with a byte order mark (see, for example, http://www.unicode.org/unicode/faq/utf_bom.html). The BOM is the character U+FEFF, which is encoded as the bytes EF BB BF in UTF-8. Java's UTF-8 decoder does not recognize this character as a BOM, though: reading such a stream yields a sequence of characters beginning with U+FEFF.
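A minimal sketch of the behavior described above. It assumes Java 7 or later for `java.nio.charset.StandardCharsets`, which postdates this report. The class and method names are hypothetical and used only for illustration. The program decodes the UTF-8 BOM followed by `A` and shows that the BOM survives as the first character of the result.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomDemo {
    /** Decodes bytes as UTF-8 with no special BOM handling (hypothetical helper). */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the BOM (EF BB BF) followed by 'A' (0x41)
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        String s = decodeUtf8(bytes);
        // The BOM is not stripped, so the string holds two chars: U+FEFF and 'A'
        System.out.println(s.length());
        System.out.printf("U+%04X%n", (int) s.charAt(0));
    }
}
```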

EVALUATION When we implemented this RFE, we assumed the change would not break existing real-world applications. That assumption turned out to be false (see #6378911). For compatibility reasons, we decided to back out the change and close this RFE as "Won't Fix".

EVALUATION For Mustang.

WORK AROUND Application code must recognize and skip the BOM itself.
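One way to apply this workaround is sketched below. The approach is to wrap the byte stream in a `java.io.PushbackInputStream`, peek at the first three bytes, and drop them only if they are exactly EF BB BF. Otherwise, the bytes are pushed back untouched. The class name `Utf8BomSkipper` and its methods are hypothetical, and `StandardCharsets` assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class Utf8BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns a stream positioned just after a leading UTF-8 BOM, if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: restore the bytes we consumed
        }
        return pin;
    }

    /** Opens a UTF-8 reader that never surfaces a leading U+FEFF. */
    public static Reader utf8Reader(InputStream in) throws IOException {
        return new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8);
    }

    /** Reads every character from the reader into a String (hypothetical helper). */
    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i'};
        try (Reader r = utf8Reader(new ByteArrayInputStream(withBom))) {
            System.out.println(readAll(r));
        }
    }
}
```

Because non-BOM bytes are pushed back, the same reader works for input with or without a BOM, including inputs shorter than three bytes.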

SUGGESTED FIX Recognize the BOM, just as the UTF-16 decoder does.
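For comparison, the sketch below shows the UTF-16 handling that the suggested fix refers to. The `UTF-16` charset reads a leading BOM (FE FF or FF FE) to determine byte order and does not include it in the decoded text. It assumes Java 7 or later for `StandardCharsets`, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    /** Decodes bytes with the BOM-aware "UTF-16" charset (hypothetical helper). */
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Big-endian BOM (FE FF) followed by 'A' (00 41)
        byte[] be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        // Little-endian BOM (FF FE) followed by 'A' (41 00)
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        // In both cases the BOM selects the byte order and is consumed
        System.out.println(decodeUtf16(be));
        System.out.println(decodeUtf16(le));
    }
}
```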

PUBLIC COMMENTS Java does not recognize the optional BOM which can begin a UTF-8 stream. It treats the BOM as if it were the initial character of the stream.