JDK-8025896 : (cs) Better handling for invalid byte sequences in doublebyte decoders
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 8
  • Priority: P3
  • Status: Resolved
  • Resolution: Duplicate
  • Submitted: 2013-10-03
  • Updated: 2013-10-16
  • Resolved: 2013-10-16
Related Reports
Duplicate :  
Description
A recent escalation highlighted issues in the way the JDK handles invalid byte values whilst decoding double-byte character sequences.

Code in JDK 6 and later does not appear to follow the Unicode recommendation for handling invalid byte ranges in double-byte decoding. A Unicode document describes the recommended approach for dealing with invalid byte sequences (not exactly a spec, but definitely relevant information): http://unicode.org/review/pr-121.html

Development has proposed changes to how the JDK classes handle malformed double-byte characters. With these changes, when a legal leading byte is followed by a byte that is not a valid trailing byte but is a valid single-byte character, the decoder returns a replacement character for the malformed leading byte, followed by the correctly decoded single-byte character.
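
The following is a minimal sketch for observing this behaviour on a given JDK, not a test case from this report. The class name, helper names, the choice of Shift_JIS, and the byte values are assumptions made for illustration: 0x81 is assumed to be a legal leading byte in Shift_JIS, and 0x30 (ASCII '0') is assumed to fall outside the trailing-byte range. The printed result depends on the JDK version and on whether the charset is installed, so no expected output is given.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class DoubleByteDecodeDemo {

    // Decode with REPLACE so that malformed or unmappable input is turned
    // into the decoder's replacement string (U+FFFD by default) instead of
    // raising an exception.
    static String decodeWithReplacement(byte[] input, String charsetName) {
        try {
            return Charset.forName(charsetName)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(ByteBuffer.wrap(input))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    // Render each UTF-16 code unit as U+XXXX so replacement characters
    // are easy to spot in the output.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Shift_JIS"; // assumption: any double-byte charset would do
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available on this runtime");
            return;
        }
        // Assumed legal leading byte followed by a byte that is a valid
        // single-byte character but (assumed) not a valid trailing byte.
        byte[] input = {(byte) 0x81, 0x30};
        System.out.println(toCodePoints(decodeWithReplacement(input, name)));
    }
}
```

If the proposed handling is in effect, the output should show U+FFFD followed by the decoded single-byte character; a decoder that consumes both bytes as one malformed unit would show a single U+FFFD.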