Bug ID: JDK-8025896 (cs) Better handling for invalid byte sequences in doublebyte decoders

Type: Bug
Component: core-libs
Sub-Component: java.nio.charsets
Affected Version: 8

Priority: P3
Status: Resolved
Resolution: Duplicate

Submitted: 2013-10-03
Updated: 2013-10-16
Resolved: 2013-10-16

A recent escalation highlighted issues in the way the JDK handles invalid character values whilst decoding doublebyte character sequences.

The decoder code in JDK 6 and later does not appear to follow the Unicode recommendation for handling invalid byte ranges when decoding doublebyte sequences. A Unicode document describes the recommended approach for dealing with invalid byte sequences (not exactly a spec, but definitely relevant information): http://unicode.org/review/pr-121.html

Dev have proposed changes to how the JDK classes handle malformed doublebyte characters. With these changes, a malformed character (a legal leading byte) followed by a valid single byte is decoded as a replacement character for the malformed leading byte, followed by the correctly decoded single-byte character.
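The intended behaviour can be illustrated with a minimal sketch. This is not the JDK's decoder code: the class name DoubleByteSketch, the byte ranges (single bytes 0x00-0x7F, lead and trail bytes 0xA1-0xFE) and the mapPair placeholder mapping are all assumptions made up for illustration. It only shows the policy described above: when a legal lead byte is followed by a byte that is not a valid trail byte, one replacement character is emitted for the lead byte alone and decoding resumes at the following byte.

```java
// Hypothetical toy decoder, not the JDK implementation.
final class DoubleByteSketch {
    static final char REPLACEMENT = '\uFFFD';

    // Assumed byte ranges for an invented doublebyte encoding.
    static boolean isLead(int b)  { return b >= 0xA1 && b <= 0xFE; }
    static boolean isTrail(int b) { return b >= 0xA1 && b <= 0xFE; }

    // Placeholder mapping for valid pairs; a real charset uses lookup tables.
    static char mapPair(int b1, int b2) {
        return (char) (0x4E00 + (b1 - 0xA1) * 94 + (b2 - 0xA1));
    }

    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b1 = in[i] & 0xFF;
            if (b1 < 0x80) {                 // valid single byte
                out.append((char) b1);
                i++;
            } else if (!isLead(b1)) {        // byte that can never start a character
                out.append(REPLACEMENT);
                i++;
            } else if (i + 1 >= in.length) { // lead byte at end of input
                out.append(REPLACEMENT);
                i++;
            } else {
                int b2 = in[i + 1] & 0xFF;
                if (isTrail(b2)) {           // well-formed pair
                    out.append(mapPair(b1, b2));
                    i += 2;
                } else {
                    // Malformed: replace only the lead byte and do NOT
                    // consume b2, so it is decoded on the next iteration.
                    out.append(REPLACEMENT);
                    i++;
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Legal lead byte 0xB0 followed by the valid single byte 'A' (0x41).
        String s = decode(new byte[] { (byte) 0xB0, 0x41 });
        // Yields U+FFFD followed by 'A'.
        System.out.println(s.length() + " chars, last = " + s.charAt(s.length() - 1));
    }
}
```

With the real JDK charsets, the comparable check would be to decode such a byte pair through a CharsetDecoder configured with onMalformedInput(CodingErrorAction.REPLACE) and inspect the result; the outcome depends on the JDK build and the charset used.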