A recent escalation highlighted problems in the way the JDK handles invalid byte values while decoding double-byte character sequences. The code in JDK 6 and later does not appear to follow the Unicode recommendation for handling invalid byte ranges in double-byte decoding. A Unicode document (http://unicode.org/review/pr-121.html) describes the recommended approach for dealing with invalid byte sequences; it is not exactly a spec, but it is definitely relevant. Development has proposed changes to how the JDK classes handle malformed double-byte characters. With these changes, a malformed character (a legal leading byte) followed by a valid single byte now decodes to a replacement character for the malformed leading byte, followed by the correctly decoded single-byte character.
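The behaviour described above can be sketched with a toy decoder. This is only an illustration of the policy, not the JDK's implementation: the class name `DoubleByteSketch`, the byte ranges (single bytes 0x00-0x7F, leading bytes 0x81-0x9F, trailing bytes 0xA0-0xFF) and the mapping into the Unicode private use area are all assumptions made up for this example and do not correspond to any real charset.

```java
// Hypothetical toy double-byte encoding, invented for illustration only.
final class DoubleByteSketch {
    private static final char REPLACEMENT = '\uFFFD';

    // Assumed byte classes of the toy encoding (not a real charset).
    static boolean isSingle(int b) { return b <= 0x7F; }
    static boolean isLead(int b)   { return b >= 0x81 && b <= 0x9F; }
    static boolean isTrail(int b)  { return b >= 0xA0 && b <= 0xFF; }

    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b1 = in[i] & 0xFF;
            if (isSingle(b1)) {
                // Valid single byte: maps directly to the same code point.
                out.append((char) b1);
                i++;
            } else if (isLead(b1) && i + 1 < in.length && isTrail(in[i + 1] & 0xFF)) {
                // Well-formed pair: map into the private use area (arbitrary choice).
                int b2 = in[i + 1] & 0xFF;
                out.append((char) (0xE000 + (b1 - 0x81) * 0x60 + (b2 - 0xA0)));
                i += 2;
            } else {
                // Malformed: replace only this byte and do NOT consume the
                // following byte, so a valid single byte after a dangling
                // leading byte is still decoded on the next iteration.
                out.append(REPLACEMENT);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Legal leading byte 0x81 followed by the valid single byte 'A'.
        byte[] input = {(byte) 0x81, 0x41};
        String decoded = DoubleByteSketch.decode(input);
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The key design choice is that a malformed leading byte has a malformed length of one: the decoder substitutes a single replacement character and resumes at the very next byte, rather than discarding both bytes as one bad character.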