JDK-8217097 : Correct UnicodeDecoder U+FFFE handling
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 13
  • Submitted: 2019-01-15
  • Updated: 2019-01-15
  • Resolved: 2019-01-15
Related Reports
CSR :  
Description
Summary
-------

Correct how UnicodeDecoder subclasses handle the U+FFFE code point when it appears in the middle of the input buffer.

Problem
-------

Currently, UnicodeDecoder treats U+FFFE in the middle of a string as "malformed" because it is a noncharacter. This was correct up until Unicode 7. However, Unicode 7 includes Corrigendum #9 (http://www.unicode.org/versions/corrigendum9.html), which changed the definition of noncharacters. UnicodeDecoder's behavior should be modified to conform to it.

Solution
--------

Remove the code in UnicodeDecoder that detects this code point in the middle of the input and returns a "malformed" CoderResult, so that the UTF-16 decoders (StandardCharsets.UTF_16, UTF_16LE, and UTF_16BE) pass the code point through.

Specification
-------------

As required by the Unicode 7 Corrigendum 9, U+FFFE is passed through as a code point.
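
The change in behavior can be observed with a small probe such as the sketch below. It is an illustration only, not part of the fix: the class name FffeProbe and the helper decodeStrict are hypothetical, and the byte sequence is an arbitrary example ("A", U+FFFE, "B" in UTF-16BE). The decoder is configured to report malformed input so that the old and new behaviors are distinguishable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical probe class; the names here are illustrative assumptions.
class FffeProbe {

    // Decodes UTF-16BE bytes strictly. Returns the decoded string, or null
    // if the decoder reports the input as malformed or unmappable.
    static String decodeStrict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // 'A', U+FFFE, 'B' encoded as UTF-16BE, so U+FFFE is mid-buffer.
        byte[] input = {0x00, 0x41, (byte) 0xFF, (byte) 0xFE, 0x00, 0x42};
        String result = decodeStrict(input);
        if (result == null) {
            System.out.println("U+FFFE rejected as malformed");
        } else {
            System.out.println("U+FFFE passed through, decoded "
                    + result.length() + " chars");
        }
    }
}
```

On a runtime that includes this fix, the probe is expected to take the "passed through" branch; on earlier runtimes it should take the "rejected" branch. Callers that relied on the malformed-input report to filter U+FFFE would need to check for the code point themselves.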


Comments
I see a release note is already planned. Moving to Approved.
15-01-2019