JDK-7082884 : Incorrect UTF8 conversion for sequence ED 31
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 7
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2011-08-24
  • Updated: 2015-01-29
  • Resolved: 2012-11-09
JDK 8: 8 (Fixed)
Description
SYNOPSIS
--------
Incorrect UTF8 conversion for sequence ED 31

OPERATING SYSTEM
----------------
All

FULL JDK VERSION
----------------
Java 6 (tested with 1.6.0_26)
Java 7 (tested with GA / b147)

PROBLEM DESCRIPTION from LICENSEE
---------------------------------
The byte sequence ED 31 is not parsed correctly

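The UTF-8 specification (as given in the Unicode Standard) states that the maximal valid subpart of an ill-formed sequence should be replaced by a single U+FFFD before moving on to process the next byte. In this case ED is a valid lead byte of a three-byte sequence, but the second byte (decimal 31, i.e. 0x1F, as written in the testcase) is not a valid continuation byte. Therefore ED should be replaced by U+FFFD, and the second byte should be processed as a single-byte sequence. 0x1F is a valid single byte (U+001F).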
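
The replacement policy described above can be sketched as a small hand-written decoder. This is only an illustration, not the JDK's implementation: the class name MaximalSubpartSketch and its methods are hypothetical, and the byte ranges are taken from the Unicode Standard's table of well-formed UTF-8 byte sequences.

```java
class MaximalSubpartSketch {

    /** Decodes UTF-8, emitting one U+FFFD per maximal ill-formed subpart. */
    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            int need;                  // continuation bytes expected after b0
            int lo = 0x80, hi = 0xBF;  // allowed range for the next continuation byte
            int cp;                    // code point accumulated so far
            if (b0 <= 0x7F) {                      // single-byte sequence
                out.append((char) b0);
                i++;
                continue;
            } else if (b0 >= 0xC2 && b0 <= 0xDF) { // two-byte lead
                need = 1;
                cp = b0 & 0x1F;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) { // three-byte lead
                need = 2;
                cp = b0 & 0x0F;
                if (b0 == 0xE0) lo = 0xA0;         // excludes overlong forms
                if (b0 == 0xED) hi = 0x9F;         // excludes surrogates
            } else if (b0 >= 0xF0 && b0 <= 0xF4) { // four-byte lead
                need = 3;
                cp = b0 & 0x07;
                if (b0 == 0xF0) lo = 0x90;         // excludes overlong forms
                if (b0 == 0xF4) hi = 0x8F;         // excludes values above U+10FFFF
            } else {                               // byte can never start a sequence
                out.append('\uFFFD');
                i++;
                continue;
            }
            int j = i + 1;
            int got = 0;
            while (got < need && j < in.length) {
                int b = in[j] & 0xFF;
                if (b < lo || b > hi) break;       // not a valid continuation here
                cp = (cp << 6) | (b & 0x3F);
                j++;
                got++;
                lo = 0x80;                         // later bytes use the full range
                hi = 0xBF;
            }
            if (got == need) {
                out.appendCodePoint(cp);
            } else {
                out.append('\uFFFD');              // one U+FFFD for bytes i..j-1
            }
            i = j;                                 // resume at the offending byte
        }
        return out.toString();
    }

    /** Formats each UTF-16 code unit as four hex digits. */
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < s.length(); k++) {
            if (k > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) s.charAt(k)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ED followed by 0x1F (decimal 31), as in the report.
        System.out.println(toHex(decode(new byte[] {(byte) 0xED, 31}))); // prints: fffd 001f
    }
}
```

Because the decoder resumes at the byte that failed the continuation check rather than skipping it, the 0x1F byte survives as U+001F instead of being swallowed together with ED.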

TESTCASE
--------
public class RegTest {
    public static void main (String args[]) throws Exception {
        // Note: 31 is a decimal literal, so the second byte is 0x1F.
        byte[] test1 = new byte[] {(byte)0xED, 31};
        String s1 = stringToHex(new String(test1, "UTF8"));
        System.out.println(s1);
    }

    // Formats each UTF-16 code unit of base as hex, separated by spaces.
    public static String stringToHex( String base ) {
        StringBuffer buffer = new StringBuffer();
        int intValue;
        for (int x = 0; x < base.length(); x ++) {
            intValue = base.charAt(x);
            String hex = Integer.toHexString(intValue);
            if (hex.length() == 1) {
                buffer.append("0" + hex + " ");
            } else {
                buffer.append(hex + " ");
            }
        }
        return buffer.toString();
    }
}

REPRODUCTION INSTRUCTIONS
-------------------------
1. javac RegTest.java
2. java RegTest

Actual Output:
fffd

Expected Output:
fffd 1f

Comments
P4 and will probably not get fixed in JDK6.
09-11-2012

EVALUATION: The submitter is correct. While the current implementation gives the best performance, the Standard appears to suggest we MUST return malform(1) in the case of a "mixed" illegal UTF-8 byte sequence.
24-08-2011
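
To see what malformed-input length a given JDK reports for this input, the decoder can be asked to report errors instead of substituting U+FFFD. The following is a minimal sketch: it assumes that "malform(1)" in the evaluation corresponds to a CoderResult with a malformed-input length of 1, the class name MalformedLengthProbe is hypothetical, and the length actually printed depends on the JDK version in use.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class MalformedLengthProbe {

    /** Returns the result of a single decode call with error reporting enabled. */
    static CoderResult probe(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(bytes.length);
        // endOfInput = true: no further bytes will follow this buffer.
        return decoder.decode(in, out, true);
    }

    public static void main(String[] args) {
        CoderResult r = probe(new byte[] {(byte) 0xED, 31});
        if (r.isMalformed()) {
            // length() is only valid for error results, hence the guard.
            System.out.println("malformed, length " + r.length());
        } else {
            System.out.println(r);
        }
    }
}
```

A reported length of 1 would mean that only ED is treated as ill-formed and the following byte is decoded separately, which is the behavior the submitter expects.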