SYNOPSIS
--------
Incorrect UTF8 conversion for sequence ED 31

OPERATING SYSTEM
----------------
All

FULL JDK VERSION
----------------
Java 6 (tested with 1.6.0_26)
Java 7 (tested with GA / b147)

PROBLEM DESCRIPTION from LICENSEE
---------------------------------
The byte sequence ED 31 (0xED followed by decimal 31, i.e. 0x1F) is not
decoded correctly.

The UTF8 specification states that the maximal valid subpart of an
ill-formed sequence should be replaced by a single U+FFFD before decoding
moves on to the next byte. In this case ED is a valid lead byte of a
three-byte sequence, but the second byte (31) is not a valid continuation
byte. Therefore ED alone should be replaced by fffd, and 31 should then be
processed as a single byte. 31 is a valid single-byte sequence (1f).

TESTCASE
--------
public class RegTest {
    public static void main(String[] args) throws Exception {
        // 0xED is a three-byte lead byte; 31 (decimal, 0x1F) is not a continuation byte.
        byte[] test1 = new byte[] {(byte) 0xED, 31};
        String s1 = stringToHex(new String(test1, "UTF8"));
        System.out.println(s1);
    }

    // Returns the UTF-16 code units of base as space-separated hex values.
    public static String stringToHex(String base) {
        StringBuffer buffer = new StringBuffer();
        int intValue;
        for (int x = 0; x < base.length(); x++) {
            intValue = base.charAt(x);
            String hex = Integer.toHexString(intValue);
            if (hex.length() == 1) {
                // Pad single-digit values to two digits.
                buffer.append("0" + hex + " ");
            } else {
                buffer.append(hex + " ");
            }
        }
        return buffer.toString();
    }
}

REPRODUCTION INSTRUCTIONS
-------------------------
1. javac RegTest.java
2. java RegTest

Actual Output:
fffd

Expected Output:
fffd 1f
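
ILLUSTRATIVE SKETCH OF THE EXPECTED BEHAVIOUR
---------------------------------------------
The replacement policy described in the problem description can be
sketched as a small standalone decoder. This is only an illustration of
the expected behaviour, not the JDK's implementation: the class name
MaximalSubpartSketch and the method decode are hypothetical, and the byte
ranges are assumed from the well-formed UTF-8 byte sequence table in the
Unicode Standard. Each ill-formed run is consumed up to the first byte
that cannot continue it, replaced by one U+FFFD, and decoding resumes at
that byte.

```java
class MaximalSubpartSketch {

    /** Decodes UTF-8, replacing each maximal ill-formed subpart with one U+FFFD. */
    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            int need;                 // continuation bytes expected after the lead byte
            int cp;                   // code point being assembled
            int lo = 0x80, hi = 0xBF; // allowed range for the next continuation byte
            if (b0 < 0x80) {                        // single-byte (ASCII) sequence
                out.append((char) b0);
                i++;
                continue;
            } else if (b0 >= 0xC2 && b0 <= 0xDF) {  // two-byte lead
                need = 1;
                cp = b0 & 0x1F;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {  // three-byte lead
                need = 2;
                cp = b0 & 0x0F;
                if (b0 == 0xE0) lo = 0xA0;          // exclude overlong forms
                if (b0 == 0xED) hi = 0x9F;          // exclude surrogates
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {  // four-byte lead
                need = 3;
                cp = b0 & 0x07;
                if (b0 == 0xF0) lo = 0x90;          // exclude overlong forms
                if (b0 == 0xF4) hi = 0x8F;          // exclude values above U+10FFFF
            } else {                                // byte that can never start a sequence
                out.append('\uFFFD');
                i++;
                continue;
            }
            int j = i + 1;
            int got = 0;
            while (got < need && j < in.length) {
                int b = in[j] & 0xFF;
                if (b < lo || b > hi) break;        // not a valid continuation: stop here
                cp = (cp << 6) | (b & 0x3F);
                lo = 0x80;                          // later continuation bytes use the full range
                hi = 0xBF;
                j++;
                got++;
            }
            if (got == need) {
                out.appendCodePoint(cp);
            } else {
                out.append('\uFFFD');               // one replacement for the whole subpart
            }
            i = j;                                  // resume at the byte that ended the subpart
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decode(new byte[] {(byte) 0xED, 0x1F});
        StringBuilder hex = new StringBuilder();
        for (int k = 0; k < s.length(); k++) {
            if (k > 0) hex.append(' ');
            hex.append(Integer.toHexString(s.charAt(k)));
        }
        System.out.println(hex); // fffd 1f
    }
}
```

For the input ED 1F the sketch stops at 1F because it is outside the
continuation range allowed after ED, emits a single fffd for ED, and then
decodes 1F as an ordinary single-byte character.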