At Google we noticed a change in behavior for the HKSCS charset family between jdk7 and jdk8. It looks like a bug in the jdk8 implementations.

Recipe:
-----
public class HKSCS {
    public static void main(String[] args) throws Throwable {
        // Build a supplementary character (U+20C43) as a surrogate pair,
        // followed by a plain ASCII 'a'.
        StringBuilder sb = new StringBuilder();
        sb.append((char) (Character.MIN_HIGH_SURROGATE + 67));
        sb.append((char) (Character.MIN_LOW_SURROGATE + 67));
        sb.append('a');

        // Encode the string and dump the resulting bytes as hex.
        byte[] xs = sb.toString().getBytes("Big5-HKSCS");
        for (byte x : xs) {
            System.out.printf("%02x ", x & 0xff);
        }
        System.out.println();
    }
}
-----
Produces:
8a a6 61
in jdk7 and
8a a6
in jdk8, which seems wrong: the byte for the trailing 'a' (61) is missing from the jdk8 output.
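
To help narrow this down, here is a rough sketch (not part of the original recipe) that encodes the same string twice: once as a whole, and once one code point at a time. The class name HkscsCheck and its helper methods are made up for illustration, and the comparison assumes Big5-HKSCS is a stateless encoding, so that concatenating per-code-point results should equal the whole-string result. If the per-code-point line includes the trailing 61 while the whole-string line does not, that would suggest the encoder loses the char following a surrogate pair rather than failing to map 'a' itself. That is a guess about where the problem lies, not a confirmed diagnosis.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical diagnostic helper; names are illustrative only.
class HkscsCheck {
    // Formats bytes as space-separated lowercase hex, like the recipe's output.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Encodes each code point on its own and concatenates the results.
    // Assumes a stateless charset, so this should match s.getBytes(cs).
    static byte[] encodePerCodePoint(String s, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            byte[] bs = new String(Character.toChars(cp)).getBytes(cs);
            out.write(bs, 0, bs.length);
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("Big5-HKSCS");
        // Same input as the recipe: one surrogate pair followed by 'a'.
        String s = new StringBuilder()
            .append((char) (Character.MIN_HIGH_SURROGATE + 67))
            .append((char) (Character.MIN_LOW_SURROGATE + 67))
            .append('a')
            .toString();

        byte[] whole = s.getBytes(cs);
        byte[] pieces = encodePerCodePoint(s, cs);
        System.out.println("whole string:   " + toHex(whole));
        System.out.println("per code point: " + toHex(pieces));
        System.out.println("match: " + Arrays.equals(whole, pieces));
    }
}
```

The sketch sticks to APIs available in jdk7 so it can be run unchanged on both releases and the two outputs compared side by side.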