At Google we noticed a change in behavior for the HKSCS charset family between jdk7 and jdk8. It looks like a bug in the jdk8 implementations.
Recipe:
-----
public class HKSCS {
    public static void main(String[] args) throws Throwable {
        StringBuilder sb = new StringBuilder();
        // Surrogate pair U+D843 U+DC43, i.e. supplementary code point U+20C43.
        sb.append((char) (Character.MIN_HIGH_SURROGATE + 67));
        sb.append((char) (Character.MIN_LOW_SURROGATE + 67));
        // A plain ASCII character following the surrogate pair.
        sb.append('a');
        byte[] xs = sb.toString().getBytes("Big5-HKSCS");
        // Print each encoded byte as two hex digits.
        for (byte x : xs) {
            System.out.printf("%02x ", x & 0xff);
        }
        System.out.println();
    }
}
-----
Produces:
8a a6 61
in jdk7 and
8a a6
in jdk8, which seems wrong: the trailing 'a' (0x61) that follows the surrogate pair is missing from the output.
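
A possible way to turn the recipe into a self-checking test is sketched below. It encodes the whole string in one call and also one code point at a time, then compares the two results. This relies on the assumption that Big5-HKSCS is a stateless encoding, so both approaches should yield the same bytes. The class name HkscsCheck and the helpers hex and encodePiecewise are made up for illustration and are not part of the jdk.
-----
```java
import java.nio.charset.Charset;

public class HkscsCheck {
    // Formats bytes as space-separated lowercase hex, e.g. "8a a6 61".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encodes each code point on its own and joins the hex output.
    // Assumption: the charset is stateless, so this should match
    // the result of encoding the whole string in one call.
    static String encodePiecewise(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String piece = new String(Character.toChars(cp));
            if (sb.length() > 0) sb.append(' ');
            sb.append(hex(piece.getBytes(cs)));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("Big5-HKSCS");
        // Same input as the recipe: U+20C43 followed by 'a'.
        String s = new String(Character.toChars(0x20C43)) + "a";
        String whole = hex(s.getBytes(cs));
        String pieces = encodePiecewise(s, cs);
        System.out.println("whole:     " + whole);
        System.out.println("piecewise: " + pieces);
        System.out.println(whole.equals(pieces) ? "consistent" : "MISMATCH");
    }
}
```
-----
If the jdk8 behavior described above is present, the two lines should differ, since the piecewise encoding still includes the bytes for 'a'.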