JDK-8080248 : Coding regression in HKSCS charsets
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2015-05-13
  • Updated: 2015-09-29
  • Resolved: 2015-05-21
JDK 8: 8u60 (Fixed)
JDK 9: 9 b66 (Fixed)
Description
At Google we noticed a change in behavior for the HKSCS charset family between jdk7 and jdk8.  It looks like a bug in the jdk8 implementations.

Recipe:
-----
public class HKSCS {
    public static void main(String[] args) throws Throwable {
        StringBuilder sb = new StringBuilder();
        sb.append((char) (Character.MIN_HIGH_SURROGATE + 67));
        sb.append((char) (Character.MIN_LOW_SURROGATE + 67));
        sb.append('a');
        byte[] xs = sb.toString().getBytes("Big5-HKSCS");
        for (byte x : xs) {
            System.out.printf("%02x ", x & 0xff);
        }
        System.out.println();
    }
}
---
Produces:
8a a6 61
in jdk7 and
8a a6
in jdk8, which seems wrong: the final byte 0x61, the encoding of the trailing 'a', is missing.
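
A possible self-checking variant of the recipe, offered only as a sketch: it prints the supplementary code point that the surrogate pair forms, then reports whether the trailing 'a' survives encoding. The class name HkscsCheck and the helpers hex and recipeInput are made up for illustration and are not part of the JDK or of the original report. It assumes the runtime may or may not ship the Big5-HKSCS charset, so it checks availability first.

```java
import java.nio.charset.Charset;

public class HkscsCheck {
    /** Formats bytes as space-separated lowercase hex, e.g. "8a a6 61". */
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    /** Builds the recipe's input: one surrogate pair followed by 'a'. */
    static String recipeInput() {
        char high = (char) (Character.MIN_HIGH_SURROGATE + 67);
        char low = (char) (Character.MIN_LOW_SURROGATE + 67);
        return new StringBuilder().append(high).append(low).append('a').toString();
    }

    public static void main(String[] args) {
        String input = recipeInput();
        // The surrogate pair D843 DC43 combines into code point U+20C43.
        System.out.printf("code point: U+%X%n", input.codePointAt(0));

        // Big5-HKSCS is an extended charset and may be absent from a reduced runtime.
        if (!Charset.isSupported("Big5-HKSCS")) {
            System.out.println("Big5-HKSCS is not available in this runtime");
            return;
        }

        byte[] encoded = input.getBytes(Charset.forName("Big5-HKSCS"));
        System.out.println(hex(encoded));

        // 'a' encodes as the single byte 0x61, so it should be the last byte.
        boolean keepsTrailingA =
                encoded.length > 0 && encoded[encoded.length - 1] == 0x61;
        System.out.println(keepsTrailingA
                ? "trailing 'a' preserved"
                : "trailing 'a' dropped");
    }
}
```

Based on the output reported above, this would be expected to print "trailing 'a' preserved" on jdk7 and "trailing 'a' dropped" on an affected jdk8 build.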