JDK-8259791 : Align some one-way conversion in MS950 charset with Windows
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11-pool
  • Submitted: 2021-01-14
  • Updated: 2021-01-19
  • Resolved: 2021-01-19
Description
Summary
-------

The MS950 charset encoder behaves differently from what the Traditional Chinese Windows specification defines.

Problem
-------

Windows code page 950 has some n:1 byte-to-char mappings, where several byte sequences decode to the same code point. In JDK's MS950 charset, 4 char-to-byte mappings differ from those of Traditional Chinese Windows.<br>
(The actual issue was reported in https://bugs.openjdk.java.net/browse/JDK-8232161)
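
As a rough illustration of the n:1 mapping, the sketch below decodes two byte sequences named in this report with MS950 and then encodes the result again. The class and helper names are hypothetical, and it assumes the runtime provides the MS950 charset. Which bytes come back from the encoder depends on whether the JDK build includes this change, so no output is claimed for that step.

```java
import java.nio.charset.Charset;

public class Ms950ManyToOne {
    // Hypothetical helper: formats bytes in the \xNN notation used in this report.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the given unsigned byte values with MS950.
    static String decode(int... unsignedBytes) {
        byte[] raw = new byte[unsignedBytes.length];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) unsignedBytes[i];
        }
        return new String(raw, Charset.forName("MS950"));
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        int[][] inputs = { {0xA2, 0xA4}, {0xF9, 0xF9} };
        for (int[] in : inputs) {
            String text = decode(in);            // byte-to-char (n:1)
            byte[] saved = text.getBytes(ms950); // char-to-byte (one-way)
            System.out.printf("%02X%02X -> U+%04X -> %s%n",
                    in[0], in[1], (int) text.charAt(0), toHex(saved));
        }
    }
}
```

Both inputs should decode to the same character, so the encoder can only return one of the two byte sequences. The question this CSR raises is which one it picks.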

Solution
--------

I recommend changing the following 4 char-to-byte mappings.<br>
<br>
Before:
<pre>
\u2550 -> \xA2\xA4
\u255E -> \xA2\xA5
\u2561 -> \xA2\xA7
\u256A -> \xA2\xA6
</pre>
After:
<pre>
\u2550 -> \xF9\xF9
\u255E -> \xF9\xE9
\u2561 -> \xF9\xEB
\u256A -> \xF9\xEA
</pre>
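
One way to check a given JDK build against the "After" table is to encode each of the four characters and compare the bytes. This is only a sketch: the class name, the `proposed()` table, and the helper are hypothetical, and it assumes the runtime provides the MS950 charset. The result depends on whether the build includes this change.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class Ms950EncodeCheck {
    // Proposed char-to-byte mappings, copied from the "After" table in this report.
    static Map<Character, String> proposed() {
        Map<Character, String> m = new LinkedHashMap<>();
        m.put((char) 0x2550, "\\xF9\\xF9");
        m.put((char) 0x255E, "\\xF9\\xE9");
        m.put((char) 0x2561, "\\xF9\\xEB");
        m.put((char) 0x256A, "\\xF9\\xEA");
        return m;
    }

    // Encodes one char with MS950 and formats the bytes as \xNN pairs.
    static String encodeHex(char c) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName("MS950"));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, String> e : proposed().entrySet()) {
            String actual = encodeHex(e.getKey());
            String verdict = actual.equals(e.getValue())
                    ? "matches the proposed mapping"
                    : "differs from the proposed mapping";
            System.out.printf("U+%04X -> %s (%s)%n", (int) e.getKey(), actual, verdict);
        }
    }
}
```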
<br>
Definition:<br>
The Traditional Chinese Windows conversion table is:<br>
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT<br>
The newer MS950 definition is:<br>
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt<br>
<br>
\u2550, \u255E, \u2561 and \u256A are in the Box Drawing Unicode block (U+2500 to U+257F).<br>
(See attached 4Chras.png for font glyphs)<br>

Specification
-------------
N/A
Comments
It would be acceptable in some cases for Oracle JDK 11u and OpenJDK 11u to share a CSR, but as this CSR already exists, I'm voting to Approve it as a simpler way to track the accounting.
19-01-2021

Hi [~itakiguchi], This CSR is for Oracle jdk11u.
18-01-2021

I think a CSR already exists for jdk11u. See JDK-8248305
15-01-2021

Adding additional information found in the mainline CSR:

Expected result: Java's behavior should be the same as Windows'.

Exact change: I'd like to change the one-way conversion definitions by changing make/data/charsetmapping/MS950.nr. No logic change is included.

Working behavior: The customer uses Traditional Chinese Windows with a Version Control System (VCS).

Windows implementation:
  • He opens a file containing \xF9\xF9 in a Windows application such as Notepad and saves it without any change. \xF9\xF9 is stored as \xF9\xF9 ==> VCS does not detect a change.
  • He opens a file containing \xA2\xA4 in a Windows application and saves it without any change. \xA2\xA4 is stored as \xF9\xF9 ==> VCS detects a change.

Current implementation:
  • He opens a file containing \xF9\xF9 in a Java application and saves it without any change. \xF9\xF9 is stored as \xA2\xA4 ==> VCS detects a change.
  • He opens a file containing \xA2\xA4 in a Java application and saves it without any change. \xA2\xA4 is stored as \xA2\xA4 ==> VCS does not detect a change.

New implementation:
  • He opens a file containing \xF9\xF9 in a Java application and saves it without any change. \xF9\xF9 is stored as \xF9\xF9 ==> VCS does not detect a change.
  • He opens a file containing \xA2\xA4 in a Java application and saves it without any change. \xA2\xA4 is stored as \xF9\xF9 ==> VCS detects a change.

If the change is applied, Java's behavior is the same as Windows'.
14-01-2021