JDK-8248305 : Align some one-way conversion in MS950 charset with Windows
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11-pool
  • Submitted: 2020-06-25
  • Updated: 2021-01-15
  • Resolved: 2020-07-01
Description
This CSR is the same as JDK-8233385.<br>
https://bugs.openjdk.java.net/browse/JDK-8233385

Summary
-------

The MS950 charset encoder behaves differently from what the Traditional Chinese Windows specification defines.

Problem
-------

Windows code page 950 has some n:1 byte-to-char mappings for certain code points. In the JDK's MS950 charset, four char-to-byte mappings differ from those of Traditional Chinese Windows.<br>
(Actual issue was in https://bugs.openjdk.java.net/browse/JDK-8232161)

Solution
--------

I recommend changing the following four char-to-byte mappings.<br>
<br>
Before:
<pre>
\u2550 -> \xA2\xA4
\u255E -> \xA2\xA5
\u2561 -> \xA2\xA7
\u256A -> \xA2\xA6
</pre>
After:
<pre>
\u2550 -> \xF9\xF9
\u255E -> \xF9\xE9
\u2561 -> \xF9\xEB
\u256A -> \xF9\xEA
</pre>
<br>
Definition:<br>
Traditional Chinese Windows conversion table is:<br>
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT<br>
Newer MS950 definition is:<br>
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt<br>
<br>
\u2550, \u255E, \u2561 and \u256A are in the Unicode Box Drawing block.<br>
(See attached 4Chras.png for font glyphs)<br>
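
To check which of the two mappings a given runtime uses, the four characters can be encoded and the resulting bytes printed. This is a minimal sketch, assuming the runtime provides the charset under the name `MS950`; the class and helper names are hypothetical. The output depends on whether the JDK in use contains this change, so no expected output is shown.

```java
import java.nio.charset.Charset;

class Ms950EncodeDemo {
    // Assumption: the runtime ships the Windows code page 950 charset under the name "MS950".
    static final Charset MS950 = Charset.forName("MS950");

    // Formats bytes in the same style as the tables above, e.g. \xF9\xF9.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes a single character with MS950 and returns the bytes as hex text.
    public static String encodeHex(char c) {
        return hex(String.valueOf(c).getBytes(MS950));
    }

    public static void main(String[] args) {
        // The four box drawing characters covered by this CSR.
        char[] chars = {'\u2550', '\u255E', '\u2561', '\u256A'};
        for (char c : chars) {
            System.out.printf("U+%04X -> %s%n", (int) c, encodeHex(c));
        }
    }
}
```

A runtime without the change should print the "Before" bytes, and a runtime with the change should print the "After" bytes.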

Specification
-------------
N/A

Q&A Comments
---------------

***Joe Darcy added a comment - 2020-02-13 18:07***

Ichiroh Takiguchi, so the difference before and after is which set of box characters get mapped to?

Is there any JDK or Java SE specification that needs to be updated?

***Joe Darcy added a comment - 2020-02-14 09:01***

Marking the request as pended until the questions above are answered.

***Ichiroh Takiguchi added a comment - 2020-02-16 05:54***

Sorry, I'm late.

 >  the difference before and after is which set of box characters get mapped to?

The Unicode code points are not changed, which means the font glyphs are not changed either.<br>
For example, the MS950-to-Unicode mappings are not changed:<br>
\xA2\xA4 -> \u2550<br>
\xF9\xF9 -> \u2550<br>
Before change:<br>
\u2550 -> \xA2\xA4<br>
After change:<br>
\u2550 -> \xF9\xF9<br>
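
The unchanged byte-to-char direction can be checked with a short decode test. This is a sketch under the assumption that the charset is available as `MS950`; the class and method names are hypothetical. Per the mappings listed above, both byte sequences should decode to the same character, U+2550.

```java
import java.nio.charset.Charset;

class Ms950DecodeDemo {
    // Assumption: the runtime ships the Windows code page 950 charset under the name "MS950".
    static final Charset MS950 = Charset.forName("MS950");

    // Decodes one double-byte MS950 sequence given as lead and trail byte values.
    public static String decode(int lead, int trail) {
        return new String(new byte[] {(byte) lead, (byte) trail}, MS950);
    }

    public static void main(String[] args) {
        String a = decode(0xA2, 0xA4);
        String b = decode(0xF9, 0xF9);
        System.out.printf("A2A4 -> U+%04X%n", (int) a.charAt(0));
        System.out.printf("F9F9 -> U+%04X%n", (int) b.charAt(0));
        System.out.println("same character: " + a.equals(b));
    }
}
```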

 >   Is there any JDK or Java SE specification that needs to be updated?

No. This CSR does not affect the Java SE specification. It just follows Microsoft's latest CP950 specification.

***Joe Darcy added a comment - 2020-02-18 10:48***

Ichiroh Takiguchi, please explain what exactly this CSR proposes to alter, which methods would return different values, etc.

***Joe Darcy added a comment - 2020-02-19 12:51***

This request will stay pended until the requested information is provided.

***Ichiroh Takiguchi added a comment - 2020-02-20 02:40***

**Expected result:**

Java's behavior should be the same as that of Windows.

**Exact change:**

I'd like to change the one-way conversion definitions by changing make/data/charsetmapping/MS950.nr. No logic change is included.

**Working behavior:**

The customer's case is as follows:

 - He uses Traditional Chinese Windows with a version control system (VCS).

Windows implementation:

 - He opens a file containing \xF9\xF9 in a Windows application, such as Notepad.
 - He saves the file without making any change.
 - \xF9\xF9 is stored as \xF9\xF9.

==> VCS does not detect a change.

 - He opens a file containing \xA2\xA4 in a Windows application.
 - He saves the file without making any change.
 - \xA2\xA4 is stored as \xF9\xF9.

==> VCS detects a change.

Current implementation:

 - He opens a file containing \xF9\xF9 in a Java application.
 - He saves the file without making any change.
 - \xF9\xF9 is stored as \xA2\xA4.

==> VCS detects a change.

 - He opens a file containing \xA2\xA4 in a Java application.
 - He saves the file without making any change.
 - \xA2\xA4 is stored as \xA2\xA4.

==> VCS does not detect a change.

New implementation:

 - He opens a file containing \xF9\xF9 in a Java application.
 - He saves the file without making any change.
 - \xF9\xF9 is stored as \xF9\xF9.

==> VCS does not detect a change.

 - He opens a file containing \xA2\xA4 in a Java application.
 - He saves the file without making any change.
 - \xA2\xA4 is stored as \xF9\xF9.

==> VCS detects a change.

If the change is applied, Java behaves the same way as Windows.
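
The open-and-save scenario can be approximated by decoding the file bytes and encoding them again. This is a rough sketch, not the customer's actual application: it assumes the charset is available as `MS950`, and `openAndSave` and `vcsWouldDetectChange` are hypothetical helper names. Which of the two inputs is reported as changed depends on whether the runtime includes this change.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Ms950RoundTripDemo {
    // Assumption: the runtime ships the Windows code page 950 charset under the name "MS950".
    static final Charset MS950 = Charset.forName("MS950");

    // Models "open the file, then save it unchanged": decode to text, encode back to bytes.
    public static byte[] openAndSave(byte[] fileBytes) {
        String text = new String(fileBytes, MS950);
        return text.getBytes(MS950);
    }

    // A VCS notices a change exactly when the saved bytes differ from the original bytes.
    public static boolean vcsWouldDetectChange(byte[] fileBytes) {
        return !Arrays.equals(fileBytes, openAndSave(fileBytes));
    }

    public static void main(String[] args) {
        byte[] a2a4 = {(byte) 0xA2, (byte) 0xA4};
        byte[] f9f9 = {(byte) 0xF9, (byte) 0xF9};
        System.out.println("A2A4 changed on save: " + vcsWouldDetectChange(a2a4));
        System.out.println("F9F9 changed on save: " + vcsWouldDetectChange(f9f9));
    }
}
```

Because both byte sequences decode to the same character, exactly one of them survives the round trip unchanged. With this change applied, that one should be \xF9\xF9, matching the Windows behavior described above.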

***Joe Darcy added a comment - 2020-02-25 10:51***

After some additional information from Naoto Sato, moving to Approved.

Please consider a release note for this change.
Comments
Adding myself as a reviewer for the backport; moving to Approved.
01-07-2020