JDK-8215210 : [macos] Hangul text does not shape to the precomposed form on JDK8u
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 8
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • Submitted: 2018-12-11
  • Updated: 2020-02-18
  • Resolved: 2019-02-14
  • Fixed in JDK 8: 8u221
  • Fixed in Other: openjdk8u242
Description
On the macOS JRE, when the font is set to Dialog (or Lucida Grande), 
Graphics2D.drawString() does not shape decomposed Korean characters into 
their precomposed form, whereas on Windows they are rendered in precomposed 
form. 

For instance, \u1100\u1161 should be shaped into \uAC00, as the Windows JRE 
does: 
            g2.drawString("\u1100\u1161\u1102\u1161", 10, 20); 
            g2.drawString("\uAC00\uB098", 10, 50); 
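
As background for why the two calls above should look identical, here is a minimal sketch of the arithmetic Hangul syllable composition defined by the Unicode Standard. It is illustrative only and is not code from the JDK or ICU; the class and method names are hypothetical, and it handles only the leading consonant + vowel case used in this report.

```java
// Illustrative sketch: arithmetic Hangul composition per the Unicode Standard.
// Not taken from the JDK or ICU; names are hypothetical.
final class HangulComposeSketch {
    static final int S_BASE = 0xAC00; // first precomposed Hangul syllable
    static final int L_BASE = 0x1100; // first leading consonant (choseong)
    static final int V_BASE = 0x1161; // first vowel (jungseong)
    static final int L_COUNT = 19;    // number of leading consonants
    static final int V_COUNT = 21;    // number of vowels
    static final int T_COUNT = 28;    // trailing consonants, plus "none"

    /** Composes a leading consonant and a vowel; returns -1 if not composable. */
    static int composeLV(int l, int v) {
        int lIndex = l - L_BASE;
        int vIndex = v - V_BASE;
        if (lIndex < 0 || lIndex >= L_COUNT || vIndex < 0 || vIndex >= V_COUNT) {
            return -1;
        }
        return S_BASE + (lIndex * V_COUNT + vIndex) * T_COUNT;
    }

    public static void main(String[] args) {
        System.out.printf("U+%04X%n", composeLV(0x1100, 0x1161)); // U+AC00
        System.out.printf("U+%04X%n", composeLV(0x1102, 0x1161)); // U+B098
    }
}
```

The two results match the precomposed string passed to the second drawString() call.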


Please note:
- On Mac, Dialog is mapped to "Lucida Grande".
- "Lucida Grande" does not have CJK glyphs, so Korean characters fall back
to other fonts. It looks like \uAC00 is mapped to "Apple SD Gothic Neo"
and \u1100\u1161 is mapped to "Apple Gothic".
- "Apple SD Gothic Neo" itself has both \uAC00 and \u1100\u1161, so we
wonder why \u1100\u1161 does not fall back to Apple SD Gothic Neo.

This problem was found when testing on macOS High Sierra, which changed
filename normalization. For example, when a file is uploaded from macOS
High Sierra with Firefox, decomposed strings are used for the filename, and
we see these problems.

More discussion is available here.
https://asciiwwdc.com/2017/sessions/715
https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html
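
For applications that receive decomposed filenames like those described above, one possible application-side workaround is to normalize the text to NFC before drawing it. The sketch below is only an assumption about how a caller might do that with java.text.Normalizer; it is not the fix applied for this bug, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical workaround sketch: normalize to NFC before rendering.
// This is not the fix that was applied for this bug.
final class NfcBeforeDrawSketch {
    /** Returns the NFC (precomposed) form of the given text. */
    static String toPrecomposed(String text) {
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text; // already precomposed, nothing to do
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "\u1100\u1161\u1102\u1161";
        String precomposed = "\uAC00\uB098";
        // Prints "true": NFC composes the jamo pairs into syllables.
        System.out.println(toPrecomposed(decomposed).equals(precomposed));
    }
}
```

A caller would then pass toPrecomposed(name) to Graphics2D.drawString() instead of the raw string.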
Comments
8u patch approval at https://mail.openjdk.java.net/pipermail/jdk8u-dev/2019-December/010700.html.
04-12-2019

Fix Request (8u): I'm asking for approval to make the single-digit change described by Phil on 2018-12-22 03:53. A request for review was posted as https://mail.openjdk.java.net/pipermail/jdk8u-dev/2019-October/010418.html ; I also added a couple of simple regression tests.
07-10-2019

I have found "a" problem. It may not be "the" problem, or all of the problems, but in MorphTables[2].cpp the ICU layout code flags an error on subtables whose length is not a multiple of 4. That check is correct for the chains (earlier in the file) but not for this case, so it causes layout to error out and discard even the previous, correctly handled runs. The fix seems to be to change 0x03 -> 0x01 in the mask used for testing alignment.
However, we also have a problem like https://bugs.openjdk.java.net/browse/JDK-8201801 . Not failing out (the fix proposed above) makes that issue a problem; at least that is what I suspect, as I have not yet been able to test on a JDK that has a fix for that bug. That fix is a sad workaround at best, but the issue only affects RTL languages + AAT fonts. We should also look at how we can avoid AAT fonts for RTL languages in logical fonts, but the long-term answer is probably harfbuzz, not ICU.
One other small thing is that the cascade list defined in CFont.java expects the CFont it is based on to be named for the physical font, but it has a name like "Dialog". Although I have not seen any harm from this, we should probably change slot[0] to look up a font based on "nativeFontName". That last point is also true in 11, but as in 8 it does not seem to be harmful SFAICS.
22-12-2018
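
To illustrate the effect of the mask change described in the comment above, here is a small sketch of how an alignment test behaves with each mask. It is not the ICU code from MorphTables[2].cpp; the class name, method name, and sample lengths are hypothetical.

```java
// Illustrative sketch of an alignment test with two different masks.
// Not the ICU source; names and sample lengths are hypothetical.
final class AlignmentMaskSketch {
    /** True if no bit selected by the mask is set in the length. */
    static boolean passesAlignmentCheck(int length, int mask) {
        return (length & mask) == 0;
    }

    public static void main(String[] args) {
        int length = 6; // a multiple of 2 but not of 4
        // Mask 0x03 requires a multiple of 4, so this length is rejected.
        System.out.println(passesAlignmentCheck(length, 0x03)); // false
        // Mask 0x01 only requires an even length, so it is accepted.
        System.out.println(passesAlignmentCheck(length, 0x01)); // true
    }
}
```

With mask 0x03 a subtable of this length would be treated as an error; with 0x01 it would pass, and only odd lengths would be rejected.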

I've looked at what is on screen, and I don't think the submitter's information that these are coming from different fonts is correct. Something else is affecting the shaping, and it is only an issue on JDK 8. It is fine on JDK 11. Actually it is fine on JDK 9 and later, but in JDK 9 I could reproduce it by specifying the JDK 8 vintage ICU layout engine. I can't yet say if it is an ICU bug or something we are doing to provoke it. What is really interesting is that "\u1100\u1161" shapes but "\u1100\u1161 " does not. Since we break the text into runs by script + font, these should be shaped independently, so that is very strange. It also shapes correctly if you directly use "Apple SD Gothic Neo", so even though I can't say why without debugging, somehow the two runs must be a factor. Debugging is a problem, since I can't build JDK 8 on 10.12.6. And a caveat (one of several): I am presuming that 10.13.6 is really the same.
21-12-2018
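
As a rough illustration of what breaking text into runs by script means, the sketch below splits a string wherever Character.UnicodeScript changes. It is a deliberate simplification: the JDK's own run breaking also takes the font into account and may treat common characters such as spaces differently, so this is not a reproduction of the layout code. The class and method names are hypothetical.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: split text into runs wherever the Unicode script changes.
// The JDK's real run breaking also considers the font; names are hypothetical.
final class ScriptRunSketch {
    /** Returns runs formatted as "SCRIPT:text". */
    static List<String> splitByScript(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        UnicodeScript current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            if (current != null && script != current) {
                runs.add(current + ":" + text.substring(start, i));
                start = i;
            }
            current = script;
            i += Character.charCount(cp);
        }
        if (current != null) {
            runs.add(current + ":" + text.substring(start));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(splitByScript("\u1100\u1161").size());  // 1
        System.out.println(splitByScript("\u1100\u1161 ").size()); // 2
    }
}
```

Under this simplified rule the trailing space forms its own COMMON run after the HANGUL run; whether the JDK 8 layout code splits the text the same way is not confirmed here.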

There's no list coded in the JDK that has the fallbacks hard coded; we call into an API provided by Apple. It seems likely from the description that Apple Gothic precedes Apple SD Gothic Neo in that ordered list. If that API unfortunately picks some of the code points from one font and some from another, it creates a boundary across which shaping cannot work.
11-12-2018