Bug ID: JDK-8356803 Test TextLayout/TestControls fails on windows & linux: line and paragraph separator show non-zero advance

Other
tbd Unresolved

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/25560 Date: 2025-05-30 23:40:22 +0000
30-05-2025
OK, here's my proposal on how we can think about this topic:

First, the Unicode guidance which we've been discussing [1] is specific to "unsupported characters", that is, "if the rendering system doesn’t fully support them". So the first question that we have to ask for these whitespace characters [2] is whether we consider them to be "fully supported". If so, the guidance does not apply.

So which of the whitespace characters do we consider to be fully supported? For a character like U+00A0 (no-break space), I think it makes sense to let the font decide what to show, and that's as far as we need to go for it to be "fully supported". It's a special kind of space, but it is just a space, at the end of the day. I feel the same about all the other types of spaces.

The questionable characters in my mind are the ones which have text positioning side effects: tabs and line breaks. My understanding is that, in our stack, they should have been consumed before the `drawString` call to adjust the position of that call (see e.g. the tab handling advice in the LineBreakMeasurer JavaDoc). So by the time we get to `drawString` and friends, if these characters (tabs and line breaks) are still in the string, we should treat them as "unsupported characters".

From this perspective, it makes sense that we are currently treating \t, \n and \r differently: they aren't really supposed to be used at the `drawString` stage of things. There are just two issues:

1. The special treatment differs from the Unicode recommendation (we omit them, Unicode recommends displaying a blank space).
2. It's an incomplete list. There are other (rarer) characters that should technically be treated the same way: U+000B (Vertical Tab), U+000C (Form Feed), U+0085 (Next Line), U+2028 (Line Separator), U+2029 (Paragraph Separator).

The different treatment doesn't bother me so much at this stage. The \t, \n, \r handling has been done this way for many years now, and nobody has complained. Further, changing it to be Unicode-compliant carries some risk. I propose to leave the special handling (omit instead of showing blank space) as is, at least for now.

What we do need to address, IMO, is that the list of unsupported whitespace characters is incomplete. It is missing VT, FF, NEL, LS, and PS (LS and PS being the ones that were detected by this test).

Let me know if this framing makes sense, or if you have a different perspective!

[1] https://www.unicode.org/faq/unsup_char.html#2
[2] https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt

    0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
    0020 ; White_Space # Zs SPACE
    0085 ; White_Space # Cc <control-0085>
    00A0 ; White_Space # Zs NO-BREAK SPACE
    1680 ; White_Space # Zs OGHAM SPACE MARK
    2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
    2028 ; White_Space # Zl LINE SEPARATOR
    2029 ; White_Space # Zp PARAGRAPH SEPARATOR
    202F ; White_Space # Zs NARROW NO-BREAK SPACE
    205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE
    3000 ; White_Space # Zs IDEOGRAPHIC SPACE
27-05-2025
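To make the proposal above concrete, here is a minimal Java sketch of the two character groups it names: the three characters already given special treatment (\t, \n, \r) and the five the comment suggests adding (VT, FF, NEL, LS, PS). The class `LayoutControlChars` and its methods are hypothetical names invented for illustration; they are not JDK APIs and are not taken from the pull request. The thread describes the real behavior as an "invisible glyph" treatment inside the font code; this sketch only approximates "omit" by filtering the string.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of the JDK or the PR.
final class LayoutControlChars {
    // Characters the thread says already get special treatment: \t, \n, \r.
    static final Set<Integer> CURRENTLY_OMITTED = Set.of(0x0009, 0x000A, 0x000D);

    // Characters the proposal would add: VT, FF, NEL, LS, PS.
    static final Set<Integer> PROPOSED_ADDITIONS =
            Set.of(0x000B, 0x000C, 0x0085, 0x2028, 0x2029);

    // True for whitespace that has text-positioning side effects
    // (tabs and line breaks), as opposed to ordinary spaces.
    static boolean hasPositioningSideEffect(int codePoint) {
        return CURRENTLY_OMITTED.contains(codePoint)
                || PROPOSED_ADDITIONS.contains(codePoint);
    }

    // Approximates the "omit" treatment by dropping those code points.
    // Assumption: the real implementation maps them to an invisible glyph
    // rather than editing the string.
    static String omitLayoutControls(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !hasPositioningSideEffect(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // LS and TAB are dropped; the no-break space is an ordinary space and stays.
        String s = "a\u2028b\tc\u00A0d";
        System.out.println(omitLayoutControls(s).length()); // prints 5
    }
}
```

Under this framing, U+00A0 and the other space characters are left for the font to render, while the eight positioning characters are handled uniformly.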
Additional interesting discussion between the HarfBuzz, Chrome and Firefox teams: https://github.com/harfbuzz/harfbuzz/issues/4279
14-05-2025
> That being said I'm not sure which behavior JDK should follow - old or new. This is something that [~prr] or [~aivanov] would be able to answer better.

Yeah, it's not clear to me either, I can see arguments both ways.

> On Linux 0x2028, 0x2029 is rendered as space which seems OK

Keep in mind this may be a result of the default Linux fonts, not necessarily anything Linux-specific in the OpenJDK code (i.e. it may be some version of "dumb luck").

> Based on the above links, it looks reasonable to follow Unicode convention.

Possibly, but the immediate next question would be "are we going full-Unicode for this?", i.e. treating all White_Space chars (0009..000D, 0020, 0085, 00A0, 1680, 2000..200A, 2028, 2029, 202F, 205F, 3000) as a visible blank space. If so, it will be a larger change that may break other longstanding assumptions; and if not, where are we drawing the arbitrary line?
14-05-2025
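For a sense of what "going full-Unicode" would cover, the White_Space list quoted above can be written as a small range table. This is only a sketch: the class `UnicodeWhiteSpace` is a hypothetical name, and the ranges are copied from the list in this thread rather than read from PropList.txt. It also shows that the built-in `Character.isWhitespace` is not a drop-in substitute, since it excludes U+00A0 and does not cover U+0085.

```java
// Hypothetical helper for illustration only; ranges copied from the
// White_Space list quoted in this thread, not loaded from PropList.txt.
final class UnicodeWhiteSpace {
    private static final int[][] RANGES = {
        {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085}, {0x00A0, 0x00A0},
        {0x1680, 0x1680}, {0x2000, 0x200A}, {0x2028, 0x2028}, {0x2029, 0x2029},
        {0x202F, 0x202F}, {0x205F, 0x205F}, {0x3000, 0x3000},
    };

    // True if the code point falls in one of the listed White_Space ranges.
    static boolean isWhiteSpace(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Total number of code points covered by the listed ranges.
    static int count() {
        int total = 0;
        for (int[] range : RANGES) {
            total += range[1] - range[0] + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count()); // prints 25
        // The JDK's own notion of whitespace differs from the list above.
        System.out.println(isWhiteSpace(0x00A0) + " vs " + Character.isWhitespace(0x00A0));
        System.out.println(isWhiteSpace(0x0085) + " vs " + Character.isWhitespace(0x0085));
    }
}
```

A change that treated all 25 code points as visible blank spaces would therefore need its own table or property lookup, which is part of why the comment above calls it a larger change.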
[~prr]

> Not exactly gibberish.
> Some font went to the trouble of providing a specific visual representation of the character
> designed to make it easy for you to know what code point is being requested.

Agreed. Gibberish is incorrect in this context :)

Visual L Sep & P Sep characters

Follow-up question: Does it need to be replaced with a space in the Windows-specific code, as done here? https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/platform/fonts/shaping/harfbuzz_face.cc#119
13-05-2025
Not exactly gibberish. Some font went to the trouble of providing a specific visual representation of the character designed to make it easy for you to know what code point is being requested.
13-05-2025
[~dgredler]

> - Per Unicode, these two chars are supposed to be rendered as a space character: https://www.unicode.org/faq/unsup_char.html#2
> - Per Unicode, they are not default-ignorable (the chars which get the "invisible glyph" treatment): https://www.unicode.org/faq/unsup_char.html#3

Based on the above links, it looks reasonable to follow Unicode convention. That being said I'm not sure which behavior JDK should follow - old or new. This is something that [~prr] or [~aivanov] would be able to answer better.

One thing to be noted: On Linux 0x2028, 0x2029 is rendered as space which seems OK according to the new Unicode convention. But on Windows these Unicode characters are rendered as a gibberish character in TextLayout, which probably needs to be fixed. See the attached screenshot for 0x2028, 0x2029 and compare it with 0x0009 (last line).

If we decide to follow the new convention, we probably need to add a similar fix in the Windows-specific code, as done here: https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/platform/fonts/shaping/harfbuzz_face.cc#119
13-05-2025
How sure are we that we want the old behavior? A few data points:

- Per Unicode, these two chars are supposed to be rendered as a space character: https://www.unicode.org/faq/unsup_char.html#2
- Per Unicode, they are not default-ignorable (the chars which get the "invisible glyph" treatment): https://www.unicode.org/faq/unsup_char.html#3
- When used in HTML, Chrome, IE and Edge follow the Unicode convention (and show a space char): https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/platform/fonts/shaping/harfbuzz_face.cc#119
- Firefox, however, makes these characters invisible (no advance, so basically the "invisible glyph" treatment)
- The OpenJDK behavior had been to treat these as invisible glyphs for at least 18 years: https://github.com/openjdk/jdk/blame/89e068bc19b12bb8f4a175fdf979cbe795ac3709/src/java.desktop/share/classes/sun/font/CMap.java#L1083
- I'm pretty sure the old behavior was only partial since the implementation was in CMap (used only by the TrueTypeGlyphMapper), i.e. different on-screen results for other glyph mappers like we have on macOS, and different results when printing

If anyone has access to the pre-2007 repository history, it would be interesting to know if there is more context in the commit history as to why they were being treated this way.
12-05-2025
> Is it a regression in JDK or new test bug?

Regression caused by JDK-8208377 - Soft hyphens render if not using TextLayout.
The regression is seen from jdk25+b10 onwards.
12-05-2025
> Is it a regression in JDK or new test bug?

Looks like a regression to me. I have a very-very old build of JDK 25 and the problem doesn't reproduce on it.
12-05-2025
Is it a regression in JDK or new test bug?
12-05-2025
I can't reproduce the problem on January releases (11.0.26, 17.0.14, 21.0.6) or on April releases (11.0.27, 17.0.15, 21.0.7, 24.0.1). The GA of 24 isn't affected either.
12-05-2025
> Why does Affects Version include 8, 11, 17, 21?

I cloned the previous issue and might have missed changing the affected version and testing on older versions. I have updated it now.
12-05-2025
Why does Affects Version include 8, 11, 17, 21? I cannot reproduce the failure on 21.0.6 or 21.0.7, the latter being the latest GA version. I haven't tried other families or the most recent builds. I ran this test on 11u, 17u, 21u each time the HarfBuzz library was updated, and all the characters had zero width on Windows and Linux. This looks like a regression to me.
12-05-2025

Causes :	JDK-8208377 - Soft hyphens render if not using TextLayout
Relates :	JDK-8356812 - Create an automated version of TextLayout/TestControls
Relates :	JDK-8353187 - Test TextLayout/TestControls fails on macOS: width of 0x9, 0xa, 0xd isn't zero
Relates :	JDK-8208377 - Soft hyphens render if not using TextLayout