Bug ID: JDK-4777313 Unicode 3.2 - based line-wraps in Swing

Type: Enhancement
Component: client-libs
Sub-Component: javax.swing
Affected Version: 1.4.1

Priority: P4
Status: Closed
Resolution: Duplicate
OS: windows_xp
CPU: x86

Submitted: 2002-11-12
Updated: 2003-04-23
Resolved: 2003-04-23


Name: gm110360			Date: 11/11/2002


FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

A DESCRIPTION OF THE PROBLEM :
For internationalization, and also to allow writing more
locale-independent GUIs in Swing that can easily be
translated from a single source code base and a simple set
of resource bundles, we need something to allow correct
handling of three required features:
- word wrap
- line wrap
- directionality

Unicode 3.2 publishes a set of character properties related
to:
- character width (half-width/full-width)
- reorientability of half-width characters in a vertical
layout, or their conversion from half-width to full width
- Unicode canonicalization rules (related to combining
marks), and management of presentation forms (contextual
forms for characters, styling combined with some
characters, ...)
- line wrap attributes for characters
- technical reports with sample code snippets to handle
these new character properties

The most common problem with internationalized
applications, after directionality in Hebrew and Arabic, is
the management of line wraps: this directly affects Asian
texts, which don't use spaces that would allow a simple
line-wrap or word-wrap when laying out the text for display.

A "simple" solution would be to expect that Asian text will
contain spaces. This works if the set of resources to be
displayed is fixed and managed in static resources;
however, it is not correct according to the standard layout
of these languages. To solve this problem, a program should
be able to detect some characters that help it perform
line wraps correctly:
- full-width punctuation used in Chinese or Japanese is
mostly equivalent to half-width punctuation plus a space
- Chinese Hanzi and Japanese Kanji characters are each
considered words on their own, and can be wrapped individually
- Japanese Katakana and Hiragana have some rules to
delimit syllables or terms that can be wrapped
individually
- Korean Hangul characters are composed into syllables that
can be computed algorithmically (the L,V*,T algorithm):
line-wrap can occur between syllables but not in the middle
of an LVT syllable sequence.
- There's generally no need to support a vertical layout
for Asian languages, as they also accept the horizontal
layout (the biggest layout problem comes from Semitic
languages)
- Latin-, Greek- or Cyrillic-based scripts usually have
short enough words to allow a simple wrap algorithm based
on word-wrap without needing hyphenation (and
dictionaries) if the GUI is correctly designed with a
sufficient display width, and they use the usual
punctuation and spaces to delimit words
- Generally, a change of script delimits a line-wrap
opportunity (for example between Latin and Hiragana, or
between Hiragana and Katakana, or between Hira/kata and
Hanzi/Kanji...)
- Unicode provides an efficient algorithm to handle the
line-wrap opportunities based on pairs of character classes
that will work very well with simple scripts
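
The Hangul rule from the list above can be sketched as follows. This is only an illustration under stated assumptions, not code from the JDK: the class `HangulBreaks` and its methods are hypothetical names, only the basic Hangul Jamo block (U+1100..U+11FF) and the precomposed syllables (U+AC00..U+D7A3) are classified, and every other character is ignored rather than handled by a full line-break algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class HangulBreaks {
    enum Kind { L, V, T, LV, LVT, OTHER }

    // Classify a code point; only the two basic Hangul ranges are covered (assumption).
    static Kind kindOf(int cp) {
        if (cp >= 0x1100 && cp <= 0x115F) return Kind.L;   // leading consonants
        if (cp >= 0x1160 && cp <= 0x11A7) return Kind.V;   // vowels
        if (cp >= 0x11A8 && cp <= 0x11FF) return Kind.T;   // trailing consonants
        if (cp >= 0xAC00 && cp <= 0xD7A3) {
            // Precomposed syllables come in groups of 28; the first of each group has no trailing consonant.
            return (cp - 0xAC00) % 28 == 0 ? Kind.LV : Kind.LVT;
        }
        return Kind.OTHER;
    }

    // True if wrapping between the two code points would cut an L V* T sequence.
    static boolean splitsSyllable(int before, int after) {
        Kind b = kindOf(after);
        switch (kindOf(before)) {
            case L:
                return b == Kind.L || b == Kind.V || b == Kind.LV || b == Kind.LVT;
            case V:
            case LV:
                return b == Kind.V || b == Kind.T;
            case T:
            case LVT:
                return b == Kind.T;
            default:
                return false;
        }
    }

    // Char offsets between two Hangul code points where a wrap keeps syllables whole.
    static List<Integer> syllableBoundaries(String text) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int next = i + Character.charCount(cp);
            if (next < text.length()) {
                int after = text.codePointAt(next);
                boolean bothHangul = kindOf(cp) != Kind.OTHER && kindOf(after) != Kind.OTHER;
                if (bothHangul && !splitsSyllable(cp, after)) {
                    result.add(next);
                }
            }
            i = next;
        }
        return result;
    }
}
```

For a string of three precomposed syllables, the sketch reports a wrap opportunity after each of the first two; for a decomposed L, V, T sequence followed by another L, V, it reports only the offset between the two syllables.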

Is it possible to add new classes in the
java.lang.Character family to handle the now standardized
new properties for characters:
- East Asian width
- derived normalization
- line-wrap opportunity classes
- case folding
- special casing
in a way similar to what is now implemented with the
java.lang.Character.UnicodeBlock class?
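
For comparison, here is a minimal sketch of per-character property lookups using only standard library calls. It assumes a JDK newer than the 1.4.1 release this report was filed against: `java.text.Normalizer` requires Java 6 and `Character.UnicodeScript` requires Java 7. The wrapper class `CharacterProperties` and its method names are hypothetical, and the sketch does not cover East Asian width or line-wrap classes.

```java
import java.text.Normalizer;

// Hypothetical wrapper; only standard JDK calls are used inside.
class CharacterProperties {
    // Script of a code point (Character.UnicodeScript, Java 7 and later).
    static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    // Block of a code point, the existing facility the report refers to.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    // Canonical composition (NFC) via java.text.Normalizer (Java 6 and later).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(scriptOf(0x3042));            // HIRAGANA LETTER A
        System.out.println(scriptOf(0x30AB));            // KATAKANA LETTER KA
        System.out.println(scriptOf(0x6F22));            // a CJK ideograph
        System.out.println(toNfc("e\u0301").length());   // e + combining acute, composed
    }
}
```

A script lookup of this kind is one way to detect the script changes (Latin to Hiragana, Hiragana to Katakana, and so on) that the description mentions as line-wrap opportunities.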

Would it then be possible to provide new APIs for Swing
that use these properties to parse a string into wrappable
tokens, or to provide common transformations of strings to
comply with a text layout manager?

The most important changes will be in the way text is
handled in HTML renderers and in JTextArea.

Designing an interface that complies with these rules
should be the first goal, and there should be simple
implementations that will work on all important scripts
supported now by Java: Latin, Cyrillic, Greek, Hebrew,
Hiragana, Katakana, Hanzi/Kanji, Hangul, Thai

There should also be support now for Vietnamese, which is
not really a complex script (VISCII does not fully comply
with ISO-8859 rules, as it uses some ASCII control bytes to
represent a few accented Latin characters, but it still
works as a common single-byte encoding; alternatives use
combining marks, and the most commonly used character set is
windows-1258, which uses those combining marks and extends
an ISO registered character set with some characters
commonly found in all Windows ANSI character sets).


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Take a typical Java application that uses simple
MessagesBundles to internationalize its GUI. Give these
bundles to native-speaking translators to translate.
Try to use the translations and look at the poor layout,
or the inaccessible buttons or parts of the text in the GUI.
This is caused by the lack of support for Asian text in Java...
The developer must manually check the translation and insert
a few spaces to help manage the multiline layout.
There's no support in Java to help the developer do this in
a better way...
So correct internationalization from European to Asian
languages raises a lot of unsolved issues that can make an
application unusable in some cases with Asian text.

REPRODUCIBILITY :
This bug can be reproduced always.
(Review ID: 165231) 
======================================================================

EVALUATION

This is related to work that we're doing in java.lang and
java.text to support Unicode 3.1/3.2/4.0 (see RFE 4640853).
The focus however seems to be on using the lang/text
functionality in Swing in order to achieve good text and
user interface layout for languages with non-trivial
requirements. Reassigning to Swing for further evaluation.
###@###.### 2002-11-11

Name: ik75403			Date: 11/14/2002

this rfe will not be fixed for mantis
======================================================================

Name: pzR10082			Date: 04/23/2003

Currently (since at least JDK 1.4.2), Swing uses
java.text.BreakIterator to determine line break locations.
This means line breaks in Swing conform to whichever Unicode
standard BreakIterator supports. RFE 4640853 suggests that
BreakIterator and other java.text classes use the latest
Unicode standard available.
###@###.###
======================================================================
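
As an illustration of the mechanism named in the evaluation, the following minimal sketch enumerates line-break opportunities with `java.text.BreakIterator`. The class `LineBreakDemo` and the method `wrappableSegments` are hypothetical names, this is not the code Swing itself runs, and the boundaries returned for a given text depend on the Unicode support of the JDK in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; BreakIterator is the standard java.text API.
class LineBreakDemo {
    // Split text into the pieces between consecutive line-break opportunities.
    static List<String> wrappableSegments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<String> segments = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            segments.add(text.substring(start, end));
        }
        return segments;
    }

    public static void main(String[] args) {
        // Space-delimited text: each segment keeps its trailing space.
        System.out.println(wrappableSegments("Hello world again", Locale.ENGLISH));
        // Text without spaces: the segments depend on the iterator's rules.
        System.out.println(wrappableSegments("\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8", Locale.JAPANESE));
    }
}
```

Concatenating the returned segments always reproduces the input, so a layout routine can fill each line with as many segments as fit.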

24-08-2004