Bug ID: JDK-4210199 RFE: Numerals are always Arabic (Roman)

Type: Enhancement
Component: client-libs
Sub-Component: 2d
Affected Version: 1.2.0

Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 1999-02-09
Updated: 2000-10-25
Resolved: 2000-10-25

Other
1.4.0 beta: Fixed


Name: js5519			Date: 02/09/99


When numerals are in an Arabic context, for example when they are 
surrounded by Arabic letters, they should have Hindi shapes (Unicode
values from \u0660 to \u0669). Currently the BIDI algorithm always
sets numeral shapes to the Arabic (Roman) shapes (Unicode values
\u0030 to \u0039). Shaping the numbers is the responsibility of the
BIDI algorithm as specified by the Unicode standard.
Note that shaping the numbers should only happen in Arabic blocks 
and not in Hebrew blocks, since Hebrew always uses the Roman numerals.
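
The requested behavior can be sketched in a few lines of Java. This is only an illustration of the request, not the implementation that was eventually integrated. The class name ContextualDigits and its methods are hypothetical, and "context" is simplified here to mean the nearest preceding letter, with the text assumed to start in a non-Arabic context.

```java
class ContextualDigits {
    /** True for letters in the basic Arabic block (U+0600 to U+06FF). */
    static boolean isArabicLetter(char c) {
        return Character.isLetter(c)
                && Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC;
    }

    /**
     * Replaces ASCII digits with U+0660..U+0669 when the nearest preceding
     * letter is Arabic. Digits after Hebrew or Latin letters are left alone.
     */
    static String shape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean arabicContext = false; // assumption: initial context is not Arabic
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Any letter updates the context; only Arabic letters turn it on.
                arabicContext = isArabicLetter(c);
            } else if (arabicContext && c >= '0' && c <= '9') {
                c = (char) ('\u0660' + (c - '0'));
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644 123"; // alef, lam, space, ASCII digits
        String hebrew = "\u05D0 123";       // alef (Hebrew), space, ASCII digits
        // Print the lengths and whether the text changed, to stay console-safe.
        System.out.println(!shape(arabic).equals(arabic)); // digits were shaped
        System.out.println(shape(hebrew).equals(hebrew));  // digits untouched
    }
}
```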

This is very important because the Hindi numerals are the only numerals
known in most of the Arab countries, especially in the Gulf region.

I suggest that there should be an attribute in the TextAttribute class 
as follows:
TextAttribute.NUMERALS_SHAPE
and it could be set to:
TextAttribute.NUMERALS_SHAPE_ROMAN //numerals are always Roman
TextAttribute.NUMERALS_SHAPE_HINDI //numerals are always Hindi
TextAttribute.NUMERALS_SHAPE_CONTEXT //the BIDI algorithm will shape the numerals depending on the context they're in
(Review ID: 53978)
======================================================================

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: merlin-beta
FIXED IN: merlin-beta
INTEGRATED IN: merlin-beta

14-06-2004

SUGGESTED FIX

The most direct solution is to provide an attribute, NumericConversion, that causes the ASCII digits 0-9 to be converted to some other range, the base of which is the Unicode character corresponding to the integer value of the attribute. It would be the caller's responsibility to apply this attribute with the desired value to those ranges of text they cared about. It would only affect ASCII digits, so it could be applied 'indiscriminately' to entire ranges of text.

My understanding is that some locales that use Arabic (Morocco, for example) do not use the digits from the Arabic block, but prefer that they remain ASCII. Other locales might prefer the Persian digits (a different range of the Arabic block). Rather than introduce locale-sensitive operations, and the requirement that the locale of the text be identified and the appropriate operations associated with that locale, it is more straightforward to provide a direct mechanism. We can use negative values for the attribute in the future if we want to provide other semantics similar to the ones requested.

The attribute will do a simple text substitution before bidi analysis is performed on the text. Thus applying this style will be equivalent to rendering text on which the character substitutions had already been performed. Nothing prevents users from performing this conversion directly if they desire.

Swing text components could implement the desired semantics when in the proper locale by applying this style to the appropriate ranges of text. That would be a separate bug to file against Swing. I believe there is already a request for Swing text to do this.

At this point I recommend against implementing the Unicode control codes that affect numeric shaping, since Unicode doesn't actually define what range of codes to map the ASCII digits to, or any mechanism to select that range.
doug.felt@eng 2000-02-07

After talking with Brian Beck we determined that being able to shape to other decimal ranges would be useful too, even though the initial request was for Arabic. We also determined that options for contextual shaping and for an initial context (the context in effect before the start of the text) were required. This leads to many possible options, too many to comfortably express via a fixed set of constants. So we recommend making the NumericShaper class and its operations public.
doug.felt@eng 2000-09-20
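
For readers who want to see what the recommendation looks like in practice, the following is a minimal sketch using the public java.awt.font.NumericShaper class found in current JDKs. The class name NumericShaperDemo and its helper methods are hypothetical; the NumericShaper calls (getShaper, getContextualShaper, shape) and the ARABIC and EUROPEAN constants are part of the public API. The sketch assumes a JDK that includes the java.desktop module.

```java
import java.awt.font.NumericShaper;

class NumericShaperDemo {
    /** Converts every ASCII digit to the Arabic range (U+0660..U+0669), regardless of context. */
    static String shapeAll(String text) {
        char[] chars = text.toCharArray();
        // shape() modifies the array in place.
        NumericShaper.getShaper(NumericShaper.ARABIC).shape(chars, 0, chars.length);
        return new String(chars);
    }

    /**
     * Converts ASCII digits only where the preceding strong text is Arabic.
     * The second argument is the initial context, in effect before the
     * start of the text; EUROPEAN leaves leading digits as ASCII.
     */
    static String shapeByContext(String text) {
        char[] chars = text.toCharArray();
        NumericShaper shaper =
                NumericShaper.getContextualShaper(NumericShaper.ARABIC, NumericShaper.EUROPEAN);
        shaper.shape(chars, 0, chars.length);
        return new String(chars);
    }

    public static void main(String[] args) {
        String mixed = "abc 12 \u0627\u0644 34"; // Latin run, then Arabic run
        System.out.println(shapeAll("123").equals("\u0661\u0662\u0663"));
        System.out.println(shapeByContext(mixed).equals("abc 12 \u0627\u0644 \u0663\u0664"));
    }
}
```

A shaper can also be attached to styled text through the TextAttribute.NUMERIC_SHAPING key instead of being applied to a char array directly.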

20-09-2000

EVALUATION

Must address this in the TextLayout and Swing bidi algorithms.
parry.kejriwal@eng 1999-06-30

The problem is that the clients want to continue using the ASCII numerals and not convert to the Hindi numerals themselves. The arguments are 1) client data from external sources, while nominally 'Unicode', stores numeric data as ASCII because their software can't handle the other numerals in Unicode; 2) keyboards for Arabic don't generate the numerals in the Hindi block, but instead generate the numerals in the Roman block, making it difficult to enter the correct text using 'off the shelf' components such as the Swing text components.

The second problem really shouldn't be the client's responsibility. While Unicode includes codes to turn on 'national' digit shapes, these are deprecated (because they are stateful) and we don't support them. Since we rely on the OS for keyboard support, changing the keyboard handling for all platforms is rather error-prone, though it is an option. And we still face client reluctance to convert their numeric data.

Some form of Attribute support could handle this. There are lots of numerals, and even in the Arabic block there are different sets of numerals for different languages (the Persian digits at U+06F0), so it could be argued (I would) that the numeric shaping should be language-dependent, and that instead of having explicit attributes for each number type as well as a bidi-contextual form, there should be only 'explicit' and 'language dependent', where 'language dependent' depends on either explicit language tagging (more attributes) or language analysis. 'Explicit' is the default and is how Unicode prefers to handle things. 'Language dependent' would trigger examining an attribute for language. If it is not present we synthesize a language based on paragraph context (we do script analysis for OpenType anyway). If the result is 'Arabic' we use the numerals in the standard Arabic block, if it is 'Persian' we use the numerals from the extended Arabic block, etc.

Unfortunately some writing systems write numbers differently, not just with different numerals, though many just use variant character forms. This wouldn't handle that, though it might lead to the expectation that it would: for instance, that Roman numbers embedded in Chinese would use a traditional Chinese representation (X 100 Y 10 Z) instead of X Y Z. So I think this needs a bit more investigation.
doug.felt@eng 1999-07-07

The reporter is incorrect in stating that "Shaping the numbers is the responsibility of the BIDI algorithm as specified by the Unicode standard." This is not the case: the Bidi algorithm only deals with character positioning and not shaping. Shaping is the responsibility of the rendering system and is outside Unicode's domain per se. That said, we do perform shaping of Arabic text, and also lam-alef ligature substitution. See the suggested fix.
doug.felt@eng 2000-02-07

The simplest thing to do is to add a new attribute and perform contextual shaping based on a small set of fixed values. If other people want contextual shaping, we can expand the set of values. Clients want contextual shaping, and a generic implementation is probably overkill and, if not carefully designed, could be easily abused.
doug.felt@eng 2000-04-25
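
The 'language dependent' option discussed in the 1999-07-07 comment can be sketched as a lookup from a language tag to a digit base, followed by a plain substitution of the ASCII digits. This is only an illustration of the idea, not code from the fix. The class DigitBaseByLanguage and its methods are hypothetical, and the use of ISO 639 codes ("ar", "fa") as the language tag is an assumption.

```java
class DigitBaseByLanguage {
    /** Returns the zero digit for a language code; unknown languages keep ASCII. */
    static char zeroFor(String language) {
        switch (language) {
            case "ar": return '\u0660'; // digits in the standard Arabic block
            case "fa": return '\u06F0'; // Persian digits in the extended Arabic range
            default:   return '0';      // 'explicit' behavior: leave digits as they are
        }
    }

    /** Replaces each ASCII digit d with the character at zero + d. */
    static String convert(String text, char zero) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                c = (char) (zero + (c - '0'));
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("42", zeroFor("fa")).equals("\u06F4\u06F2"));
        System.out.println(convert("42", zeroFor("en")).equals("42"));
    }
}
```

As the comment itself notes, a table like this only swaps digit glyphs; it does not handle writing systems that compose numbers differently, and a language tag alone cannot capture regional preferences for ASCII digits.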

25-04-2000