JDK-8243162 : DateTimeFormatter formatted numbers are not localized
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.time
  • Affected Version: 11.0.6
  • Priority: P4
  • Status: Resolved
  • Resolution: Not an Issue
  • OS: windows_10
  • CPU: x86_64
  • Submitted: 2020-04-18
  • Updated: 2020-05-01
  • Resolved: 2020-04-20
Description
ADDITIONAL SYSTEM INFORMATION :
AdoptOpenJDK version 11.0.6

A DESCRIPTION OF THE PROBLEM :
The java.time.format.DateTimeFormatter does not work correctly for locales that use different symbols to represent numbers.

This bug has probably gone undetected because languages such as English, French, and Spanish use the same symbols and characters to represent numbers. Languages such as Persian (also called Farsi), Arabic (a widely spoken language), and possibly some others use different symbols and Unicode characters to represent numbers:
Western numerals: 1 2 3 4 5 6 7 8 9 0
Persian numerals: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
public static void main(String[] args) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy", Locale.forLanguageTag("fa"));
    System.out.println(formatter.format(LocalDate.now()));
}

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The above code should print ۲۰۲۰...
ACTUAL -
... but it prints 2020.

CUSTOMER SUBMITTED WORKAROUND :
The "DateFormat" class that is used to format Java legacy "Date"s works as expected:
    SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyy", Locale.forLanguageTag("fa"));
    System.out.println(dateFormatter.format(new Date())); // Correctly prints ۲۰۲۰

It seems that only NUMBERS are not localized properly; other things seem to be working. For example:
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("MMM", Locale.forLanguageTag("fa"));
    System.out.println(formatter.format(LocalDate.now()));
... prints "آوریل" (==April) correctly.

The problem may be resolved if internal methods that return numbers (such as year, month, etc) format the number with the given locale (the locale that the user specified in DateTimeFormatter). Like this:
    return NumberFormat.getNumberInstance(/*The Locale*/).format(123);

FREQUENCY : always



Comments
If, by design, setting a locale does not cover the number system, I am OK with that. As to the behavior of localizedBy(), the spec refers to explicit Unicode extensions and nothing more. So I believe this can be fixed as a behavior change with a CSR, not a spec change. I will file an issue/CSR for it later.
01-05-2020

There are three different parts to locale handling in the java.time.* formatting API:
* the format pattern
* the locale to use for text
* the locale to use for numbers

By default, setting the locale of the formatter only covers the first two, not the numbering system. Changing the behaviour so that `withLocale(locale)` also sets the decimal style seems problematic, as the locale and decimal style are orthogonal settings in the design of the class.

This came up before, and we added the `localizedBy(locale)` method in JDK 10, i.e. the correct solution for full localization should be:
    myDateTimeFormatter.localizedBy(userLocale);

But I think we messed up the `localizedBy(locale)` method. Today, that method only takes into account Unicode extensions, whereas it probably should always call `Chronology.ofLocale(locale)` and `DecimalStyle.of(locale)`. That change would seem to better match the intent of the method, which was to provide a single method to fully set the locale of the formatter. While this is behaviourally incompatible, far fewer developers will have used the method, and those that have probably wanted full localization, so they would be willing to see the behavioural change as a bug fix. FWIW, a spec lawyer could probably argue that the current localizedBy implementation doesn't meet the spec (i.e. it's a bug fix).

Here is the proposed implementation:
    public DateTimeFormatter localizedBy(Locale locale) {
        Chronology c = Chronology.ofLocale(locale);
        DecimalStyle ds = DecimalStyle.of(locale);
        String tzType = locale.getUnicodeLocaleType("tz");
        ZoneId z = tzType != null
                ? TimeZoneNameUtility.convertLDMLShortID(tzType)
                        .map(ZoneId::of)
                        .orElse(zone)
                : zone;
        return new DateTimeFormatter(printerParser, locale, ds, resolverStyle, resolverFields, c, z);
    }

If this is agreed to be a bug fix, a backport to JDK 11 would be useful.
30-04-2020
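
The separation described in the comment above, where the locale drives text and the DecimalStyle drives digits, can be illustrated with a minimal sketch. It is not JDK code: the class and method names (`OrthogonalSettingsSketch`, `year`, `withLocaleKeepsDecimalStyle`) are made up for illustration, and the Persian zero digit U+06F0 is set by hand through `DecimalStyle.withZeroDigit` so that the result does not depend on the installed locale data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DecimalStyle;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK or of this report.
class OrthogonalSettingsSketch {

    // Extended Arabic-Indic (Persian) zero digit, U+06F0 (assumed here, set manually).
    static final char PERSIAN_ZERO = '\u06F0';

    // Formats only the year; the digits come from the given decimal style.
    static String year(LocalDate date, DecimalStyle style) {
        return DateTimeFormatter.ofPattern("yyyy")
                .withDecimalStyle(style)
                .format(date);
    }

    // Returns true if withLocale() left the formatter's decimal style untouched.
    static boolean withLocaleKeepsDecimalStyle(Locale locale) {
        DateTimeFormatter base = DateTimeFormatter.ofPattern("yyyy")
                .withDecimalStyle(DecimalStyle.STANDARD.withZeroDigit(PERSIAN_ZERO));
        return base.withLocale(locale).getDecimalStyle().equals(base.getDecimalStyle());
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 4, 18);
        // Standard style: ASCII digits.
        System.out.println(year(date, DecimalStyle.STANDARD));
        // Same pattern, digits shifted to start at U+06F0.
        System.out.println(year(date, DecimalStyle.STANDARD.withZeroDigit(PERSIAN_ZERO)));
        // Changing the locale alone does not change the decimal style.
        System.out.println(withLocaleKeepsDecimalStyle(Locale.ENGLISH));
    }
}
```

With the standard style the year is rendered with ASCII digits; with the zero digit moved to U+06F0 the same pattern yields Persian digits, independent of the formatter's locale.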

DateTimeFormatter.format/parse depends on the default locale, so format/parse consistency is not guaranteed across locales. If we regard the default number system as belonging to the default locale, as java.text.DateFormat does, there is a case for supporting it by default.
30-04-2020

The java.time API has a design requirement to give consistent results without implicit context or configuration sensitivity. Developers need to think explicitly about where locale-specific information is needed and used, and to be explicit about the choice of locale.
27-04-2020

Personally, I prefer the idea that the default number system is reflected by default when a DateTimeFormatter is instantiated; however, this would cause a regression in parsing digits because of the behavior change. I'd appreciate it if [~scolebourne] could comment here.
27-04-2020
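
The parsing concern raised in the comment above can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the JDK's behavior for any particular locale: it forces the zero digit to U+06F0 by hand, and the names `DigitParsingSketch`, `canParse` and `roundTrip` are hypothetical. Once a formatter's DecimalStyle uses non-ASCII digits, input written with ASCII digits no longer parses with that formatter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.DecimalStyle;

// Hypothetical demo class; not part of the JDK or of this report.
class DigitParsingSketch {

    // Formatter whose digits start at U+06F0 (Persian zero), set manually.
    static final DateTimeFormatter PERSIAN_DIGITS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd")
                    .withDecimalStyle(DecimalStyle.STANDARD.withZeroDigit('\u06F0'));

    // Returns true if the text parses with the Persian-digit formatter.
    static boolean canParse(String text) {
        try {
            LocalDate.parse(text, PERSIAN_DIGITS);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Formats and re-parses with the same formatter; returns the ISO form.
    static String roundTrip(LocalDate date) {
        String text = PERSIAN_DIGITS.format(date);
        return LocalDate.parse(text, PERSIAN_DIGITS).toString();
    }

    public static void main(String[] args) {
        // ASCII digits are rejected by this formatter.
        System.out.println(canParse("2020-04-18"));
        // Text produced by the formatter itself parses back.
        System.out.println(roundTrip(LocalDate.of(2020, 4, 18)));
    }
}
```

Formatting and parsing stay consistent with each other, but existing ASCII input would stop parsing if the digit style changed implicitly, which is the regression being discussed.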

Additional comment added by submitter: The solution seems to be to set the DecimalStyle explicitly, like this (no locale overriding or if-else required):
    myDateTimeFormatter
        .withLocale(userLocale) // or .localizedBy(userLocale)
        .withDecimalStyle(DecimalStyle.of(userLocale));
It turns out we have to tell the formatter to use this locale AND ALSO tell it again to use the DecimalStyle of this locale. This is annoying at the very least. Why not use the DecimalStyle of the locale by default, as the class DateFormat does?
27-04-2020
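
The submitter's workaround can be wrapped in a small helper so that callers apply both settings in one place. This is a sketch only: `FullyLocalizedFormatterSketch`, `fullyLocalized`, `formatYear` and `yearAsInt` are hypothetical names, and which digits a given locale produces depends on the locale data available to the running JDK.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DecimalStyle;
import java.util.Locale;

// Hypothetical helper class; not part of the JDK or of this report.
class FullyLocalizedFormatterSketch {

    // Applies the locale for text AND the locale's decimal style for digits.
    static DateTimeFormatter fullyLocalized(DateTimeFormatter base, Locale userLocale) {
        return base.withLocale(userLocale)
                   .withDecimalStyle(DecimalStyle.of(userLocale));
    }

    // Formats the year of the given date for the given locale.
    static String formatYear(LocalDate date, Locale userLocale) {
        return fullyLocalized(DateTimeFormatter.ofPattern("yyyy"), userLocale).format(date);
    }

    // Reads a digit string in any Unicode decimal digit set back into an int.
    static int yearAsInt(String formattedYear) {
        int value = 0;
        for (char c : formattedYear.toCharArray()) {
            value = value * 10 + Character.digit(c, 10);
        }
        return value;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 4, 18);
        // The digits printed for "fa" depend on the JDK's locale data.
        System.out.println(formatYear(date, Locale.forLanguageTag("fa")));
        System.out.println(formatYear(date, Locale.US));
    }
}
```

A web application could call `fullyLocalized` with whatever locale it resolved for the request, which avoids per-language if-else branches.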

I can see the problem here. My suggested solution is to *explicitly* specify the number system "arabext" so that numbers are rendered in that system. Each language has an implied default number system, which DateTimeFormatter does not reflect. Using the default number system in DateTimeFormatter would be an enhancement with a behavior change.
22-04-2020

Only the web application has any information about the user's desired locale. It needs to propagate that information explicitly to the formatter. The code does not need to use if/then/else; it can always invoke .localizedBy(locale) and default the locale to something sensible if there is no user-selected locale.
22-04-2020

Additional information from the submitter: The comments state that DateTimeFormatter does not take the number system into account BY DEFAULT. I'm developing a web application in which number formatting should follow the user's locale (that is to say, I cannot control their locale or know it in advance). So how can the numbers be formatted properly? The only way would be to check the user's locale (with if-else) every time I want to format Temporals and, if the locale is for example "fa", format as suggested:
    ... .localizedBy(Locale.forLanguageTag("fa-u-nu-arabext"));
Is it possible for DateTimeFormatter to take the number system into account like DateFormat does?
22-04-2020

DateTimeFormatter does not take the number system into account by default. For a formatter to localize the number system, the number system must be specified explicitly through the localizedBy() method, i.e., ofPattern("yyyy").localizedBy(Locale.forLanguageTag("fa-u-nu-arabext"))
20-04-2020
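
The suggestion in the comment above can be tried with a short, self-contained program. It is a sketch under assumptions: the class and method names (`LocalizedBySketch`, `year`) are hypothetical, `localizedBy` requires JDK 10 or later, and the digits produced for the `nu-arabext` extension rely on the JDK's default CLDR locale data being in use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK or of this report.
class LocalizedBySketch {

    // Formats the year with a formatter localized by the given BCP 47 tag.
    static String year(LocalDate date, String languageTag) {
        return DateTimeFormatter.ofPattern("yyyy")
                .localizedBy(Locale.forLanguageTag(languageTag))
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 4, 18);
        // "nu-arabext" explicitly requests Extended Arabic-Indic digits.
        System.out.println(year(date, "fa-u-nu-arabext"));
        // A locale whose number system uses ASCII digits.
        System.out.println(year(date, "en-US"));
    }
}
```

The `-u-nu-arabext` part of the tag is a Unicode locale extension that names the number system explicitly, which is what `localizedBy` picks up.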

The issue is reproducible on OpenJDK 11: the year is printed as 2020 instead of ۲۰۲۰.
OS: Windows 10
OpenJDK 11+28: Fail
OpenJDK 8u41: Fail
OpenJDK 14+36: Fail
20-04-2020