JDK-8130845 : Change to CLDR Locale data in JDK 9 b71 causes SimpleDateFormat parsing errors
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:i18n
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2015-07-09
  • Updated: 2016-08-24
  • Resolved: 2015-08-04
Fix Version: JDK 9 (fixed in 9 b77)
Related Reports
Relates :  
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b71)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b71, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
OS independent

A DESCRIPTION OF THE PROBLEM :
Build 71 started to use the CLDR locale data by default. This causes several problems when parsing dates in some locales, especially the neutral Locale.ROOT.

REGRESSION. Last worked in JDK 9 builds before b71.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Apache Lucene / Solr needs to parse dates from strings. Because Solr is language neutral, it uses Locale.ROOT when parsing dates.

Any date that contains time zone identifiers or weekday names can no longer be parsed in this locale. JDK-8129881 mentions that some locales are missing this information, which would explain the failure. I think this is exactly the same problem.
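To narrow down which field breaks, each part of the pattern can be parsed on its own. The sketch below (FieldProbe and parsesFully are hypothetical names, not JDK API) uses SimpleDateFormat.parse(String, ParsePosition), which returns null instead of throwing, and compares Locale.ROOT with Locale.ENGLISH. On builds affected by this issue one would expect the ROOT weekday and zone probes to report false; the actual output depends on the JDK build and the locale providers in use.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical diagnostic: parse one date field at a time to isolate
// which part of a full pattern (weekday, zone, month) fails in a locale.
final class FieldProbe {

    private FieldProbe() {}

    /** True if {@code pattern} consumes all of {@code text} in {@code locale}. */
    static boolean parsesFully(String pattern, String text, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        ParsePosition pos = new ParsePosition(0);
        // parse(String, ParsePosition) returns null on failure instead of throwing
        return fmt.parse(text, pos) != null && pos.getIndex() == text.length();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.ROOT, Locale.ENGLISH}) {
            // Locale.ROOT prints as "und" here
            System.out.printf("%-3s weekday 'Thu': %b, zone 'AKST': %b, month 'Nov': %b%n",
                    locale.toLanguageTag(),
                    parsesFully("EEE", "Thu", locale),
                    parsesFully("z", "AKST", locale),
                    parsesFully("MMM", "Nov", locale));
        }
    }
}
```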

In addition, the system is not even able to parse new Date().toString() using the ROOT Locale.
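A minimal sketch of that round trip. Date.toString() is documented to produce "dow mon dd hh:mm:ss zzz yyyy" with English day and month abbreviations and an hour field running from 00 to 23, which corresponds to the SimpleDateFormat pattern "EEE MMM dd HH:mm:ss zzz yyyy" (the reproducer below uses hh, which also happens to accept "04"). ToStringRoundTrip and parseToString are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Sketch: parse the output of Date.toString() back into a Date.
final class ToStringRoundTrip {

    // Pattern matching the documented Date.toString() layout (24-hour clock).
    static final String DATE_TO_STRING_PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    private ToStringRoundTrip() {}

    static Date parseToString(String text, Locale locale) throws ParseException {
        return new SimpleDateFormat(DATE_TO_STRING_PATTERN, locale).parse(text);
    }

    public static void main(String[] args) {
        String text = new Date().toString();
        for (Locale locale : new Locale[] {Locale.ROOT, Locale.ENGLISH}) {
            try {
                System.out.println(locale.toLanguageTag() + " -> " + parseToString(text, locale));
            } catch (ParseException e) {
                // e.g. the ParseException reported in this issue for Locale.ROOT on affected builds
                System.out.println(locale.toLanguageTag() + " -> " + e.getMessage());
            }
        }
    }
}
```

Time zone abbreviations are ambiguous (several zones share some of them) and milliseconds are dropped, so the round trip is not exact in general.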

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The date should be parsed successfully by SimpleDateFormat.
ACTUAL -
A java.text.ParseException is thrown: the date cannot be parsed with the ROOT locale.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.text.ParseException: Unparseable date: "Thu Nov 13 04:35:51 AKST 2008"
        at java.text.DateFormat.parse(DateFormat.java:366)
        at Bug.main(Bug.java:9)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public final class Bug {
 
  public static void main(String[] args) throws ParseException {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM d hh:mm:ss z yyyy", Locale.ROOT);
    fmt.parse("Thu Nov 13 04:35:51 AKST 2008");
  }
  
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Pass -Djava.locale.providers=JRE,SPI to use the old locale data, as in JDK 7/8. Alternatively, use Locale.ENGLISH.
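A sketch of the Locale.ENGLISH workaround, keeping the reproducer's pattern and changing only the locale (EnglishWorkaround and parseEnglish are hypothetical names). The -Djava.locale.providers alternative is normally passed on the java command line (for example, java -Djava.locale.providers=JRE,SPI Bug) rather than set from code, because the provider list is read when the locale services are initialized.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Sketch of the submitter's second workaround: parse English-formatted
// timestamps with Locale.ENGLISH instead of Locale.ROOT.
final class EnglishWorkaround {

    private EnglishWorkaround() {}

    static Date parseEnglish(String text) throws ParseException {
        // Same pattern as the reproducer; only the locale differs.
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM d hh:mm:ss z yyyy", Locale.ENGLISH);
        return fmt.parse(text);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parseEnglish("Thu Nov 13 04:35:51 AKST 2008");
        // 04:35:51 AKST (UTC-9) corresponds to 13:35:51 UTC
        System.out.println(d.toInstant());
    }
}
```

Locale.ENGLISH fits here because the input text is English; it is not a general replacement for a language-neutral locale.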


Comments
There are two issues involved in this parsing failure in the root locale:
- CLDR's root locale does not provide translations (not even English), as it is supposed to be language independent. So parsing abbreviated month names (e.g., "Jan", "Feb") fails, which is actually the correct behavior.
- CLDR does not provide three- or four-letter abbreviated time zone names. So parsing them (in this case, "AKST") should fail.
However, we've decided to change the above behavior for backward compatibility:
1) For the first issue, supply English month names even in the root locale, i.e., "Jan" for "M01", "Feb" for "M02", and so on.
2) If the short time zone names are missing, substitute the short names in the CLDR resource bundles with the corresponding ones from the JRE.
03-08-2015
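Both gaps can be checked directly through DateFormatSymbols, which is where SimpleDateFormat gets its month and zone names. In this sketch (SymbolsCheck and hasShortZoneName are hypothetical names), each row of getZoneStrings() is read as {ID, long standard name, short standard name, long daylight name, short daylight name}, as documented for that method. On an affected build one would expect the ROOT line to show "M01"-style month names and no "AKST"; with the fix in place, English month names and JRE short zone names should appear.

```java
import java.text.DateFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

// Sketch: inspect the symbols SimpleDateFormat relies on for a locale.
final class SymbolsCheck {

    private SymbolsCheck() {}

    /** True if some zone uses {@code abbr} as its short standard or short daylight name. */
    static boolean hasShortZoneName(Locale locale, String abbr) {
        // Row layout: {ID, long standard, short standard, long daylight, short daylight}
        for (String[] row : DateFormatSymbols.getInstance(locale).getZoneStrings()) {
            if (row.length >= 5 && (abbr.equals(row[2]) || abbr.equals(row[4]))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.ROOT, Locale.ENGLISH}) {
            DateFormatSymbols dfs = DateFormatSymbols.getInstance(locale);
            System.out.println(locale.toLanguageTag()
                    + " short months=" + Arrays.toString(dfs.getShortMonths())
                    + " has AKST=" + hasShortZoneName(locale, "AKST"));
        }
    }
}
```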

Bug logged by Apache Lucene. Additional comment from Uwe below:

I think the real issue here is the following (Rory, can you add this to the issue?): According to Unicode, all locales should fall back to the ROOT locale if the specific locale does not have data (e.g., http://cldr.unicode.org/development/development-process/design-proposals/generic-calendar-data). The problem is that the CLDR Java implementation seems to fall back to the root locale, but the root locale does not have weekday names or short time zone names; our test verifies this: the ROOT locale is missing all this information. This causes all the bugs, including the one in https://bugs.openjdk.java.net/browse/JDK-8129881. The root locale should have the default English weekday and time zone names (see http://cldr.unicode.org/development/development-process/design-proposals/generic-calendar-data). I think the ROOT locale and the fallback mechanism should be revisited in the JDK's CLDR implementation; there seems to be a bug there (either missing data, or the fallback to defaults does not work correctly). Uwe
09-07-2015
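For illustration, the general fallback described in Uwe's comment can be printed with ResourceBundle.Control.getCandidateLocales: every candidate chain ends at Locale.ROOT, so a key found neither in the specific locale nor in its parents is looked up in ROOT, and a gap there shows up in every such locale. This shows the generic ResourceBundle mechanism only; the CLDR provider applies its own parent-locale rules, so treat it as an analogy rather than its exact lookup (FallbackChain and candidates are hypothetical names).

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Sketch: the generic ResourceBundle candidate chain, which ends at Locale.ROOT.
final class FallbackChain {

    private FallbackChain() {}

    static List<Locale> candidates(Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // Placeholder base name; the default candidate list depends only on the locale.
        return control.getCandidateLocales("", locale);
    }

    public static void main(String[] args) {
        System.out.println(candidates(Locale.forLanguageTag("de-CH")));
    }
}
```

For de-CH this prints [de_CH, de, ], where the empty last entry is Locale.ROOT.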