JDK-8188147 : Compact Number Formatting support
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.text
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 12
  • Submitted: 2017-09-29
  • Updated: 2019-01-16
  • Resolved: 2018-11-29
Description
Summary
-------
Add support for compact (short) number formatting to the JDK.

Problem
-------

The existing NumberFormat APIs support formatting and parsing general-purpose numbers, e.g. decimal, currency, and percentage, but they provide no support for formatting numbers in compact form.

Solution
--------
Add support for formatting numbers in compact form. Each locale has its own compact forms for representing a number, so the same number can be formatted differently across locales.

For example:

 - 1000 can be formatted as "1K", and 1000000 as "1M", in the "en_US" locale
 - 1000 can be formatted as "1 हज़ार", and 50000000 as "5 क.", in the "hi_IN" locale
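
A minimal sketch of the locale-dependent behaviour described above, using the factory method proposed in this CSR. The class and helper names (`CompactFormatDemo`, `formatShort`) are illustrative only. The Hindi output depends on the CLDR data bundled with the JDK, so it is printed without a stated expected value.

```java
import java.text.NumberFormat;
import java.util.Locale;

class CompactFormatDemo {
    // Formats a value using the SHORT compact style of the given locale.
    static String formatShort(Locale locale, long value) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(locale, NumberFormat.Style.SHORT);
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatShort(Locale.US, 1000));      // 1K
        System.out.println(formatShort(Locale.US, 1_000_000)); // 1M
        // Locale-specific compact form; exact text depends on the bundled CLDR data.
        System.out.println(formatShort(Locale.forLanguageTag("hi-IN"), 1000));
    }
}
```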

CLDR provides patterns for compact number formatting. These resources can be utilized to add locale specific compact number formatting support.

The following major public APIs are introduced for the compact number formatting feature:

CompactNumberFormat class targeted for compact number formatting

    public class CompactNumberFormat extends NumberFormat

NumberFormat.Style enum for specifying the format style

    public static enum NumberFormat.Style

Two new fields in "java.text.NumberFormat.Field", which can be used to track the position of the prefix and suffix in the resulting output

    public static final NumberFormat.Field PREFIX
    public static final NumberFormat.Field SUFFIX
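
As a hedged illustration of how the SUFFIX field might be used, the sketch below passes a `FieldPosition` built from `NumberFormat.Field.SUFFIX` to `format` and reads back the tracked indices. It assumes the JDK's default provider, which returns a `CompactNumberFormat`; the class and method names are hypothetical.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

class SuffixPositionDemo {
    // Returns the part of the formatted output that the SUFFIX field covers.
    static String suffixOf(long value) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(Locale.US, NumberFormat.Style.SHORT);
        FieldPosition pos = new FieldPosition(NumberFormat.Field.SUFFIX);
        StringBuffer out = fmt.format(value, new StringBuffer(), pos);
        return out.substring(pos.getBeginIndex(), pos.getEndIndex());
    }

    public static void main(String[] args) {
        // For "5K" in the US locale the suffix is the trailing "K".
        System.out.println(suffixOf(5000));
    }
}
```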

Public factory methods in NumberFormat to obtain a CompactNumberFormat instance

    public static NumberFormat getCompactNumberInstance()
    public static NumberFormat getCompactNumberInstance(Locale locale,
                                                 NumberFormat.Style formatStyle)

Provider method in java.text.spi.NumberFormatProvider

    public NumberFormat getCompactNumberInstance(Locale locale, NumberFormat.Style formatStyle)

Specification
-------------

http://cr.openjdk.java.net/~arapte/nishjain/8177552/specdiff_cnf.21/overview-summary.html

Serialized form

http://cr.openjdk.java.net/~arapte/nishjain/8177552/specdiff_cnf.21/serialized-form.html#java.text.CompactNumberFormat




Comments
Moving the request to Approved.
29-11-2018

Changes made in specdiff_cnf.21:
 - Added a statement about exponent parsing in CompactNumberFormat.parse(), because exponent parsing differs between DecimalFormat and CompactNumberFormat. DecimalFormat parses exponential numbers irrespective of the type of format (general number format, currency format, or scientific format), but CompactNumberFormat is not expected to parse exponent number strings. Also, it is not practical to expect a string that includes both scientific notation and a compact form, e.g. "1.05E4K", as the objective of the compact format is to make the formatted string compact and human readable. Statement added in CNF.parse(): "CompactNumberFormat parse does not allow parsing scientific notations. For example, parsing a string "1.05E4K" in US locale breaks at character 'E' and returns 1.05."
Made some other small changes; not sure whether they need to be mentioned in the CSR, but they are listed below:
 - Removed the getter methods for min/max integer/fraction digits (getMinimumIntegerDigits, getMaximumIntegerDigits, getMinimumFractionDigits, getMaximumFractionDigits), as they simply called the respective superclass methods and had the same javadoc.
 - Changed the wording of the constructor parameter "decimalPattern" from "default number formatting" to "general number formatting", as "general number formatting" is more in line with what the spec says about "0" (the special pattern) in the "Formatting" section.
28-11-2018
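
The exponent-parsing statement quoted in the comment above can be sketched as follows. This is an illustration only, with hypothetical class and method names; it uses the `ParsePosition` overload so that the index where parsing stopped can be inspected.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ExponentParseDemo {
    // Parses the text with the US SHORT compact format, starting at index 0.
    static Number parseCompact(String text, ParsePosition pos) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(Locale.US, NumberFormat.Style.SHORT);
        return fmt.parse(text, pos);
    }

    public static void main(String[] args) {
        ParsePosition pos = new ParsePosition(0);
        Number n = parseCompact("1.05E4K", pos);
        // Per the spec statement, parsing breaks at 'E' and yields 1.05.
        System.out.println(n + " (stopped at index " + pos.getIndex() + ")");
    }
}
```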

Moving to Approved.
16-11-2018

Updated the specdiff:
 - Changed the clone method declaration to return CompactNumberFormat
 - Changed the @since tag of NumberFormat.Style from 11 to 12
16-11-2018
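
A small sketch of what the covariant clone() override means for callers: the copy can be assigned without a cast. The compact patterns below are an illustrative assumption for a US-style format, not the JDK's locale data, and the class name is hypothetical.

```java
import java.text.CompactNumberFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class CloneDemo {
    // Builds a format from explicit patterns; index i is the pattern for 10^i.
    static CompactNumberFormat newFormat() {
        String[] patterns = {"", "", "", "0K", "00K", "000K", "0M", "00M", "000M"};
        return new CompactNumberFormat(
                "#,##0.##", DecimalFormatSymbols.getInstance(Locale.US), patterns);
    }

    public static void main(String[] args) {
        CompactNumberFormat original = newFormat();
        // No cast needed: clone() is declared to return CompactNumberFormat.
        CompactNumberFormat copy = original.clone();
        System.out.println(copy != original && copy.equals(original));
    }
}
```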

The overridden clone method in the "19" iteration is *not* declared to return CompactNumberFormat, which was the previous suggestion. Please check that all @since tags in this change are set appropriately to 12 rather than 11. Please re-finalize when these amendments are made.
16-11-2018

Updated the specdiff based on the comments. Changes made:
 - Grouped the semicolon and NegativePattern as one optional entity: [; NegativePattern]optional
 - Overrode the clone() method
The intention of compact number formatting is to format a number in a shorter, compact form (with no fraction digits), so setting the min/max integer/fraction digits larger than 309/340 may not be needed. Also, instead of complicating this with multiple min/max integer/fraction digit values (separate ones for double/long and BigDecimal/BigInteger), the values corresponding to the double type are used, which are 309 and 340.
13-11-2018

Pending until previous questions and comments are addressed.
02-11-2018

Is it intentional that the grammar "A compact pattern has the following syntax: Pattern: PositivePattern PositivePattern ; NegativePattern_optional" allows a semicolon after a positive pattern *without* a negative pattern being present? Presumably the whole "; NegativePattern" structure should be optional, not just the "NegativePattern" portion. (I noticed CompactNumberFormat is Cloneable and isn't currently defined to have a covariant override of clone to return CompactNumberFormat rather than Object. This is generally better for users and I don't think it has any compatibility concerns in this case.) An observation: if BigDecimal and BigInteger are intended to be supported, having maximum/minimum integer and fraction digits corresponding to the double type is odd. From the format method: "The number [to be formatted] can be of any subclass of Number."
01-11-2018

Changes made in specdiff_cnf.18.
Changes made based on the comments for groupingSize:
 - A range check is applied which allows values in the range >= 0 and <= 127; otherwise an IAE is thrown, as 127 is the maximum value of a byte.
 - Updated readObject() to throw InvalidObjectException on an invalid grouping size.
 - Added a statement on the "byte groupingSize" instance field that it must not be negative.
To make the CompactNumberFormat.parse() method consistent with DecimalFormat.parse(), some changes are made w.r.t. the type of the value returned by CompactNumberFormat.parse():
 - Added setParseBigDecimal() and isParseBigDecimal() methods.
 - Added the parseBigDecimal field to the serialized form.
 - Updated the return type section in the specification of CompactNumberFormat.parse().
24-10-2018
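
The groupingSize range check and the parseBigDecimal setting from the comment above might be exercised as in the sketch below. The compact patterns and the class and method names are illustrative assumptions, not part of the specification.

```java
import java.text.CompactNumberFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

class GroupingSizeDemo {
    // Illustrative US-style patterns; index i is the pattern for 10^i.
    static CompactNumberFormat newFormat() {
        String[] patterns = {"", "", "", "0K", "00K", "000K", "0M", "00M", "000M"};
        return new CompactNumberFormat(
                "#,##0.##", DecimalFormatSymbols.getInstance(Locale.US), patterns);
    }

    // True if setGroupingSize rejects the value with IllegalArgumentException.
    static boolean rejects(int size) {
        try {
            newFormat().setGroupingSize(size);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Parses with parseBigDecimal enabled, so the result type is BigDecimal.
    static Number parseAsBigDecimal(String text) {
        CompactNumberFormat fmt = newFormat();
        fmt.setParseBigDecimal(true);
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        System.out.println(rejects(127)); // within 0..127, accepted
        System.out.println(rejects(128)); // above Byte.MAX_VALUE, rejected
        System.out.println(parseAsBigDecimal("17K").getClass().getName());
    }
}
```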

As a general comment for future reference, if there are deliberate anachronisms or other unexpected design choices being made in an API proposal, it is appropriate for the submitter to explicitly discuss those in the Solution section of the CSR so that reviewers do not have to discover and ask about each such design choice.
16-10-2018

[~darcy] I agree with the last part of the comment (even if "int" is okay): the parameter value should be checked and an IllegalArgumentException should be thrown for an unacceptable value, e.g. negative values should not be allowed. However, DecimalFormat.setGroupingSize(int) allows setting negative values, and it works fine, because the position where a grouping separator is inserted is calculated with a mod operation (position % groupingSize == 0), so whether the value is negative or positive makes no difference (not sure if it can fail on some special value). Still, a groupingSize holding a negative value does not sound logically correct. I will update the CompactNumberFormat spec for the value check, but please advise on the other part: is it okay to keep the method prototype (setGroupingSize(int)) consistent with DecimalFormat, or should it be changed too?
16-10-2018

It is kept this way to be consistent with setGroupingSize in the DecimalFormat API. However, I am not sure about the history of DecimalFormat.setGroupingSize(int) taking an int and converting it to a byte.
16-10-2018

Why does setGroupingSize contain the following spec: "Sets the grouping size. Grouping size is the number of digits between grouping separators in the integer portion of a number. For example, in the compact number "12,347 trillion" for the US locale, the grouping size is 3. *The value passed in is converted to a byte, which may lose information.*" If you want to make the parameter a byte, make it a byte. If you want to cap the value from 0 to 255, throw an IllegalArgumentException for out of bounds information. Pending the request.
16-10-2018
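
The information loss mentioned in the quoted spec text comes from Java's narrowing conversion from int to byte, which keeps only the low 8 bits. A minimal, self-contained illustration (the class name is hypothetical and unrelated to the JDK sources):

```java
class ByteNarrowingDemo {
    // Narrowing cast: keeps the low 8 bits and reinterprets them as a signed byte.
    static byte narrow(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(3));   // 3
        System.out.println(narrow(128)); // -128
        System.out.println(narrow(300)); // 44
    }
}
```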

Changes made in the specdiff_cnf.15: As suggested, replaced List<String> with String[] for the compact patterns (in serialized form and constructor parameter). No other major change except in wordings w.r.t. compactPatterns field at constructor and readObject() ("list" -> "array")
11-10-2018

As List is an interface, there are multiple valid implementation types possible for the List and multiple possible hostile types. Using an array in the serial form may present fewer attack routes to defend when the stream is deserialized.
10-10-2018

Sorry didn't get the point, what is it which needs to be considered explicitly in case of List, which is not there for String[]?
10-10-2018

Has the use of List<String> for compactPatterns in the serial form rather than a String[] been explicitly considered? Marking this request as pended in the interim.
10-10-2018

Changes made to the earlier approved specification (earlier approved specdiff: specdiff_cnf.10):
1. The earlier specification did not consider the position of the prefix/suffix fields of the formatted output, e.g. "K" in "5K".
 - Added an override of the "formatToCharacterIterator" method in CompactNumberFormat. As per the java.text.Format.formatToCharacterIterator() method spec: "Subclasses that support fields should override this and create an AttributedCharacterIterator with meaningful attributes."
 - Added two new fields, PREFIX and SUFFIX, in "java.text.NumberFormat.Field", which can be used to track the position of the prefix and suffix in the resulting output.
 - Updated the description of the third parameter (FieldPosition fieldPosition) of all CompactNumberFormat.format methods to cover the usage of NumberFormat.Field.PREFIX and NumberFormat.Field.SUFFIX.
2. The earlier specification did not consider the possibility of an explicit negative subpattern in a compact pattern.
 - Under the heading "Compact Number Patterns", updated the compact number pattern syntax to "PositivePattern ; NegativePattern[optional]", which now allows providing an explicit negative subpattern.
 - Also modified the two paragraphs (next to the compact pattern syntax) that explain compact patterns:
   "A compact pattern contains a positive and negative subpattern separated by a subpattern boundary character..."
   "Many characters in a compact pattern are taken literally,..."
02-10-2018
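
As a rough sketch of the formatToCharacterIterator override described in the comment above, the code below checks whether the attributed output carries the SUFFIX attribute. It assumes the JDK's default provider returns a CompactNumberFormat from the factory method; the class and method names are hypothetical.

```java
import java.text.AttributedCharacterIterator;
import java.text.NumberFormat;
import java.util.Locale;

class AttributeRunsDemo {
    // True if the attributed output for the value contains a SUFFIX run.
    static boolean hasSuffixAttribute(long value) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(Locale.US, NumberFormat.Style.SHORT);
        AttributedCharacterIterator it = fmt.formatToCharacterIterator(value);
        return it.getAllAttributeKeys().contains(NumberFormat.Field.SUFFIX);
    }

    public static void main(String[] args) {
        // "5K" carries a suffix ("K"), so a SUFFIX attribute is expected.
        System.out.println(hasSuffixAttribute(5000));
    }
}
```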

Before this request is re-finalized, please include a description of the changes from the previously approved version.
28-09-2018

Having the static factories return the parent class is suboptimal, but given other historical design choices in the API, is acceptable. Moving to Approved.
23-07-2018

If an external SPI implementation decides to provide and use its own implementation for the compact number format, returning a CompactNumberFormat from static getCompactNumberInstance may not allow using its own implemented class. NumberFormat being an abstract class of all number formatting can serve the purpose.
20-07-2018

Shouldn't the static getCompactNumberInstance() methods added in NumberFormat be declared to return a CompactNumberFormat rather than a NumberFormat? Changing the fixVersion to 12.
19-07-2018

Made the suggested updates in specdiff_cnf.10
19-07-2018

Should the readObject method of CompactNumberFormat state how it calls (or doesn't call) the readObject method of NumberFormat? In particular, the check in NumberFormat.readObject, "If minimumIntegerDigits is greater than maximumIntegerDigits or minimumFractionDigits is greater than maximumFractionDigits, then the stream data is invalid and this method throws an InvalidObjectException. In addition, if any of these values is negative, then this method throws an InvalidObjectException.", is not covered by the current spec of CompactNumberFormat.readObject. Since the CompactNumberFormat class is final, it doesn't have to worry about the equals/hashCode interactions of subclasses. I'd prefer the equals/hashCode methods to specify that the exact algorithms for equals and hashCode are unspecified. Something like "Checks if this CompactNumberFormat is equal to the specified obj. The objects of type CompactNumberFormat are compared, other types return false; obey the general contract of Object.equals." Moving to Provisional.
18-07-2018

When you're ready for the CSR to review a new iteration, please move the CSR to Proposed state.
11-07-2018

Checked the "equals" methods of NumberFormat and its subclasses for violations of the symmetry and transitivity properties and didn't find any issue. One of the reasons is that NumberFormat is not an instantiable class.
Changes made in specdiff_cnf.09:
 - Removed the component-based comparison description in equals and hashCode.
 - Added checks to the readObject method specification.
 - Updated the specification of the digit count setXXX methods regarding the maximum allowed integer and fraction digits.
 - Removed a statement from the digit count getXXX specification which said that the digit count used for formatting and the returned value can be different.
 - Updated the groupingSize field specification w.r.t. its relation to "grouping used".
 - Changed groupingSize to "byte"; earlier it was an "int".
 - Added a statement to the reference fields of CNF saying "This field must not be null."
28-06-2018

Looking over the equals methods in java.text.NumberFormat and its child classes ChoiceFormat and DecimalFormat, it is likely that violations of the symmetry and transitivity properties can be constructed; see the relevant item in "Effective Java" for a discussion of writing equals methods. The NumberFormat.equals method uses a getClass-based implementation that compares the implementation class as well as the fields defined in NumberFormat. Please re-examine the equals/hashCode methods here with an eye toward making sure they interoperate correctly with other subclasses of NumberFormat. The readObject method seems underspecified since it does not describe the checks it does; presumably the same checks as the constructor.
15-06-2018

Please review the wording of the equals/hashCode methods in this package and elsewhere. If the class is final, there is less need to specify the details of what is compared (as long as the general contract of equals/hashCode is met, of course). It is easy to write an incorrect equals method that delegates to the parent equals; this can violate the symmetry or transitivity property, for example. Hence the current wording is worrisome. I don't see any material in this CSR which specifies the serialized form.
11-06-2018

The equals/hashCode changes don't seem necessary or correct to me. The other concrete NumberFormat subclasses seem to use (or at least allow) identity-as-equality.
 - The NumberFormat class has "min/max integer/fraction digits", "grouping used", and "parse integer only", which are inherited by CompactNumberFormat. The fields/properties introduced in CompactNumberFormat are "decimal pattern", "decimal format symbols", "compact patterns", "rounding mode", and "grouping size". Together these represent the state of a CompactNumberFormat object, and hence equality is checked on them. I agree that the changes in hashCode are not necessary, but they are made just to be consistent with the equals method.
What is meant by "and the comparison is based on the super class equality" ?
 - In the CompactNumberFormat equals() method, super.equals() is used to check the fields/properties inherited from NumberFormat, which are "min/max integer/fraction digits", "grouping used", and "parse integer only". So this statement refers to that.
As the CompactNumberFormat class is final, its hashCode components strictly speaking don't need to be specified, but must be consistent with equals as usual.
 - Do you mean to remove "The hash code of this instance is computed as a function of the hash code of the superclass, decimal patterns, decimal format symbols, compact number patterns, rounding mode and grouping size." ?
I don't see a discussion of the serialization form included in the specdiff or other materials.
 - Didn't get this point.
11-06-2018

Pending this request. The equals/hashCode changes don't seem necessary or correct to me. The other concrete NumberFormat subclasses seem to use (or at least allow) identity-as-equality. What is meant by "and the comparison is based on the super class equality" ? As the CompactNumberFormat class is final, its hashCode components strictly speaking don't need to be specified, but must be consistent with equals as usual. I don't see a discussion of the serialization form included in the specdiff or other materials.
08-06-2018

Changes made to the earlier specification:
1. Change in the serialization form.
2. Change in the equals() method specification (removed min/max fraction/integer digits, "grouping used", and "parse integer only", as they are already checked in the superclass).
3. Change in the hashCode() method specification (added "rounding mode" and "grouping size" to the hash code function).
4. Added text about parsing behaviour at class level.
05-06-2018

Approving for JDK 11.
06-04-2018

CompactNumberFormat is an implementation of LDML's compact formatting, while NumberFormat is an abstraction of all number formatting. Consider a hypothetical case where someone wants to implement the compact formatting unrelated to LDML's compact formatting. It may still want to use the styles, such as LONG/SHORT. Thus these style enums are defined in NumberFormat.Style.
06-04-2018

I assume this work is targeted at JDK 11. CompactNumberFormat is a respectable class like NumberFormat. It still seems odd to me to place the Style enum inside of NumberFormat when it is just there to configure CompactNumberFormat instances. Naoto, do you have an opinion on this point? Thanks.
06-04-2018

Why is the constructor required?
> The constructor is mainly required so that an SPI implementation, especially the JDK's sun.util.locale.provider.NumberFormatProviderImpl, can obtain a CompactNumberFormat instance.
Changing "...both parse to Long(17000)" to "...both parse to Long.valueOf(17000)"
> OK
Should the NumberFormat.Style enum as a nested type be defined inside of CompactNumberFormat rather than NumberFormat?
> In case an external SPI implementation decides to provide its own CompactNumberFormat implementation, it may not be right to expect an SPI to require an enum of a concrete subclass; hence it is moved to NumberFormat.
05-04-2018

Moving to Provisional; some comments for consideration:
 - Is the constructor of CompactNumberFormat necessary or just conventional for NumberFormats? I suppose the symbol and pattern settings are not accessible from the existing factory method.
 - For CompactNumberFormat.parse, I recommend replacing the text "...both parse to Long(17000)" with "...both parse to Long.valueOf(17000)" to avoid an implicit promise around object identity guarantees.
 - Should the NumberFormat.Style enum as a nested type be defined inside of CompactNumberFormat rather than NumberFormat?
04-04-2018
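
To illustrate the Long.valueOf(17000) wording recommended in the comment above: callers should compare a parse result by value with equals, since no object identity is promised. A hedged sketch with hypothetical class and method names:

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParseLongDemo {
    // Parses compact text such as "17K" with the US SHORT style.
    static Number parseShort(String text) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(Locale.US, NumberFormat.Style.SHORT);
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        Number parsed = parseShort("17K");
        // Compare by value; do not rely on == between boxed Long instances.
        System.out.println(parsed.equals(Long.valueOf(17000)));
    }
}
```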