JDK-8202555 : Double.toString(double) sometimes produces incorrect results
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 19
  • Submitted: 2018-05-02
  • Updated: 2022-09-21
  • Resolved: 2022-05-05
Description
Summary
-------

Modify the specification (Javadoc) of Double::toString(double) and Float::toString(float) to ensure a uniquely determined resulting string value in all cases.

Problem
-------

The current Javadoc specifications of the mentioned methods are somewhat vague about what the resulting strings should be.

On the one hand, a strict reading leads one to believe that the digits in the resulting string are drawn, from left to right, from the exact value of the argument, until the number represented by the string is near enough to the argument to round to it according to the default IEEE 754 round-to-nearest mode.

On the other hand, a more lenient interpretation of the Javadoc and the observed behavior both lead to the conclusion that the digits appearing in the result are those of an unspecified number that also rounds to the argument. While the spec makes it clear that this must be a shortest number that rounds to the argument, sometimes there is more than one such number, and the spec says nothing about which one to choose.
In summary, it is not always clear from which number the digits are eventually drawn. In the absence of a more specific description of this number, the result is not always uniquely determined, and different implementations are thus allowed to return different strings.

Solution
--------

Specify that the conversion is split into two separate stages. The first selects a unique, well-specified decimal number that represents the argument and that meets the properties listed in the specification section below. The second stage then formats this decimal number as a string, as specified below.

The current and the proposed specs, while different in wording, determine exactly the same resulting strings for the vast majority of cases. Their results, however, might differ where the current one is not specific enough.

Specification
-------------

Double::toString(double):

```
    /**
     * Returns a string representation of the {@code double}
     * argument. All characters mentioned below are ASCII characters.
     * <ul>
     * <li>If the argument is NaN, the result is the string
     *     "{@code NaN}".
     * <li>Otherwise, the result is a string that represents the sign and
     * magnitude (absolute value) of the argument. If the sign is negative,
     * the first character of the result is '{@code -}'
     * ({@code '\u005Cu002D'}); if the sign is positive, no sign character
     * appears in the result. As for the magnitude <i>m</i>:
     * <ul>
     * <li>If <i>m</i> is infinity, it is represented by the characters
     * {@code "Infinity"}; thus, positive infinity produces the result
     * {@code "Infinity"} and negative infinity produces the result
     * {@code "-Infinity"}.
     *
     * <li>If <i>m</i> is zero, it is represented by the characters
     * {@code "0.0"}; thus, negative zero produces the result
     * {@code "-0.0"} and positive zero produces the result
     * {@code "0.0"}.
     *
     * <li> Otherwise <i>m</i> is positive and finite.
     * It is converted to a string in two stages:
     * <ul>
     * <li> <em>Selection of a decimal</em>:
     * A well-defined decimal <i>d</i><sub><i>m</i></sub>
     * is selected to represent <i>m</i>.
     * This decimal is (almost always) the <em>shortest</em> one that
     * rounds to <i>m</i> according to the round to nearest
     * rounding policy of IEEE 754 floating-point arithmetic.
     * <li> <em>Formatting as a string</em>:
     * The decimal <i>d</i><sub><i>m</i></sub> is formatted as a string,
     * either in plain or in computerized scientific notation,
     * depending on its value.
     * </ul>
     * </ul>
     * </ul>
     *
     * <p>A <em>decimal</em> is a number of the form
     * <i>s</i>&times;10<sup><i>i</i></sup>
     * for some (unique) integers <i>s</i> &gt; 0 and <i>i</i> such that
     * <i>s</i> is not a multiple of 10.
     * These integers are the <em>significand</em> and
     * the <em>exponent</em>, respectively, of the decimal.
     * The <em>length</em> of the decimal is the (unique)
     * positive integer <i>n</i> meeting
     * 10<sup><i>n</i>-1</sup> &le; <i>s</i> &lt; 10<sup><i>n</i></sup>.
     *
     * <p>The decimal <i>d</i><sub><i>m</i></sub> for a finite positive <i>m</i>
     * is defined as follows:
     * <ul>
     * <li>Let <i>R</i> be the set of all decimals that round to <i>m</i>
     * according to the usual <em>round to nearest</em> rounding policy of
     * IEEE 754 floating-point arithmetic.
     * <li>Let <i>p</i> be the minimal length over all decimals in <i>R</i>.
     * <li>When <i>p</i> &ge; 2, let <i>T</i> be the set of all decimals
     * in <i>R</i> with length <i>p</i>.
     * Otherwise, let <i>T</i> be the set of all decimals
     * in <i>R</i> with length 1 or 2.
     * <li>Define <i>d</i><sub><i>m</i></sub> as the decimal in <i>T</i>
     * that is closest to <i>m</i>.
     * Or if there are two such decimals in <i>T</i>,
     * select the one with the even significand.
     * </ul>
     *
     * <p>The (uniquely) selected decimal <i>d</i><sub><i>m</i></sub>
     * is then formatted.
     * Let <i>s</i>, <i>i</i> and <i>n</i> be the significand, exponent and
     * length of <i>d</i><sub><i>m</i></sub>, respectively.
     * Further, let <i>e</i> = <i>n</i> + <i>i</i> - 1 and let
     * <i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub>
     * be the usual decimal expansion of <i>s</i>.
     * Note that <i>s</i><sub>1</sub> &ne; 0
     * and <i>s</i><sub><i>n</i></sub> &ne; 0.
     * Below, the decimal point <code>.</code> is {@code '\u005Cu002E'}
     * and the exponent indicator <code>E</code> is {@code '\u005Cu0045'}.
     * <ul>
     * <li>Case -3 &le; <i>e</i> &lt; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <code>0.0</code>&hellip;<code>0</code><!--
     * --><i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub>,
     * where there are exactly -(<i>n</i> + <i>i</i>) zeroes between
     * the decimal point and <i>s</i><sub>1</sub>.
     * For example, 123 &times; 10<sup>-4</sup> is formatted as
     * {@code 0.0123}.
     * <li>Case 0 &le; <i>e</i> &lt; 7:
     * <ul>
     * <li>Subcase <i>i</i> &ge; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub><!--
     * --><code>0</code>&hellip;<code>0.0</code>,
     * where there are exactly <i>i</i> zeroes
     * between <i>s</i><sub><i>n</i></sub> and the decimal point.
     * For example, 123 &times; 10<sup>2</sup> is formatted as
     * {@code 12300.0}.
     * <li>Subcase <i>i</i> &lt; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub>&hellip;<!--
     * --><i>s</i><sub><i>n</i>+<i>i</i></sub><code>.</code><!--
     * --><i>s</i><sub><i>n</i>+<i>i</i>+1</sub>&hellip;<!--
     * --><i>s</i><sub><i>n</i></sub>,
     * where there are exactly -<i>i</i> digits to the right of
     * the decimal point.
     * For example, 123 &times; 10<sup>-1</sup> is formatted as
     * {@code 12.3}.
     * </ul>
     * <li>Case <i>e</i> &lt; -3 or <i>e</i> &ge; 7:
     * computerized scientific notation is used to format
     * <i>d</i><sub><i>m</i></sub>.
     * Here <i>e</i> is formatted as by {@link Integer#toString(int)}.
     * <ul>
     * <li>Subcase <i>n</i> = 1:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub><code>.0E</code><i>e</i>.
     * For example, 1 &times; 10<sup>23</sup> is formatted as
     * {@code 1.0E23}.
     * <li>Subcase <i>n</i> &gt; 1:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub><code>.</code><i>s</i><sub>2</sub><!--
     * -->&hellip;<i>s</i><sub><i>n</i></sub><code>E</code><i>e</i>.
     * For example, 123 &times; 10<sup>-21</sup> is formatted as
     * {@code 1.23E-19}.
     * </ul>
     * </ul>
     *
     * <p>To create localized string representations of a floating-point
     * value, use subclasses of {@link java.text.NumberFormat}.
     *
     * @param   d   the {@code double} to be converted.
     * @return a string representation of the argument.
     */
    public static String toString(double d) {}
```
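
The selection stage described in the Javadoc above can be illustrated with a brute-force sketch. This is not the algorithm used by the JDK; it follows the definition of *R*, *p* and *T* directly with `BigDecimal`, and assumes that `Double.parseDouble` rounds correctly to nearest. The class and method names (`DecimalSelectionSketch`, `select`, `roundsTo`) are hypothetical and exist only for this illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical illustration only: a slow, literal reading of the selection
// stage, not the implementation used by Double.toString.
final class DecimalSelectionSketch {

    // True if the decimal c is in R, i.e. it rounds to m (round to nearest).
    private static boolean roundsTo(BigDecimal c, double m) {
        return Double.parseDouble(c.toString()) == m;
    }

    // Nearest number with at most n significant digits below (DOWN) or above (UP) x.
    private static BigDecimal neighbour(BigDecimal x, int n, RoundingMode mode) {
        return x.round(new MathContext(n, mode));
    }

    // Returns the decimal d_m for a finite positive m.
    static BigDecimal select(double m) {
        if (!(m > 0.0) || Double.isInfinite(m)) {
            throw new IllegalArgumentException("m must be finite and positive");
        }
        BigDecimal exact = new BigDecimal(m); // exact value of m

        // p: minimal length over all decimals in R (17 digits always suffice).
        int p = 1;
        while (p < 17
                && !roundsTo(neighbour(exact, p, RoundingMode.DOWN), m)
                && !roundsTo(neighbour(exact, p, RoundingMode.UP), m)) {
            p++;
        }

        // T holds decimals of length p, or of length 1 or 2 when p = 1.
        int n = Math.max(p, 2);
        BigDecimal lo = neighbour(exact, n, RoundingMode.DOWN);
        BigDecimal hi = neighbour(exact, n, RoundingMode.UP);
        boolean loInR = roundsTo(lo, m);
        boolean hiInR = roundsTo(hi, m);
        if (!hiInR) return lo.stripTrailingZeros();
        if (!loInR) return hi.stripTrailingZeros();

        // Both candidates are in T: pick the closest to m.
        int cmp = exact.subtract(lo).compareTo(hi.subtract(exact));
        if (cmp < 0) return lo.stripTrailingZeros();
        if (cmp > 0) return hi.stripTrailingZeros();

        // Tie: select the one with the even significand.
        boolean loOdd = lo.stripTrailingZeros().unscaledValue().testBit(0);
        return (loOdd ? hi : lo).stripTrailingZeros();
    }

    public static void main(String[] args) {
        System.out.println(select(1.2));
        System.out.println(select(123456789012345.38));
        System.out.println(select(Double.MIN_VALUE));
    }
}
```

The sketch only needs the two nearest candidates with at most *n* significant digits, one on each side of *m*, because *R* is an interval around *m*: if any decimal of that length is in *R*, the nearest one on the same side is in *R* as well.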

Float::toString(float):

```
    /**
     * Returns a string representation of the {@code float}
     * argument. All characters mentioned below are ASCII characters.
     * <ul>
     * <li>If the argument is NaN, the result is the string
     * "{@code NaN}".
     * <li>Otherwise, the result is a string that represents the sign and
     *     magnitude (absolute value) of the argument. If the sign is
     *     negative, the first character of the result is
     *     '{@code -}' ({@code '\u005Cu002D'}); if the sign is
     *     positive, no sign character appears in the result. As for
     *     the magnitude <i>m</i>:
     * <ul>
     * <li>If <i>m</i> is infinity, it is represented by the characters
     *     {@code "Infinity"}; thus, positive infinity produces
     *     the result {@code "Infinity"} and negative infinity
     *     produces the result {@code "-Infinity"}.
     * <li>If <i>m</i> is zero, it is represented by the characters
     *     {@code "0.0"}; thus, negative zero produces the result
     *     {@code "-0.0"} and positive zero produces the result
     *     {@code "0.0"}.
     *
     * <li> Otherwise <i>m</i> is positive and finite.
     * It is converted to a string in two stages:
     * <ul>
     * <li> <em>Selection of a decimal</em>:
     * A well-defined decimal <i>d</i><sub><i>m</i></sub>
     * is selected to represent <i>m</i>.
     * This decimal is (almost always) the <em>shortest</em> one that
     * rounds to <i>m</i> according to the round to nearest
     * rounding policy of IEEE 754 floating-point arithmetic.
     * <li> <em>Formatting as a string</em>:
     * The decimal <i>d</i><sub><i>m</i></sub> is formatted as a string,
     * either in plain or in computerized scientific notation,
     * depending on its value.
     * </ul>
     * </ul>
     * </ul>
     *
     * <p>A <em>decimal</em> is a number of the form
     * <i>s</i>&times;10<sup><i>i</i></sup>
     * for some (unique) integers <i>s</i> &gt; 0 and <i>i</i> such that
     * <i>s</i> is not a multiple of 10.
     * These integers are the <em>significand</em> and
     * the <em>exponent</em>, respectively, of the decimal.
     * The <em>length</em> of the decimal is the (unique)
     * positive integer <i>n</i> meeting
     * 10<sup><i>n</i>-1</sup> &le; <i>s</i> &lt; 10<sup><i>n</i></sup>.
     *
     * <p>The decimal <i>d</i><sub><i>m</i></sub> for a finite positive <i>m</i>
     * is defined as follows:
     * <ul>
     * <li>Let <i>R</i> be the set of all decimals that round to <i>m</i>
     * according to the usual <em>round to nearest</em> rounding policy of
     * IEEE 754 floating-point arithmetic.
     * <li>Let <i>p</i> be the minimal length over all decimals in <i>R</i>.
     * <li>When <i>p</i> &ge; 2, let <i>T</i> be the set of all decimals
     * in <i>R</i> with length <i>p</i>.
     * Otherwise, let <i>T</i> be the set of all decimals
     * in <i>R</i> with length 1 or 2.
     * <li>Define <i>d</i><sub><i>m</i></sub> as the decimal in <i>T</i>
     * that is closest to <i>m</i>.
     * Or if there are two such decimals in <i>T</i>,
     * select the one with the even significand.
     * </ul>
     *
     * <p>The (uniquely) selected decimal <i>d</i><sub><i>m</i></sub>
     * is then formatted.
     * Let <i>s</i>, <i>i</i> and <i>n</i> be the significand, exponent and
     * length of <i>d</i><sub><i>m</i></sub>, respectively.
     * Further, let <i>e</i> = <i>n</i> + <i>i</i> - 1 and let
     * <i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub>
     * be the usual decimal expansion of <i>s</i>.
     * Note that <i>s</i><sub>1</sub> &ne; 0
     * and <i>s</i><sub><i>n</i></sub> &ne; 0.
     * Below, the decimal point <code>.</code> is {@code '\u005Cu002E'}
     * and the exponent indicator <code>E</code> is {@code '\u005Cu0045'}.
     * <ul>
     * <li>Case -3 &le; <i>e</i> &lt; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <code>0.0</code>&hellip;<code>0</code><!--
     * --><i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub>,
     * where there are exactly -(<i>n</i> + <i>i</i>) zeroes between
     * the decimal point and <i>s</i><sub>1</sub>.
     * For example, 123 &times; 10<sup>-4</sup> is formatted as
     * {@code 0.0123}.
     * <li>Case 0 &le; <i>e</i> &lt; 7:
     * <ul>
     * <li>Subcase <i>i</i> &ge; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub>&hellip;<i>s</i><sub><i>n</i></sub><!--
     * --><code>0</code>&hellip;<code>0.0</code>,
     * where there are exactly <i>i</i> zeroes
     * between <i>s</i><sub><i>n</i></sub> and the decimal point.
     * For example, 123 &times; 10<sup>2</sup> is formatted as
     * {@code 12300.0}.
     * <li>Subcase <i>i</i> &lt; 0:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub>&hellip;<!--
     * --><i>s</i><sub><i>n</i>+<i>i</i></sub><code>.</code><!--
     * --><i>s</i><sub><i>n</i>+<i>i</i>+1</sub>&hellip;<!--
     * --><i>s</i><sub><i>n</i></sub>,
     * where there are exactly -<i>i</i> digits to the right of
     * the decimal point.
     * For example, 123 &times; 10<sup>-1</sup> is formatted as
     * {@code 12.3}.
     * </ul>
     * <li>Case <i>e</i> &lt; -3 or <i>e</i> &ge; 7:
     * computerized scientific notation is used to format
     * <i>d</i><sub><i>m</i></sub>.
     * Here <i>e</i> is formatted as by {@link Integer#toString(int)}.
     * <ul>
     * <li>Subcase <i>n</i> = 1:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub><code>.0E</code><i>e</i>.
     * For example, 1 &times; 10<sup>23</sup> is formatted as
     * {@code 1.0E23}.
     * <li>Subcase <i>n</i> &gt; 1:
     * <i>d</i><sub><i>m</i></sub> is formatted as
     * <i>s</i><sub>1</sub><code>.</code><i>s</i><sub>2</sub><!--
     * -->&hellip;<i>s</i><sub><i>n</i></sub><code>E</code><i>e</i>.
     * For example, 123 &times; 10<sup>-21</sup> is formatted as
     * {@code 1.23E-19}.
     * </ul>
     * </ul>
     *
     * <p>To create localized string representations of a floating-point
     * value, use subclasses of {@link java.text.NumberFormat}.
     *
     * @param   f   the float to be converted.
     * @return a string representation of the argument.
     */
    public static String toString(float f) {}
```
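
The formatting stage, which is identical in both specifications above, can be sketched as a small function of the significand *s* and the exponent *i*. This is a minimal illustration of the three cases, assuming the decimal has already been selected; the names `DecimalFormatSketch`, `format` and `zeros` are hypothetical and not part of the JDK.

```java
// Hypothetical illustration only: formats the decimal s * 10^i following the
// case analysis of the specification (plain vs. computerized scientific notation).
final class DecimalFormatSketch {

    // Returns a string of k '0' characters.
    private static String zeros(int k) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < k; j++) {
            sb.append('0');
        }
        return sb.toString();
    }

    static String format(long s, int i) {
        if (s <= 0 || s % 10 == 0) {
            throw new IllegalArgumentException("s must be positive and not a multiple of 10");
        }
        String digits = Long.toString(s); // s1...sn
        int n = digits.length();          // length of the decimal
        int e = n + i - 1;

        if (-3 <= e && e < 0) {
            // Case -3 <= e < 0: 0.0...0 s1...sn with -(n + i) zeroes after the point.
            return "0." + zeros(-(n + i)) + digits;
        }
        if (0 <= e && e < 7) {
            if (i >= 0) {
                // Subcase i >= 0: s1...sn 0...0 .0 with i zeroes before the point.
                return digits + zeros(i) + ".0";
            }
            // Subcase i < 0: exactly -i digits to the right of the point.
            return digits.substring(0, n + i) + "." + digits.substring(n + i);
        }
        // Case e < -3 or e >= 7: computerized scientific notation.
        if (n == 1) {
            return digits + ".0E" + e;
        }
        return digits.charAt(0) + "." + digits.substring(1) + "E" + e;
    }

    public static void main(String[] args) {
        System.out.println(format(123, -4));  // 0.0123
        System.out.println(format(123, 2));   // 12300.0
        System.out.println(format(123, -1));  // 12.3
        System.out.println(format(1, 23));    // 1.0E23
        System.out.println(format(123, -21)); // 1.23E-19
    }
}
```

The inputs in `main` are the examples given in the Javadoc text, so the printed strings can be compared directly against the specification.
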
Comments
An example of why the tie-breaking rule is needed is the following. Consider `double v = 123456789012345.38`. Its full decimal expansion is 123456789012345.375. Both decimals 123456789012345.37 and 123456789012345.38 lie in the rounding interval of `v`, that is, both decimals round to `v`. No shorter decimal, nor any other equally long decimal, lies in the rounding interval, so these are the only decimals with minimal length that round to `v`. Both are visibly equidistant from `v`, so both are "best" candidates. The tie-breaking rule is thus triggered and chooses the "even" one, that is, decimal 123456789012345.38, which is then formatted as `"1.2345678901234538E14"` and returned as the final outcome.
21-09-2022
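
The facts used in the tie-breaking example above can be checked with a short sketch that relies only on `BigDecimal` and `Double.parseDouble`. The class and method names (`TieBreakSketch`, `exactExpansion`, `roundToSameDouble`) are hypothetical and serve only this illustration.

```java
import java.math.BigDecimal;

// Hypothetical illustration only: verifies the premises of the tie-breaking example.
final class TieBreakSketch {

    // Full decimal expansion of the exact value of a double.
    static String exactExpansion(double v) {
        return new BigDecimal(v).toPlainString();
    }

    // True if both decimal strings round to the same double.
    static boolean roundToSameDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    public static void main(String[] args) {
        double v = 123456789012345.38;
        // The exact value of v lies midway between the two 17-digit candidates.
        System.out.println(exactExpansion(v));
        // Both candidates are in the rounding interval of v.
        System.out.println(roundToSameDouble("123456789012345.37", "123456789012345.38"));
        // On a JDK implementing the amended spec this is expected to end in ...38E14.
        System.out.println(Double.toString(v));
    }
}
```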

Moving amended request to Approved for 19.
05-05-2022

A third difference, and perhaps the most visible one, is that the current spec is not clear about which decimal number it is talking about. It states "Otherwise, the result is a string that *represents* the sign and magnitude (absolute value) of the argument. [...] As for the magnitude m: [...]" It's unclear what is meant by "represents". The most direct and unfiltered interpretation is that m = |d|. Later it states "How many digits must be printed for the fractional part of m or a? [...]", implicitly assuming, in this interpretation, that the digits are those of the full decimal expansion of |d|, so that the output is a prefix thereof. This is not desirable at all. For example, the full decimal expansion of the double d closest to 1.2 is d = 1.1999999999999999555910790149937383830547332763671875. The shortest prefix that rounds to d is 1.1999999999999999, which is much longer than 1.2. In other words, the vagueness of the current spec when it comes to defining the decimal to output could lead to much longer strings than needed. Luckily, the current implementation behaves better. But the fact that an interpretation is needed of what m (thus a) stands for makes the current spec too vague.
27-04-2022
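
The "prefix" reading discussed in the comment above can be made concrete with a small sketch that searches for the shortest prefix of the exact expansion that still rounds to the argument. The names `PrefixReadingSketch` and `shortestPrefix` are hypothetical; the sketch illustrates the strict interpretation only and is not how `Double.toString` works.

```java
import java.math.BigDecimal;

// Hypothetical illustration only: the strict "prefix of the exact expansion" reading.
final class PrefixReadingSketch {

    // Shortest prefix of the full decimal expansion of d that rounds back to d.
    // Assumes d is finite, positive and of moderate magnitude.
    static String shortestPrefix(double d) {
        String full = new BigDecimal(d).toPlainString();
        for (int len = 1; len <= full.length(); len++) {
            String prefix = full.substring(0, len);
            if (prefix.endsWith(".")) {
                continue; // skip the prefix that stops right at the decimal point
            }
            if (Double.parseDouble(prefix) == d) {
                return prefix;
            }
        }
        return full;
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(1.2).toPlainString()); // full expansion of d
        System.out.println(shortestPrefix(1.2));                 // prefix reading
        System.out.println(Double.toString(1.2));                // actual behavior
    }
}
```

For the argument 1.2, the prefix reading yields 18 characters where the actual method returns 3, which is the gap the comment describes.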

After a meeting with [~rgiulietti] and [~bpb], a few more comments on the spec text. Since the behavior with respect to zeros and infinities is not being changed, please revert those edits as part of this fix. (Those alterations can be considered separately.) To summarize the specification differences, the new spec doesn't bind the exponent as a preliminary step, which allows the new spec to make better choices. Also, the new spec gives a tie-breaking rule when there are two same-length strings that would round to the same numerical value. (I assume, but have not worked through the details, that the values represented by two same-length decimal strings cannot both be exactly the same distance from a binary number. If so, round to nearest-even should be able to break the second-level tie.)
27-04-2022

Moving to Provisional for JDK 19.
25-04-2022

Thanks Guy, really appreciate.
16-11-2021

Okay, I have finished my review. Looks good to me.
16-11-2021

Hi Guy, here are the slight changes, almost all suggested by other careful reviewers. They don't affect any fundamental result.
* corrected a math typo in §6.3 in the 2nd bullet which, however, doesn't affect the sequel
* in result 7, first bullet, a better formulation
* in §8.1 added a lower bound for s for completeness (this was §9.2 in v2 dated 2020-03-16)
* clarified 1st bullet, 1st and 2nd subitem in §8.2 (was §8.1)
* clarified §8.2.1 (was §8.1.1)
* rewrote §9.1 to use more accurate interval arithmetic, thus extending the range of valid exponents in the results
* corrected a math typo in §9.9 (was §9.10) which doesn't affect the rest
* rewrote §10.1 to slightly generalize result 26
* in §12 clarified that machine assistance is required not for the proof of result 20 itself (although such a mechanized proof exists) but for the determination of epsilon there (BTW, this epsilon was confirmed independently)
* improved the notation in the appendix that led to some confusion in previous versions
* added illustrative drawings in §6.3 (fig 4 and 5) and in §7 (fig 6)
* added 3 bibliographical refs

There are also several minor stylistic changes. As mentioned, these are all slight improvements (at least, this is my hope) but no fundamental result has changed.
08-11-2021

Hi, Raffaelo, do you have a summary of the changes that have been made in the last two versions of the Schubfach article? I am going to read through v4 one more time, and it would be helpful to know what to look for.
08-11-2021

Link to new v4 of Schubfach article (slight clarifications) https://drive.google.com/file/d/1IEeATSVnEE6TkrHlCYNY2GjaraBjOT4f
08-11-2021

Link to new version of Schubfach article https://drive.google.com/file/d/1JoQBN5igZ8bMI3ua7l5DrHx_GYsjF0Dy
02-11-2021

Switched back the status to Draft. Added the slight adjustments suggested by Brian Burkhalter on 2019-05-29.
27-10-2021

Paper describing the algorithm: https://drive.google.com/file/d/1luHhyQF9zKlM8yJ1nebU0OgVYhfC6CBN/view
08-10-2021

Moving this to finalized state. Minor changes such as I suggested above can be considered during review.
03-06-2019

Although it is obvious that the length "n" of the significand "d" is positive, it might be worthwhile to state it explicitly:
 * The <em>length</em> of the decimal is the (unique)
 * positive integer <i>n</i> meeting
Another minor possible change is to split the inequality "d1 != 0 != dn" into "d1 != 0 and dn != 0" as in
 * Note that <i>d</i><sub>1</sub> &ne; 0 and <i>d</i><sub><i>n</i></sub> &ne; 0.
28-05-2019

Please include some kind of diff from the present specification for the method(s) in question; specdiff would be ideal, but a webrev would be adequate. In addition, some of the HTML usage could be amended to be more readable in the javadoc sources. {@code <} or {@literal<}. Barring that, I'd prefer to read "&lt;" over "&#x3c;". Likewise the entity name "&infin;" is more suggestive in the javadoc sources than "&#x221e;".
11-05-2018