Name: bsT130419 Date: 10/05/2001
java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)
According to the description of Double.toString(double), the resulting string
produced by this method shall be chosen so that the number of digits is as
small as possible:
"There must be at least one digit to represent the fractional part, and beyond
that as many, ***but only as many***, more digits as are needed to uniquely
distinguish the argument value from adjacent values of type double."
The following simple program shows that for the specific case coded, this is
not true.
The program first shows that, internally, 1.0E23 and 9.999999999999999E22 are
the *very same* double. In fact, as seen by the the first 2 System.out.println
() the raw bitstrings are the same.
However, converting this double to a string produces a result that is much
longer as needed to recover the original value. The conversion
produces "9.999999999999999E22" instead of the much shorter "1.0E23" which,
internally, represents the *same* value. Because "1.0E23" is shorter
than "9.999999999999999E22" and because when converted back to double both
produce the same double value, "1.0E23" *must* be output according to the
specification.
class DoubleIO {
public static void main(String[] args) {
String s0 = "1.0E23";
String s1 = "9.999999999999999E22";
double d0 = Double.valueOf(s0).doubleValue();
double d1 = Double.valueOf(s1).doubleValue();
System.out.println(Double.doubleToLongBits(d0));
System.out.println(Double.doubleToLongBits(d1));
System.out.println(s0 + " -> " + Double.toString(d0));
}
}
(Review ID: 133203)
======================================================================
###@###.### 2004-11-11 21:42:12 GMT
Commentary on a fix for this (un-edited), from Peabody community member:
A DESCRIPTION OF THE FIX :
Issues with java.lang.Double.toString()
---------------------------------------
Introduction
------------
Although reporting bugs, this document is longer than a usual bug report
due to some technical issues that require a deeper discussion. Moreover,
while I implemented a "bug fix" contribution, it is really new software
implemented from scratch, not a simple patch suite to apply to the
current source base. I will be glad to submit it to Mustang if you
decide that the issues described here are important enough to deserve
more attention.
In what follows, class names starting with a dot shall be prefixed with
java.lang: e.g. .String stands for java.lang.String. (The initial dot
prevents ambiguities with classes in unnamed packages.)
The current specification of .Double.toString(double) fails to produce
the shortest possible decimal number which can recover the original
double. This document discusses the issue and explains why producing the
shortest possible decimal is better, both from a theoretical as well as
from a practical point of view.
Moreover, several bugs in the current implementation of
.Double.toString(double) are shown.
Finally, a modified specification is proposed. The proposed
specification is accompanied by an implementation which solves all the
current bugs. In addition, it is about two times faster on the average
than the JDK implementation.
Definitions
-----------
* Exponentiation is denoted by ** to avoid confusion with Java's meaning
of ^ for xor.
* A decimal number is a real number of the form d*10**k, where d and k
are integers.
* A double is a real number whose value is in the Java double set.
* A denormal number is a double with a Java denormalized value.
* A normal number is any finite nonzero double which is not a denormal.
The discussion below is limited to nonzero finite positive numbers.
The problems with the current specification
-------------------------------------------
.Double.toString(double) is supposed to produce a .String denoting a
decimal number sufficiently close to the double argument so that the
converse transformation, as specified for example by
.Double.parseDouble(.String), can recover the original double. Moreover,
it is also supposed to produce a shortest decimal number which still can
recover the double.
.Double.toString(1e23) produces "9.999999999999999E22". In fact, this
output recovers the original double closest to 10**23 but, evidently, it
is not the shortest: "1.0E23" is quite shorter!
This is due to the current specification which fixes the exponent too
early. The premature commitment to the exponent then necessarily leads
to the long string of nines. The best exponent should be chosen while
producing the digits, not that early. (See also bug located at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4511638 submittet
years ago by me and the relevant standard literature cited there by the
reviewer.)
.Double.toString(5e-324) produces "4.9E-324". All of 3*10**-324,
4*10**-324, 5*10**-324, 6*10**-324, 7*10**-324 can recover the original
double and they are all shorter than 4.9*10**-324. The proposed
specification would produce "5.0E-324" because, among the shortests, it
is also the nearest to the original double, which is in fact
approximatively 4.9*10**-324. (The trailing zero in "5.0E-324" is a
backward compatibility concession to the current specification. In my
opinion "5E-324" is even better: it has less characters while, at the
same time, syntactically denoting a floating point value.)
In some cases the specification is ambiguous. For example, it fails to
define which of two equally short values that both happen to recover the
original double shall be output. This underspecification could lead to
different results on different Java implementations.
Implementation issues
---------------------
The following list shows some examples that have been detected on
jdk1.5.0_05 and earlier releases as well as on the most recent Mustang
(build 1.6.0-ea-b56 of 2005-10-13). Conceptually, the arrow stands for
the double conversion decimal -> double -> decimal.
* Spurious unnecessary trailing 0. Note that this is not a mathematical
issue: the numbers on both sides of the arrow are the same.
0.001 --> 0.0010
0.002 --> 0.0020
0.003 --> 0.0030
* String of 9s.
8.41E21 --> 8.409999999999999E21
2.0E23 --> 1.9999999999999998E23
8.962E21 --> 8.961999999999999E21
* String of 0s.
7.3879E20 --> 7.387900000000001E20
3.1E22 --> 3.1000000000000002E22
5.63E21 --> 5.630000000000001E21
* 18 digits, even though 17 digits are *always* enough (see Matula's
papers cited in the references)
2.82879384806159E17 --> 2.82879384806159008E17
1.387364135037754E18 --> 1.38736413503775411E18
1.45800632428665E17 --> 1.45800632428664992E17
* 5 digits too much.
1.790086667993E18 --> 1.79008666799299994E18
2.273317134858E18 --> 2.27331713485799987E18
7.68905065813E17 --> 7.6890506581299994E17
* Not the closest to the intermediate double, left number is closer.
1.9400994884341945E25 --> 1.9400994884341944E25
3.6131332396758635E25 --> 3.6131332396758634E25
2.5138990223946153E25 --> 2.5138990223946152E25
* Among the powers of 2, there are more than 17% that are output as
unnecessarily long numbers, the worst case being 2**959:
4.8726570057E288 --> 4.8726570056999995E288
which is 6 digits longer than needed.
It is quite unfortunate that some of the cases presented above can be
produced in tons. All of them have not been found by clever analysis but
by really simple programs. On the good news, decimal outputs that
couldn't recover the original doubles have not been detected.
Advocating for a better specification
-------------------------------------
What is wrong with the "9.999999999999999E22" output? From the
perspective of a human user that types "1e23" and gets
"9.999999999999999E22" as feedback, this can be annoying, to say the
least. The system responds with a complicated number to a simple input.
In the late sixties, Matula published some results about floating-point
conversions (see references). One of his results states that there exist
conversions from decimal to binary and back to decimal which always
recover the very same decimal, provided it has no more than 15 digits
(and provided that the intermediate double is normal). Among many other
things, this means that 1e23 shall be output as "1.0E23".
(Although Matula limited the discussion to truncation and one form of
rounding conversion, the results can be extended to more general
roundings, including IEEE 754. See
http://homepage.sunrise.ch/mysunrise/r.giulietti/Matula.pdf for a short
discussion.)
By specifying that .Double.toString(double) shall produce the shortest
decimal that can recover the original double, Matula's result is
implicitly guaranteed.
In other words, if .Double.toString(double) is required to produce the
shortest decimal, then the decimal -> binary -> decimal is the identity
(as far as the numerical value is concerned), provided the original
decimal has no more than 15 digits and provided that the intermediate
double is normal.
In practical terms, this means that the vast majority of user inputs are
guaranteed to be output with the same numerical value. All physical
constants known to me, for example, can be input to the system and are
guaranteed to come out with the same numerical value.
This regularity and the "no surprise" side effects for the user, both
based on Matula's sound results, are the driving reasons to modify the
specification and to adopt the proposed one.
Another advantage of the proposed specification is that it clearly
separates the definition of the best decimal number from its
representation as .String. Moreover, the specified decimal number is
unique: there are sufficient rules to ensure that the conversion is
unambiguous. This contrasts with the current specification which is
ambiguous in the case that there is more than one "shortest" decimal.
Such underspecifications fail to ensure identity of the decimal ->
binary -> decimal conversion, where applicable. Moreover, they could
lead to different results on different implementations.
The proposed specification
--------------------------
Because this is longer than a usual contribution, the proposed
specification is located at
http://homepage.sunrise.ch/mysunrise/r.giulietti/Double.html in the form
of a Javadoc.
The accompanying implementation
-------------------------------
To solve the bugs detected in the JDK, I wrote a new implementation of
.Double.toString(double) from scratch. Because it mirrors the proposed
specification instead of the current one, it only makes sense to submit
the implementation to Mustang if the spirit of the specification, if not
its exact wording, is accepted as well.
The implementation has the following features:
* It solves all the problems above.
* It is pure Java. It uses only JDK's public APIs in java.lang and
java.math and some self written supporting classes.
* It is threadsafe as long as the used API's are threadsafe.
* It has been extensively tested. All outputs up to 9 digits have been
found to be correct according to the proposed specification. Many days
of computing have been devoted to test other billions of randomly
generated doubles for correctness of the outputs. All boundary cases
have been tested (powers of 10, powers of 2, Paxson's test suite,
implementation dependent boundary cases).
* And last but not least, it performs twice as fast on the average, and
for long outputs is up to 4 times faster than the JDK.
References
----------
D. W. Matula, "The Base Conversion Theorem", Proc. AMS, 19 (1968) p.
716-723
D. W. Matula, "In-and-Out Conversions", Communications of the ACM, 11
(1968) p. 47-50