JDK 19 |
---|
19 b26 Fixed |
JDK-8291475 :
|
Name: bsT130419
Date: 10/05/2001

java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)

According to the description of Double.toString(double), the resulting string produced by this method shall be chosen so that the number of digits is as small as possible:

"There must be at least one digit to represent the fractional part, and beyond that as many, ***but only as many***, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double."

The following simple program shows that this is not true for the specific case coded. The program first shows that, internally, 1.0E23 and 9.999999999999999E22 are the *very same* double: as seen from the first two System.out.println() calls, the raw bit strings are the same. However, converting this double to a string produces a result that is much longer than needed to recover the original value. The conversion produces "9.999999999999999E22" instead of the much shorter "1.0E23" which, internally, represents the *same* value.

Because "1.0E23" is shorter than "9.999999999999999E22", and because both produce the same double value when converted back, "1.0E23" *must* be output according to the specification.

    class DoubleIO {
        public static void main(String[] args) {
            String s0 = "1.0E23";
            String s1 = "9.999999999999999E22";
            double d0 = Double.valueOf(s0).doubleValue();
            double d1 = Double.valueOf(s1).doubleValue();
            // Both lines print the same bit pattern: s0 and s1 denote the same double.
            System.out.println(Double.doubleToLongBits(d0));
            System.out.println(Double.doubleToLongBits(d1));
            // Prints the longer form on affected releases.
            System.out.println(s0 + " -> " + Double.toString(d0));
        }
    }

(Review ID: 133203)
======================================================================

###@###.### 2004-11-11 21:42:12 GMT

Commentary on a fix for this (un-edited), from Peabody community member:

A DESCRIPTION OF THE FIX :

Issues with java.lang.Double.toString()
---------------------------------------

Introduction
------------

Although it reports bugs, this document is longer than a usual bug report because some technical issues require a deeper discussion. Moreover, while I implemented a "bug fix" contribution, it is really new software implemented from scratch, not a simple set of patches to apply to the current source base. I will be glad to submit it to Mustang if you decide that the issues described here are important enough to deserve more attention.

In what follows, class names starting with a dot shall be prefixed with java.lang: e.g. .String stands for java.lang.String. (The initial dot prevents ambiguities with classes in unnamed packages.)

The current specification of .Double.toString(double) fails to produce the shortest possible decimal number which can recover the original double. This document discusses the issue and explains why producing the shortest possible decimal is better, from both a theoretical and a practical point of view. Moreover, several bugs in the current implementation of .Double.toString(double) are shown. Finally, a modified specification is proposed.

The proposed specification is accompanied by an implementation which solves all the current bugs. In addition, it is about twice as fast on average as the JDK implementation.

Definitions
-----------

* Exponentiation is denoted by ** to avoid confusion with Java's meaning of ^ for xor.
* A decimal number is a real number of the form d*10**k, where d and k are integers.
* A double is a real number whose value is in the Java double set.
* A denormal number is a double with a Java denormalized value.
* A normal number is any finite nonzero double which is not a denormal.

The discussion below is limited to nonzero finite positive numbers.

The problems with the current specification
-------------------------------------------

.Double.toString(double) is supposed to produce a .String denoting a decimal number sufficiently close to the double argument so that the converse transformation, as specified for example by .Double.parseDouble(.String), can recover the original double. Moreover, it is also supposed to produce a shortest decimal number which can still recover the double.

.Double.toString(1e23) produces "9.999999999999999E22". This output does recover the original double, the one closest to 10**23, but it is evidently not the shortest: "1.0E23" is much shorter. This is due to the current specification, which fixes the exponent too early. The premature commitment to the exponent then necessarily leads to the long string of nines. The best exponent should be chosen while producing the digits, not that early. (See also the bug at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4511638, submitted years ago by me, and the relevant standard literature cited there by the reviewer.)

.Double.toString(5e-324) produces "4.9E-324". All of 3*10**-324, 4*10**-324, 5*10**-324, 6*10**-324 and 7*10**-324 can recover the original double, and they are all shorter than 4.9*10**-324. The proposed specification would produce "5.0E-324" because, among the shortest decimals, it is also the nearest to the original double, which is in fact approximately 4.9*10**-324. (The trailing zero in "5.0E-324" is a backward compatibility concession to the current specification.
In my opinion "5E-324" is even better: it has fewer characters while still syntactically denoting a floating-point value.)

In some cases the specification is ambiguous. For example, it fails to define which of two equally short values that both happen to recover the original double shall be output. This underspecification could lead to different results on different Java implementations.

Implementation issues
---------------------

The following list shows some examples that have been detected on jdk1.5.0_05 and earlier releases as well as on the most recent Mustang (build 1.6.0-ea-b56 of 2005-10-13). Conceptually, the arrow stands for the two-step conversion decimal -> double -> decimal.

* Spurious unnecessary trailing 0. Note that this is not a mathematical issue: the numbers on both sides of the arrow are the same.
    0.001 --> 0.0010
    0.002 --> 0.0020
    0.003 --> 0.0030
* String of 9s.
    8.41E21 --> 8.409999999999999E21
    2.0E23 --> 1.9999999999999998E23
    8.962E21 --> 8.961999999999999E21
* String of 0s.
    7.3879E20 --> 7.387900000000001E20
    3.1E22 --> 3.1000000000000002E22
    5.63E21 --> 5.630000000000001E21
* 18 digits, even though 17 digits are *always* enough (see Matula's papers cited in the references).
    2.82879384806159E17 --> 2.82879384806159008E17
    1.387364135037754E18 --> 1.38736413503775411E18
    1.45800632428665E17 --> 1.45800632428664992E17
* 5 digits too many.
    1.790086667993E18 --> 1.79008666799299994E18
    2.273317134858E18 --> 2.27331713485799987E18
    7.68905065813E17 --> 7.6890506581299994E17
* Not the closest to the intermediate double; the left number is closer.
    1.9400994884341945E25 --> 1.9400994884341944E25
    3.6131332396758635E25 --> 3.6131332396758634E25
    2.5138990223946153E25 --> 2.5138990223946152E25
* More than 17% of the powers of 2 are output as unnecessarily long numbers, the worst case being 2**959:
    4.8726570057E288 --> 4.8726570056999995E288
  which is 6 digits longer than needed.
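
The cases listed above can be checked with a small brute-force program. The following is only a sketch and not the contributed implementation: the class and method names (ShortestDigits, digitsNeeded, digitsPrinted) are made up for illustration, and it assumes finite positive doubles. For each digit count n it tries the two n-digit decimals that bracket the exact value of the double and reports the smallest n for which one of them parses back to the same double, next to the number of digits that Double.toString actually prints on the running JDK.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class ShortestDigits {

    // Smallest number of significant digits of any decimal that parses back
    // to exactly d. Assumes d is finite and positive (illustrative helper).
    static int digitsNeeded(double d) {
        BigDecimal exact = new BigDecimal(d); // exact binary value, no rounding
        for (int n = 1; n < 17; n++) {
            // If some n-digit decimal recovers d, then so does the n-digit
            // decimal just below or just above the exact value, because the
            // set of decimals that parse to d is an interval around it.
            for (RoundingMode mode : new RoundingMode[] {RoundingMode.DOWN, RoundingMode.UP}) {
                BigDecimal candidate = exact.round(new MathContext(n, mode));
                if (Double.parseDouble(candidate.toString()) == d) {
                    return n;
                }
            }
        }
        return 17; // 17 digits are always enough for a double
    }

    // Number of significant digits that Double.toString(d) prints,
    // ignoring trailing zeros such as the one in "1.0E23".
    static int digitsPrinted(double d) {
        return new BigDecimal(Double.toString(d)).stripTrailingZeros().precision();
    }

    public static void main(String[] args) {
        double[] samples = {1e23, 2e23, 8.41e21, 0.001, 5e-324, Math.scalb(1.0, 959)};
        for (double d : samples) {
            System.out.println(Double.toString(d) + ": prints " + digitsPrinted(d)
                    + " digit(s), needs " + digitsNeeded(d));
        }
    }
}
```

The value returned by digitsNeeded does not depend on the release, whereas digitsPrinted does: on a release that shows the behaviour reported here, 1e23 is printed with 16 digits although 1 is enough.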

It is quite unfortunate that some of the cases presented above can be produced in large numbers. None of them was found by clever analysis; all were found by really simple programs. The good news is that no decimal output that fails to recover the original double has been detected.

Advocating for a better specification
-------------------------------------

What is wrong with the "9.999999999999999E22" output? From the perspective of a human user who types "1e23" and gets "9.999999999999999E22" as feedback, this can be annoying, to say the least. The system responds with a complicated number to a simple input.

In the late sixties, Matula published some results about floating-point conversions (see references). One of his results states that there exist conversions from decimal to binary and back to decimal which always recover the very same decimal, provided it has no more than 15 digits (and provided that the intermediate double is normal). Among many other things, this means that 1e23 shall be output as "1.0E23". (Although Matula limited the discussion to truncation and one form of rounding conversion, the results can be extended to more general roundings, including IEEE 754. See http://homepage.sunrise.ch/mysunrise/r.giulietti/Matula.pdf for a short discussion.)

By specifying that .Double.toString(double) shall produce the shortest decimal that can recover the original double, Matula's result is implicitly guaranteed. In other words, if .Double.toString(double) is required to produce the shortest decimal, then the conversion decimal -> binary -> decimal is the identity (as far as the numerical value is concerned), provided the original decimal has no more than 15 digits and provided that the intermediate double is normal.

In practical terms, this means that the vast majority of user inputs are guaranteed to be output with the same numerical value. All physical constants known to me, for example, can be input to the system and are guaranteed to come out with the same numerical value.
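
The 15-digit round trip described above can be tried out with a few lines of Java. This is a sketch under stated assumptions, not the proposed .Double.toString(double): the class name FifteenDigitRoundTrip and the method survives are hypothetical, rounding the exact value of the double to 15 significant digits stands in for the binary -> decimal step, and the sample inputs (including the Avogadro constant 6.02214076e23) are chosen only for illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class FifteenDigitRoundTrip {

    // Parses a decimal string to a double, rounds the exact value of that
    // double back to 15 significant digits, and compares the result
    // numerically with the original decimal.
    static boolean survives(String decimal) {
        BigDecimal original = new BigDecimal(decimal);
        double d = Double.parseDouble(decimal);
        BigDecimal back = new BigDecimal(d).round(new MathContext(15, RoundingMode.HALF_EVEN));
        return back.compareTo(original) == 0;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1e23", "8.41E21", "6.02214076e23", // at most 15 digits, normal doubles
            "1.9400994884341945E25",             // 17 digits: outside the guarantee
            "5e-324"                             // denormal: outside the guarantee
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + survives(s));
        }
    }
}
```

The first three inputs come back with the same numerical value. The last two do not, which matches the two provisos: at most 15 digits, and a normal intermediate double.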

This regularity and the "no surprise" side effects for the user, both based on Matula's sound results, are the driving reasons to modify the specification and to adopt the proposed one.

Another advantage of the proposed specification is that it clearly separates the definition of the best decimal number from its representation as a .String. Moreover, the specified decimal number is unique: there are sufficient rules to ensure that the conversion is unambiguous. This contrasts with the current specification, which is ambiguous when there is more than one "shortest" decimal. Such underspecification fails to ensure identity of the decimal -> binary -> decimal conversion, where applicable. Moreover, it could lead to different results on different implementations.

The proposed specification
--------------------------

Because this is longer than a usual contribution, the proposed specification is located at http://homepage.sunrise.ch/mysunrise/r.giulietti/Double.html in the form of a Javadoc.

The accompanying implementation
-------------------------------

To solve the bugs detected in the JDK, I wrote a new implementation of .Double.toString(double) from scratch. Because it mirrors the proposed specification instead of the current one, it only makes sense to submit the implementation to Mustang if the spirit of the specification, if not its exact wording, is accepted as well.

The implementation has the following features:

* It solves all the problems above.
* It is pure Java. It uses only the JDK's public APIs in java.lang and java.math and some self-written supporting classes.
* It is thread-safe as long as the APIs it uses are thread-safe.
* It has been extensively tested. All outputs up to 9 digits have been found to be correct according to the proposed specification. Many days of computing have been devoted to testing billions of other randomly generated doubles for correctness of the outputs.
  All boundary cases have been tested (powers of 10, powers of 2, Paxson's test suite, implementation dependent boundary cases).
* Last but not least, it is twice as fast as the JDK on average, and up to 4 times faster for long outputs.

References
----------

D. W. Matula, "The Base Conversion Theorem", Proc. AMS, 19 (1968), p. 716-723
D. W. Matula, "In-and-Out Conversions", Communications of the ACM, 11 (1968), p. 47-50
|