Bug ID: JDK-8290216 Conversions between bit representations of half precision values and floats

Type: CSR
Component: core-libs
Sub-Component: java.lang

Priority: P4
Status: Closed
Resolution: Approved
Fix Versions: 20

Submitted: 2022-07-13
Updated: 2022-07-21
Resolved: 2022-07-21

Summary
-------

Add methods to convert between the binary16 format of IEEE 754 (stored as a `short`) and `float`.

Problem
-------

The 16-bit binary16 floating-point format is used in some computing contexts and is not natively supported in the Java platform. These two conversion methods provide a minimal level of support and would enable intrinsification to hardware instructions where available.

Solution
--------

Add two methods to `java.lang.Float` to support conversion in both directions between `float` and binary16.
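
As a rough illustration of how the two proposed methods might be called, the sketch below narrows a `float` to binary16 bits and widens it back. It assumes JDK 20 or later, where `Float.floatToFloat16` and `Float.float16ToFloat` are available; the class name `Float16RoundTrip` and the helper `roundTrip` are hypothetical and not part of the proposal.

```java
public final class Float16RoundTrip {
    private Float16RoundTrip() {}

    /**
     * Hypothetical helper: narrows a float to binary16 bits (round to
     * nearest even) and widens the result back to float (exact).
     */
    public static float roundTrip(float f) {
        short half = Float.floatToFloat16(f);   // float -> binary16 bits
        return Float.float16ToFloat(half);      // binary16 bits -> float
    }

    public static void main(String[] args) {
        // 1.0 is exactly representable in binary16; prints 0x3C00
        System.out.printf("0x%04X%n", Float.floatToFloat16(1.0f) & 0xFFFF);

        // 0.1f is not representable in binary16, so the round trip
        // returns the nearest binary16 value rather than 0.1f itself.
        System.out.println(roundTrip(0.1f));
    }
}
```

Values that are exactly representable in binary16 survive the round trip unchanged; other values come back as the nearest binary16 value, and magnitudes too large for the format become infinities.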

Specification
-------------

    +    /**
    +     * {@return the {@code float} value closest to the numerical value
    +     * of the argument, a floating-point binary16 value encoded in a
    +     * {@code short}} The conversion is exact; all binary16 values can
    +     * be exactly represented in {@code float}.
    +     *
    +     * Special cases:
    +     * <ul>
    +     * <li> If the argument is zero, the result is a zero with the
    +     * same sign as the argument.
    +     * <li> If the argument is infinite, the result is an infinity
    +     * with the same sign as the argument.
    +     * <li> If the argument is a NaN, the result is a NaN.
    +     * </ul>
    +     *
    +     * <h4><a id=binary16Format>IEEE 754 binary16 format</a></h4>
    +     * The IEEE 754 standard defines binary16 as a 16-bit format, along
    +     * with the 32-bit binary32 format (corresponding to the {@code
    +     * float} type) and the 64-bit binary64 format (corresponding to
    +     * the {@code double} type). The binary16 format is similar to the
    +     * other IEEE 754 formats, except smaller, having all the usual
    +     * IEEE 754 values such as NaN, signed infinities, signed zeros,
    +     * and subnormals. The parameters (JLS {@jls 4.2.3}) for the
    +     * binary16 format are N = 11 precision bits, K = 5 exponent bits,
    +     * <i>E</i><sub><i>max</i></sub> = 15, and
    +     * <i>E</i><sub><i>min</i></sub> = -14.
    +     *
    +     * @apiNote
    +     * This method corresponds to the convertFormat operation defined
    +     * in IEEE 754 from the binary16 format to the binary32 format.
    +     * The operation of this method is analogous to a primitive
    +     * widening conversion (JLS {@jls 5.1.2}).
    +     *
    +     * @param floatBinary16 the binary16 value to convert to {@code float}
    +     * @since 20
    +     */
    +    public static float float16ToFloat(short floatBinary16)
    +    ....
    +
    +    /**
    +     * {@return the floating-point binary16 value, encoded in a {@code
    +     * short}, closest in value to the argument}
    +     * The conversion is computed under the {@linkplain
    +     * java.math.RoundingMode#HALF_EVEN round to nearest even rounding
    +     * mode}.
    +     *
    +     * Special cases:
    +     * <ul>
    +     * <li> If the argument is zero, the result is a zero with the
    +     * same sign as the argument.
    +     * <li> If the argument is infinite, the result is an infinity
    +     * with the same sign as the argument.
    +     * <li> If the argument is a NaN, the result is a NaN.
    +     * </ul>
    +     *
    +     * The <a href="#binary16Format">binary16 format</a> is discussed in
    +     * more detail in the {@link #float16ToFloat} method.
    +     *
    +     * @apiNote
    +     * This method corresponds to the convertFormat operation defined
    +     * in IEEE 754 from the binary32 format to the binary16 format.
    +     * The operation of this method is analogous to a primitive
    +     * narrowing conversion (JLS {@jls 5.1.3}).
    +     *
    +     * @param f the {@code float} value to convert to binary16
    +     * @since 20
    +     */
    +    public static short floatToFloat16(float f)
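
To make the format parameters above concrete (N = 11 precision bits, K = 5 exponent bits, E<sub>max</sub> = 15, E<sub>min</sub> = -14), here is a minimal pure-Java sketch of both conversions. It is not the JDK implementation: the class `Binary16Sketch` and its methods `decode` and `encode` are hypothetical names, and the sketch maps every NaN to one fixed quiet NaN, which is only an assumption consistent with "the result is a NaN".

```java
public final class Binary16Sketch {
    private Binary16Sketch() {}

    /** Widens a binary16 bit pattern to float; every case is exact. */
    public static float decode(short bits) {
        int sign = (bits >> 15) & 0x1;   // 1 sign bit
        int exp  = (bits >> 10) & 0x1F;  // K = 5 biased exponent bits
        int frac = bits & 0x3FF;         // N - 1 = 10 stored significand bits
        float magnitude;
        if (exp == 0x1F) {
            // All-ones exponent: infinity if frac is zero, otherwise NaN.
            magnitude = (frac == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else if (exp == 0) {
            // Zero or subnormal: frac * 2^(Emin - 10) = frac * 2^-24.
            magnitude = frac * 0x1.0p-24f;
        } else {
            // Normal: (2^10 + frac) * 2^(exp - 15 - 10).
            magnitude = Math.scalb((float) (0x400 | frac), exp - 25);
        }
        return (sign == 0) ? magnitude : -magnitude;
    }

    /** Narrows a float to binary16 bits using round to nearest even. */
    public static short encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = (bits >>> 16) & 0x8000;   // sign moved to bit 15
        int abs  = bits & 0x7FFFFFFF;        // magnitude bits of the float

        if (abs > 0x7F800000) {
            // NaN: this sketch returns one fixed quiet NaN (an assumption).
            return (short) (sign | 0x7E00);
        }
        if (abs >= 0x477FF000) {
            // |f| >= 65520 (infinities included) rounds to infinity.
            return (short) (sign | 0x7C00);
        }
        if (abs < 0x38800000) {
            // |f| < 2^-14: the result is zero or subnormal. Scaling by
            // 2^24 is exact, and Math.rint rounds ties to even.
            float scaled = Float.intBitsToFloat(abs) * 0x1.0p24f;
            return (short) (sign | (int) Math.rint(scaled));
        }
        // Normal range: rebias the exponent and keep the top 10
        // significand bits, then round on the 13 discarded bits.
        int exp  = (abs >>> 23) - 127 + 15;
        int mant = abs & 0x7FFFFF;
        int half = (exp << 10) | (mant >>> 13);
        int rem  = mant & 0x1FFF;
        if (rem > 0x1000 || (rem == 0x1000 && (half & 1) != 0)) {
            half++;   // a carry may ripple into the exponent, which is correct
        }
        return (short) (sign | half);
    }
}
```

For example, `Binary16Sketch.encode(1.0f)` yields the bit pattern `0x3C00`, and `Binary16Sketch.decode((short) 0x7BFF)` yields `65504.0f`, the largest finite binary16 value. Widening never loses information, which is why only the narrowing direction needs a rounding rule.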

Moving to Approved.
21-07-2022
[~psandoz]; yes, I was thinking a bit more about how a Float16/BinaryFloat16 value class might be implemented. If the value class couldn't directly extend java.lang.Number, I would still expect it to implement the methods on Number, including floatValue().
21-07-2022
Renaming looks good. I think it's OK not to qualify the method names with "Bits", since we are unlikely to overload with a method returning a different representation of a binary16 value, e.g. Float16. Such conversion methods can reside on the Float16 value class.
21-07-2022
[~psandoz], please re-affirm your review.
21-07-2022
Rename methods per feedback.
21-07-2022
I approved the draft, but I do think we need to reconsider using "binary16" in the method name. This makes sense within the boundaries of the IEEE specification, but it does not transpose well to a general programming language and is likely to cause confusion. Further, we should align the chosen name with any value class we might add, enabled by Valhalla. Suggested names: "half" or "float16". My preference is for the latter, with a model of name + bits (e.g. BFloat16 or Int128) for when there is no obvious candidate (e.g. UnsignedShort). We might also be able to compress the method name, e.g. {X}BitsToFloat, floatTo{X}Bits, and lean on short being an argument.
18-07-2022
Moving to Provisional.
18-07-2022