JDK-8283893 : Compress and expand bits
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 19
  • Submitted: 2022-03-29
  • Updated: 2022-04-14
  • Resolved: 2022-04-14
Related Reports
CSR :  
Description
Summary
-------

Add methods to `Integer` and `Long` to compress bits and expand bits, see Hacker's Delight (2nd edition), section 7.4.

Problem
-------

Compressing or expanding bits of an `int` or `long` value can be composed to enable general permutations, and the "sheep and goats" operation (SAG) see Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a stable binary radix sort.

The compress and expand functionality maps efficiently to hardware instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the implementations can be very efficient on supporting hardware.

This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial performance impact of the `PDEP` instruction, and by extension the `expand` method, when applied to the implementation of a bit-vector `select` operation in succinct data structures (for example `select(r)` returns the position of the `r`th 1).

Such fundamental and useful operations, sedimented into modern hardware, are excellent candidates to be supported by the Java platform.

Solution
--------

Add methods to compress bits and expand bits, given a bit mask.

Specification
-------------

On `Integer`:

    /**
     * Returns the value obtained by compressing the bits of the
     * specified {@code int} value, {@code i}, in accordance with
     * the specified bit mask.
     * <p>
     * For each one-bit value {@code mb} of the mask, from least
     * significant to most significant, the bit value of {@code i} at
     * the same bit location as {@code mb} is assigned to the compressed
     * value contiguously starting from the least significant bit location.
     * All the upper remaining bits of the compressed value are set
     * to zero.
     *
     * @apiNote
     * Consider the simple case of compressing the digits of a hexadecimal
     * value:
     * {@snippet lang="java" :
     * // Compressing drink to food
     * compress(0xCAFEBABE, 0xFF00FFF0) == 0xCABAB
     * }
     * Starting from the least significant hexadecimal digit at position 0
     * from the right, the mask {@code 0xFF00FFF0} selects hexadecimal digits
     * at positions 1, 2, 3, 6 and 7 of {@code 0xCAFEBABE}. The selected digits
     * occur in the resulting compressed value contiguously from digit position
     * 0 in the same order.
     * <p>
     * The following identities all return {@code true} and are helpful to
     * understand the behaviour of {@code compress}:
     * {@snippet lang="java" :
     * // Returns 1 if the bit at position n is one
     * compress(x, 1 << n) == (x >> n & 1)
     *
     * // Logical shift right
     * compress(x, -1 << n) == x >>> n
     *
     * // Any bits not covered by the mask are ignored
     * compress(x, m) == compress(x & m, m)
     *
     * // Compressing a value by itself
     * compress(m, m) == (m == -1 || m == 0) ? m : (1 << bitCount(m)) - 1
     *
     * // Expanding then compressing with the same mask
     * compress(expand(x, m), m) == x & compress(m, m)
     * }
     * <p>
     * The Sheep And Goats (SAG) operation (see Hacker's Delight, section 7.7)
     * can be implemented as follows:
     * {@snippet lang="java" :
     * int compressLeft(int i, int mask) {
     *     // This implementation follows the description in Hacker's Delight which
     *     // is informative. A more optimal implementation is:
     *     //   Integer.compress(i, mask) << -Integer.bitCount(mask)
     *     return Integer.reverse(
     *         Integer.compress(Integer.reverse(i), Integer.reverse(mask)));
     * }
     *
     * int sag(int i, int mask) {
     *     return compressLeft(i, mask) | Integer.compress(i, ~mask);
     * }
     *
     * // Separate the sheep from the goats
     * sag(0xCAFEBABE, 0xFF00FFF0) == 0xCABABFEE
     * }
     *
     * @param i the value whose bits are to be compressed
     * @param mask the bit mask
     * @return the compressed value
     * @see #expand
     * @since 19
     */
    public static int compress(int i, int mask)

    /**
     * Returns the value obtained by expanding the bits of the
     * specified {@code int} value, {@code i}, in accordance with
     * the specified bit mask.
     * <p>
     * For each one-bit value {@code mb} of the mask, from least
     * significant to most significant, the next contiguous bit value
     * of {@code i} starting at the least significant bit is assigned
     * to the expanded value at the same bit location as {@code mb}.
     * All other remaining bits of the expanded value are set to zero.
     *
     * @apiNote
     * Consider the simple case of expanding the digits of a hexadecimal
     * value:
     * {@snippet lang="java" :
     * expand(0x0000CABAB, 0xFF00FFF0) == 0xCA00BAB0
     * }
     * Starting from the least significant hexadecimal digit at position 0
     * from the right, the mask {@code 0xFF00FFF0} selects the first five
     * hexadecimal digits of {@code 0x0000CABAB}. The selected digits occur
     * in the resulting expanded value in order at positions 1, 2, 3, 6, and 7.
     * <p>
     * The following identities all return {@code true} and are helpful to
     * understand the behaviour of {@code expand}:
     * {@snippet lang="java" :
     * // Logically shift right the bit at position 0
     * expand(x, 1 << n) == (x & 1) << n
     *
     * // Logically shift right
     * expand(x, -1 << n) == x << n
     *
     * // Expanding all bits returns the mask
     * expand(-1, m) == m
     *
     * // Any bits not covered by the mask are ignored
     * expand(x, m) == expand(x, m) & m
     *
     * // Compressing then expanding with the same mask
     * expand(compress(x, m), m) == x & m
     * }
     * <p>
     * The select operation for determining the position of the one-bit with
     * index {@code n} in a {@code int} value can be implemented as follows:
     * {@snippet lang="java" :
     * int select(int i, int n) {
     *     // the one-bit in i (the mask) with index n
     *     int nthBit = Integer.expand(1 << n, i);
     *     // the bit position of the one-bit with index n
     *     return Integer.numberOfTrailingZeros(nthBit);
     * }
     *
     * // The one-bit with index 0 is at bit position 1
     * select(0b10101010_10101010, 0) == 1
     * // The one-bit with index 3 is at bit position 7
     * select(0b10101010_10101010, 3) == 7
     * }
     *
     * @param i the value whose bits are to be expanded
     * @param mask the bit mask
     * @return the expanded value
     * @see #compress
     * @since 19
     */
    public static int expand(int i, int mask)

On `Long`:

    /**
     * Returns the value obtained by compressing the bits of the
     * specified {@code long} value, {@code i}, in accordance with
     * the specified bit mask.
     * <p>
     * For each one-bit value {@code mb} of the mask, from least
     * significant to most significant, the bit value of {@code i} at
     * the same bit location as {@code mb} is assigned to the compressed
     * value contiguously starting from the least significant bit location.
     * All the upper remaining bits of the compressed value are set
     * to zero.
     *
     * @apiNote
     * Consider the simple case of compressing the digits of a hexadecimal
     * value:
     * {@snippet lang="java" :
     * // Compressing drink to food
     * compress(0xCAFEBABE, 0xFF00FFF0) == 0xCABAB
     * }
     * Starting from the least significant hexadecimal digit at position 0
     * from the right, the mask {@code 0xFF00FFF0} selects hexadecimal digits
     * at positions 1, 2, 3, 6 and 7 of {@code 0xCAFEBABE}. The selected digits
     * occur in the resulting compressed value contiguously from digit position
     * 0 in the same order.
     * <p>
     * The following identities all return {@code true} and are helpful to
     * understand the behaviour of {@code compress}:
     * {@snippet lang="java" :
     * // Returns 1 if the bit at position n is one
     * compress(x, 1 << n) == (x >> n & 1)
     *
     * // Logical shift right
     * compress(x, -1 << n) == x >>> n
     *
     * // Any bits not covered by the mask are ignored
     * compress(x, m) == compress(x & m, m)
     *
     * // Compressing a value by itself
     * compress(m, m) == (m == -1 || m == 0) ? m : (1 << bitCount(m)) - 1
     *
     * // Expanding then compressing with the same mask
     * compress(expand(x, m), m) == x & compress(m, m)
     * }
     * <p>
     * The Sheep And Goats (SAG) operation (see Hacker's Delight, section 7.7)
     * can be implemented as follows:
     * {@snippet lang="java" :
     * long compressLeft(long i, long mask) {
     *     // This implementation follows the description in Hacker's Delight which
     *     // is informative. A more optimal implementation is:
     *     //   Long.compress(i, mask) << -Long.bitCount(mask)
     *     return Long.reverse(
     *         Long.compress(Long.reverse(i), Long.reverse(mask)));
     * }
     *
     * long sag(long i, long mask) {
     *     return compressLeft(i, mask) | Long.compress(i, ~mask);
     * }
     *
     * // Separate the sheep from the goats
     * sag(0xCAFEBABE, 0xFF00FFF0) == 0xCABABFEE
     * }
     *
     * @param i the value whose bits are to be compressed
     * @param mask the bit mask
     * @return the compressed value
     * @see #expand
     * @since 19
     */
    public static long compress(long i, long mask)

    /**
     * Returns the value obtained by expanding the bits of the
     * specified {@code long} value, {@code i}, in accordance with
     * the specified bit mask.
     * <p>
     * For each one-bit value {@code mb} of the mask, from least
     * significant to most significant, the next contiguous bit value
     * of {@code i} starting at the least significant bit is assigned
     * to the expanded value at the same bit location as {@code mb}.
     * All other remaining bits of the expanded value are set to zero.
     *
     * @apiNote
     * Consider the simple case of expanding the digits of a hexadecimal
     * value:
     * {@snippet lang="java" :
     * expand(0x0000CABAB, 0xFF00FFF0) == 0xCA00BAB0
     * }
     * Starting from the least significant hexadecimal digit at position 0
     * from the right, the mask {@code 0xFF00FFF0} selects the first five
     * hexadecimal digits of {@code 0x0000CABAB}. The selected digits occur
     * in the resulting expanded value in order at positions 1, 2, 3, 6, and 7.
     * <p>
     * The following identities all return {@code true} and are helpful to
     * understand the behaviour of {@code expand}:
     * {@snippet lang="java" :
     * // Logically shift right the bit at position 0
     * expand(x, 1 << n) == (x & 1) << n
     *
     * // Logically shift right
     * expand(x, -1 << n) == x << n
     *
     * // Expanding all bits returns the mask
     * expand(-1, m) == m
     *
     * // Any bits not covered by the mask are ignored
     * expand(x, m) == expand(x, m) & m
     *
     * // Compressing then expanding with the same mask
     * expand(compress(x, m), m) == x & m
     * }
     * <p>
     * The select operation for determining the position of the one-bit with
     * index {@code n} in a {@code long} value can be implemented as follows:
     * {@snippet lang="java" :
     * long select(long i, long n) {
     *     // the one-bit in i (the mask) with index n
     *     long nthBit = Long.expand(1 << n, i);
     *     // the bit position of the one-bit with index n
     *     return Long.numberOfTrailingZeros(nthBit);
     * }
     *
     * // The one-bit with index 0 is at bit position 1
     * select(0b10101010_10101010, 0) == 1
     * // The one-bit with index 3 is at bit position 7
     * select(0b10101010_10101010, 3) == 7
     * }
     *
     * @param i the value whose bits are to be expanded
     * @param mask the bit mask
     * @return the expanded value
     * @see #compress
     * @since 19
     */
    public static long expand(long i, long mask)

Comments
Moving to Approved.
14-04-2022