Bug ID: JDK-8200527 Inflater/Deflater methods to inflate/deflate on byte buffers

Type: CSR
Component: core-libs
Sub-Component: java.util.jar

Priority: P4
Status: Closed
Resolution: Approved
Fix Versions: 11

Submitted: 2018-03-30
Updated: 2020-09-03
Resolved: 2018-04-09

Summary
-------

This enhancement introduces a set of new methods in the j.u.zip.Deflater and j.u.zip.Inflater classes for deflating and inflating with java.nio.ByteBuffer input/output.

Problem
-------

As more and more I/O operations are done with the java.nio APIs, it is convenient and efficient to have the j.u.zip.Deflater and j.u.zip.Inflater compression APIs work directly with java.nio.ByteBuffer (both direct and heap buffers) to compress/decompress data, without first copying the bytes back and forth through a byte[].

Solution
--------

Introduce the following new methods in the Deflater and Inflater classes to support deflate and inflate operations on ByteBuffer input/output.

Deflater.java:
```
public void setInput(ByteBuffer input);
public void setDictionary(ByteBuffer dictionary);
public int deflate(ByteBuffer output);
public int deflate(ByteBuffer output, int flush);
```
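A minimal sketch of the new Deflater overloads, assuming JDK 11 or later: a heap ByteBuffer is compressed straight into a direct ByteBuffer, with no intermediate byte[]. The class name `DeflateByteBufferSketch`, the `compress` helper, and the 64-byte starting capacity are illustrative choices, not part of the proposed API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflateByteBufferSketch {

    // Hypothetical helper: compresses all remaining bytes of 'input' and returns
    // a flipped direct buffer holding the ZLIB-format output.
    static ByteBuffer compress(ByteBuffer input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);   // input's position advances as bytes are consumed
            deflater.finish();          // no more input will follow
            ByteBuffer output = ByteBuffer.allocateDirect(64); // arbitrary initial size
            while (!deflater.finished()) {
                if (!output.hasRemaining()) {
                    // Grow the output buffer, keeping what has been written so far.
                    ByteBuffer bigger = ByteBuffer.allocateDirect(output.capacity() * 2);
                    output.flip();
                    bigger.put(output);
                    output = bigger;
                }
                // Advances output's position by the number of bytes produced.
                deflater.deflate(output);
            }
            output.flip();
            return output;
        } finally {
            deflater.end();             // release native zlib resources
        }
    }

    public static void main(String[] args) {
        ByteBuffer input = ByteBuffer.wrap(
                "hello hello hello hello".getBytes(StandardCharsets.UTF_8));
        ByteBuffer compressed = compress(input);
        System.out.println("input bytes left: " + input.remaining());
        System.out.println("compressed bytes: " + compressed.remaining());
    }
}
```

Growing the buffer by doubling is just one simple policy; callers that stream output elsewhere would more likely drain and reuse a fixed-size buffer.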

Inflater.java:
```
public void setInput(ByteBuffer input);
public void setDictionary(ByteBuffer dictionary);
public int inflate(ByteBuffer output) throws DataFormatException;
```
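The Inflater overloads can be exercised together with both `setDictionary(ByteBuffer)` methods. The sketch below, again assuming JDK 11 or later, compresses with a preset dictionary and then inflates into a direct buffer, supplying the dictionary when `inflate()` returns 0 and `needsDictionary()` is true. The shared dictionary `DICT`, the helper names, and the fixed buffer sizes are hypothetical and only sized for this small demo.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateByteBufferSketch {

    // Hypothetical preset dictionary that both sides are assumed to share.
    static final byte[] DICT = "hello world".getBytes(StandardCharsets.UTF_8);

    static ByteBuffer compressWithDictionary(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setDictionary(ByteBuffer.wrap(DICT)); // consumed fully: position == limit
            deflater.setInput(ByteBuffer.wrap(data));
            deflater.finish();
            ByteBuffer out = ByteBuffer.allocate(data.length + 64); // assumed ample for small inputs
            while (!deflater.finished()) {
                if (!out.hasRemaining()) {
                    throw new IllegalStateException("output buffer too small");
                }
                deflater.deflate(out);
            }
            out.flip();
            return out;
        } finally {
            deflater.end();
        }
    }

    static byte[] decompressWithDictionary(ByteBuffer compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed); // position advances as input is consumed
            ByteBuffer out = ByteBuffer.allocateDirect(1024); // assumed large enough here
            while (!inflater.finished()) {
                int n = inflater.inflate(out);
                if (n > 0 || inflater.finished()) {
                    continue;
                }
                if (inflater.needsDictionary()) {
                    // getAdler() reports the Adler-32 of the dictionary the stream expects.
                    Adler32 adler = new Adler32();
                    adler.update(DICT);
                    if (inflater.getAdler() != (int) adler.getValue()) {
                        throw new DataFormatException("unexpected dictionary");
                    }
                    inflater.setDictionary(ByteBuffer.wrap(DICT));
                } else if (inflater.needsInput()) {
                    throw new DataFormatException("truncated input");
                } else {
                    throw new IllegalStateException("output buffer too small");
                }
            }
            out.flip();
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "hello world, hello world, hello again".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompressWithDictionary(compressWithDictionary(data));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```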

Specification
-------------

```
Deflater.java

/**
 * This class provides support for general purpose compression using the
 * popular ZLIB compression library. The ZLIB compression library was
 * initially developed as part of the PNG graphics standard and is not
 * protected by patents. It is fully described in the specifications at
 * the <a href="package-summary.html#package.description">java.util.zip
 * package description</a>.
 * <p>
 * This class deflates sequences of bytes into ZLIB compressed data format.
 * The input byte sequence is provided in either byte array or byte buffer,
 * via one of the {@code setInput()} methods. The output byte sequence is
 * written to the output byte array or byte buffer passed to the
 * {@code deflate()} methods.
 * <p>
 * The following code fragment demonstrates a trivial compression
 * and decompression of a string using {@code Deflater} and
 * {@code Inflater}.

...

*/

    /**
     * Sets input data for compression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     *
     * @param input the input data bytes
     * @param off the start offset of the data
     * @param len the length of the data
     * @see Deflater#needsInput
     */
    public void setInput(byte[] input, int off, int len)

    /**
     * Sets input data for compression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     *
     * @param input the input data bytes
     * @see Deflater#needsInput
     */
    public void setInput(byte[] input)

    /**
     * Sets input data for compression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     * <p>
     * The given buffer's position will be advanced as deflate
     * operations are performed, up to the buffer's limit.
     * The input buffer may be modified (refilled) between deflate
     * operations; doing so is equivalent to creating a new buffer
     * and setting it with this method.
     * <p>
     * Modifying the input buffer's contents, position, or limit
     * concurrently with a deflate operation will result in
     * undefined behavior, which may include incorrect operation
     * results or operation failure.
     *
     * @param input the input data bytes
     * @see Deflater#needsInput
     * @since 11
     */
    public void setInput(ByteBuffer input)

    /**
     * Sets preset dictionary for compression. A preset dictionary is used
     * when the history buffer can be predetermined. When the data is later
     * uncompressed with Inflater.inflate(), Inflater.getAdler() can be called
     * in order to get the Adler-32 value of the dictionary required for
     * decompression.
     * <p>
     * The bytes in the given byte buffer will be fully consumed by this method. On
     * return, its position will equal its limit.
     *
     * @param dictionary the dictionary data bytes
     * @see Inflater#inflate
     * @see Inflater#getAdler
     * @since 11
     */
    public void setDictionary(ByteBuffer dictionary)

    /**
     * Compresses the input data and fills specified buffer with compressed
     * data. Returns actual number of bytes of compressed data. A return value
     * of 0 indicates that {@link #needsInput() needsInput} should be called
     * in order to determine if more input data is required.
     *
     * <p>This method uses {@link #NO_FLUSH} as its compression flush mode.
     * An invocation of this method of the form {@code deflater.deflate(output)}
     * yields the same result as the invocation of
     * {@code deflater.deflate(output, Deflater.NO_FLUSH)}.
     *
     * @param output the buffer for the compressed data
     * @return the actual number of bytes of compressed data written to the
     *         output buffer
     * @since 11
     */
    public int deflate(ByteBuffer output) 

    /**
     * Compresses the input data and fills the specified buffer with compressed
     * data. Returns actual number of bytes of data compressed.
     *
     * <p>Compression flush mode is one of the following three modes:
     *
     * <ul>
     * <li>{@link #NO_FLUSH}: allows the deflater to decide how much data
     * to accumulate, before producing output, in order to achieve the best
     * compression (should be used in normal use scenario). A return value
     * of 0 in this flush mode indicates that {@link #needsInput()} should
     * be called in order to determine if more input data is required.
     *
     * <li>{@link #SYNC_FLUSH}: all pending output in the deflater is flushed
     * to the specified output buffer, so that an inflater that works on
     * compressed data can get all input data available so far (in particular
     * the {@link #needsInput()} returns {@code true} after this invocation
     * if enough output space is provided). Flushing with {@link #SYNC_FLUSH}
     * may degrade compression for some compression algorithms and so it
     * should be used only when necessary.
     *
     * <li>{@link #FULL_FLUSH}: all pending output is flushed out as with
     * {@link #SYNC_FLUSH}. The compression state is reset so that the inflater
     * that works on the compressed output data can restart from this point
     * if previous compressed data has been damaged or if random access is
     * desired. Using {@link #FULL_FLUSH} too often can seriously degrade
     * compression.
     * </ul>
     *
     * <p>In the case of {@link #FULL_FLUSH} or {@link #SYNC_FLUSH}, if
     * the return value is equal to the {@linkplain ByteBuffer#remaining() remaining space}
     * of the buffer, this method should be invoked again with the same
     * {@code flush} parameter and more output space. Make sure that
     * the buffer has at least 6 bytes of remaining space to avoid the
     * flush marker (5 bytes) being repeatedly output to the output buffer
     * every time this method is invoked.
     *
     * <p>On success, the position of the given {@code output} byte buffer will be
     * advanced by as many bytes as were produced by the operation, which is equal
     * to the number returned by this method.
     *
     * <p>If the {@link #setInput(ByteBuffer)} method was called to provide a buffer
     * for input, the input buffer's position will be advanced by the number of bytes
     * consumed by this operation.
     *
     * @param output the buffer for the compressed data
     * @param flush the compression flush mode
     * @return the actual number of bytes of compressed data written to
     *         the output buffer
     *
     * @throws IllegalArgumentException if the flush mode is invalid
     * @since 11
     */
    public int deflate(ByteBuffer output, int flush)

Inflater.java

/**
 * This class provides support for general purpose decompression using the
 * popular ZLIB compression library. The ZLIB compression library was
 * initially developed as part of the PNG graphics standard and is not
 * protected by patents. It is fully described in the specifications at
 * the <a href="package-summary.html#package.description">java.util.zip
 * package description</a>.
 * <p>
 * This class inflates sequences of ZLIB compressed bytes. The input byte
 * sequence is provided in either byte array or byte buffer, via one of the
 * {@code setInput()} methods. The output byte sequence is written to the
 * output byte array or byte buffer passed to the {@code inflate()} methods.
 * <p>
 * The following code fragment demonstrates a trivial compression
 * and decompression of a string using {@code Deflater} and
 * {@code Inflater}.
 *
 
...

*/

    /**
     * Sets input data for decompression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     *
     * @param input the input data bytes
     * @param off the start offset of the input data
     * @param len the length of the input data
     * @see Inflater#needsInput
     */
    public void setInput(byte[] input, int off, int len)

    /**
     * Sets input data for decompression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     *
     * @param input the input data bytes
     * @see Inflater#needsInput
     */
    public void setInput(byte[] input) 

    /**
     * Sets input data for decompression.
     * <p>
     * One of the {@code setInput()} methods should be called whenever
     * {@code needsInput()} returns true indicating that more input data
     * is required.
     * <p>
     * The given buffer's position will be advanced as inflate
     * operations are performed, up to the buffer's limit.
     * The input buffer may be modified (refilled) between inflate
     * operations; doing so is equivalent to creating a new buffer
     * and setting it with this method.
     * <p>
     * Modifying the input buffer's contents, position, or limit
     * concurrently with an inflate operation will result in
     * undefined behavior, which may include incorrect operation
     * results or operation failure.
     *
     * @param input the input data bytes
     * @see Inflater#needsInput
     * @since 11
     */
    public void setInput(ByteBuffer input) 

    /**
     * Sets the preset dictionary to the bytes in the given buffer. Should be
     * called when inflate() returns 0 and needsDictionary() returns true
     * indicating that a preset dictionary is required. The method getAdler()
     * can be used to get the Adler-32 value of the dictionary needed.
     * <p>
     * The bytes in the given byte buffer will be fully consumed by this method. On
     * return, its position will equal its limit.
     *
     * @param dictionary the dictionary data bytes
     * @see Inflater#needsDictionary
     * @see Inflater#getAdler
     * @since 11
     */
    public void setDictionary(ByteBuffer dictionary)

    /**
     * Uncompresses bytes into specified buffer. Returns actual number
     * of bytes uncompressed. A return value of 0 indicates that
     * needsInput() or needsDictionary() should be called in order to
     * determine if more input data or a preset dictionary is required.
     * In the latter case, getAdler() can be used to get the Adler-32
     * value of the dictionary required.
     * <p>
     * On success, the position of the given {@code output} byte buffer will be
     * advanced by as many bytes as were produced by the operation, which is equal
     * to the number returned by this method.  Note that the position of the
     * {@code output} buffer will be advanced even in the event that a
     * {@link DataFormatException} is thrown.
     * <p>
     * The {@linkplain #getRemaining() remaining byte count} will be reduced by
     * the number of consumed input bytes.  If the {@link #setInput(ByteBuffer)}
     * method was called to provide a buffer for input, the input buffer's position
     * will be advanced by the number of consumed bytes.
     * <p>
     * These byte totals, as well as
     * the {@linkplain #getBytesRead() total bytes read}
     * and the {@linkplain #getBytesWritten() total bytes written}
     * values, will be updated even in the event that a {@link DataFormatException}
     * is thrown to reflect the amount of data consumed and produced before the
     * exception occurred.
     *
     * @param output the buffer for the uncompressed data
     * @return the actual number of uncompressed bytes
     * @throws DataFormatException if the compressed data format is invalid
     * @throws ReadOnlyBufferException if the given output buffer is read-only
     * @see Inflater#needsInput
     * @see Inflater#needsDictionary
     * @since 11
     */
    public int inflate(ByteBuffer output) throws DataFormatException


```
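The SYNC_FLUSH contract in `deflate(ByteBuffer, int)` (call again with the same flush mode whenever the return value equals the buffer's remaining space, and keep at least 6 bytes of room) can be sketched as follows. This assumes JDK 11 or later; `SyncFlushSketch`, its helpers, and the 16-byte chunk size are hypothetical. The deflater is never finished, yet the inflater can decode everything sent so far.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SyncFlushSketch {

    // Deflates all remaining bytes of 'input' with SYNC_FLUSH, copying the output
    // into 'sink'. 'chunk' is assumed to be a heap buffer from ByteBuffer.allocate
    // (so array() is usable and arrayOffset() is 0) with capacity >= 6.
    static void deflateSyncFlush(Deflater deflater, ByteBuffer input,
                                 ByteBuffer chunk, ByteArrayOutputStream sink) {
        deflater.setInput(input);
        int written;
        do {
            chunk.clear();
            written = deflater.deflate(chunk, Deflater.SYNC_FLUSH);
            sink.write(chunk.array(), 0, written);
            // A completely filled chunk means the flush may be incomplete,
            // so call again with the same flush mode.
        } while (written == chunk.capacity());
    }

    // Sends 'message' through a sync-flushed (unfinished) stream and returns
    // whatever the receiving inflater can decode from it.
    static String roundTrip(String message) throws DataFormatException {
        Deflater deflater = new Deflater();
        Inflater inflater = new Inflater();
        try {
            ByteBuffer input = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            deflateSyncFlush(deflater, input, ByteBuffer.allocate(16), wire);
            // With enough output space, the flush consumes all input.
            if (input.hasRemaining() || !deflater.needsInput()) {
                throw new IllegalStateException("input not fully consumed");
            }
            inflater.setInput(ByteBuffer.wrap(wire.toByteArray()));
            ByteBuffer plain = ByteBuffer.allocate(1024); // assumed large enough here
            while (!inflater.needsInput() && plain.hasRemaining()) {
                inflater.inflate(plain);
            }
            plain.flip();
            return StandardCharsets.UTF_8.decode(plain).toString();
        } finally {
            deflater.end();
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        System.out.println(roundTrip("first message, first message, first message"));
    }
}
```

Reusing one small chunk buffer keeps memory bounded; the repeat-while-full loop is what makes a buffer smaller than the compressed output safe to use.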

Moving to Approved.
09-04-2018
Updated with the suggested wording: (1) added "This class de/inflates ..." to the class API doc; (2) added/updated "One of the setInput ..." for all setInput() methods.
09-04-2018
Administratively marking this request as pended until the docs are modified as discussed; I'll vote to approve the API once it has been updated and re-finalized.
06-04-2018
Hi Sherman, that sounds fine. Note that for @see references like "@see Deflater#needsInput", I believe javadoc will just pick one of the overloads. If you want a @see link to a particular method, you should use a more specific reference to the method. HTH
05-04-2018
Is the following wording good enough, or would it be better to list all three methods with {@link ...}?
```
/**
 * Sets input data for de/compression.
 *
 * One of the {@code setInput()} methods should be called whenever
 * {@code needsInput()} returns true indicating that more input data is
 * required.
 * ...
 */
```
05-04-2018
I'd prefer to see a brief explanation in each of Inflater/Deflater like: "This class [inflates/deflates] sequences of bytes. Those bytes can either be represented as a byte array or a byte buffer." From reading the API, it looks like the input can be provided in multiple chunks in separate arrays or byte buffers. Is that correct? If so, wording like "This [method setInput(ByteBuffer input)] should be called whenever needsInput() returns true indicating that more input data is required." may need to be rephrased to less strongly indicate that this ByteBuffer overload should be called, as opposed to the byte[] overload.
05-04-2018
The setDictionary(ByteBuffer) method is a bit of a nice-to-have; it was only included for consistency. In any case, there is no tie between how the input and dictionary are provided. The input may be in a ByteBuffer and the dictionary bytes (if any) in a byte array.
05-04-2018
Is it nonsensical to have the dictionary provided as a byte[] but the data to be compressed provided as a ByteBuffer? I assume there is no implicit requirement for the input and output to match in terms of being a byte[] or a ByteBuffer.
05-04-2018

CSR :	JDK-6341887 - Inflater.setInput(), Inflater.inflate() can't handle ByteBuffer
Relates :	JDK-8252739 - Deflater.setDictionary(byte[], int off, int len) ignores the starting offset for the dictionary