Bug ID: JDK-8264777 Overload optimized FileInputStream::readAllBytes

Type: Enhancement
Component: core-libs
Sub-Component: java.io
Affected Version: 17

Priority: P4
Status: Closed
Resolution: Fixed

Submitted: 2021-04-06
Updated: 2025-06-20
Resolved: 2021-05-17

JDK 17
17 b23 Fixed

ADDITIONAL SYSTEM INFORMATION :
Here is a JMH benchmark that gives an example implementation and shows a 30% performance gain for 10 MB files on Windows. For smaller files the improvement is smaller, but I did not see any degradation at any file size from 100 bytes to 10 MB:

package benchmarks;

import java.io.IOException;
import java.util.Arrays;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(org.openjdk.jmh.annotations.Mode.SingleShotTime)
@OutputTimeUnit(java.util.concurrent.TimeUnit.MILLISECONDS)
@State(org.openjdk.jmh.annotations.Scope.Thread)
@Fork(value = 1, jvmArgsAppend = { "-ea" })
@Warmup(batchSize = 1000)
@Measurement(batchSize = 1000)
public class FileRead {
	// XXX put in any directory where the files are located.
	String dirName = "resources/";
	// XXX put in any filenames you like to test.
	@Param({ "100b.txt", "1k.txt", "10k.txt", "100k.txt", "1MB.txt", "10MB.txt" })
	String fileName;

	java.io.File file;
	byte[] result;
	byte[] expected;

	@Setup
	public void setup() throws IOException {
		file = new java.io.File(dirName + fileName).getAbsoluteFile();
		result = null;
		expected = java.nio.file.Files.readAllBytes(file.toPath());
	}

	@TearDown
	public void check() {
		assert Arrays.equals(expected, result) : "Nothing changed?";
	}

	public static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

	@Benchmark
	public void readAllBytesOld() throws IOException {
		try (java.io.InputStream input = new java.io.FileInputStream(file)) {
			result = input.readAllBytes();
		}
	}

	@Benchmark
	public void readAllBytesNew() throws IOException {
		try (java.io.InputStream input = new java.io.FileInputStream(file) {

			@Override
			public byte[] readAllBytes() throws IOException {
				long length = this.getChannel().size();
				// use jdk.internal.util.ArraysSupport.newLength(int, int, int)?
				if (length > MAX_ARRAY_LENGTH)
					throw new OutOfMemoryError("File too large for array: " + length);
				return readNBytes(this, (int) length);
			}

			byte[] readNBytes(java.io.InputStream input, int byteLength) throws IOException {
				if (byteLength == 0)
					return new byte[0];
				byte[] byteBuf = new byte[byteLength]; // exact buffer size
				int byteCount = 0;
				int byteTransferSize = byteBuf.length;
				int bytesRead;
				while ((bytesRead = input.read(byteBuf, byteCount, byteTransferSize)) >= 0) {
					byteCount += bytesRead;
					byteTransferSize = byteBuf.length - byteCount;
					if (byteTransferSize <= 0) {
						break;
					}
				}
				return (byteBuf.length == byteCount) ? byteBuf : Arrays.copyOf(byteBuf, byteCount);
			}
		}) {
			result = input.readAllBytes();
		}
	}

	public static void main(String[] args) throws RunnerException, InterruptedException {
		Options opt = new OptionsBuilder().include(FileRead.class.getSimpleName()).shouldFailOnError(true).build();
		new Runner(opt).run();
	}
}

My results:

Benchmark                 (fileName)  Mode  Cnt     Score   Error  Units
ReadByt3.readAllBytesNew    100b.txt    ss         34,081          ms/op
ReadByt3.readAllBytesNew      1k.txt    ss         35,951          ms/op
ReadByt3.readAllBytesNew     10k.txt    ss         40,996          ms/op
ReadByt3.readAllBytesNew    100k.txt    ss         66,433          ms/op
ReadByt3.readAllBytesNew     1MB.txt    ss        587,246          ms/op
ReadByt3.readAllBytesNew    10MB.txt    ss       5361,234          ms/op

ReadByt3.readAllBytesOld    100b.txt    ss         35,115          ms/op
ReadByt3.readAllBytesOld      1k.txt    ss         35,951          ms/op
ReadByt3.readAllBytesOld     10k.txt    ss         45,528          ms/op
ReadByt3.readAllBytesOld    100k.txt    ss        125,894          ms/op
ReadByt3.readAllBytesOld     1MB.txt    ss        630,972          ms/op
ReadByt3.readAllBytesOld    10MB.txt    ss       7538,637          ms/op


A DESCRIPTION OF THE PROBLEM :
InputStream::readAllBytes currently reads all bytes through a series of buffers (https://bugs.openjdk.java.net/browse/JDK-8193832). For local files, where the file size is known in advance, this could be optimized by reading all bytes at once from the OS, thus avoiding additional array allocations and array copies.
Reading all bytes at once is heavily used in products such as the Eclipse IDE.

I am pleased to see that the suggested performance is reached and that the JDK 17 implementation performs even slightly better than the suggested one. :-) However, there seems to be room for the same improvement in java.nio's java.nio.file.Files.newInputStream().readAllBytes(), which is still as slow as before. I suggest overriding sun.nio.ch.ChannelInputStream::readAllBytes with the same pattern as FileInputStream::readAllBytes.
08-06-2021
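For illustration, the channel-based variant suggested in the 08-06-2021 comment could look roughly like the sketch below. It is not the code of sun.nio.ch.ChannelInputStream; the class name ChannelReadAll and its static helper are hypothetical, and the sketch assumes a blocking SeekableByteChannel such as the one returned by Files.newByteChannel. It does not handle a file that grows while being read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
final class ChannelReadAll {
    // Same limit as MAX_ARRAY_LENGTH in the benchmark.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /** Reads from the channel's current position up to its reported size. */
    static byte[] readAllBytes(SeekableByteChannel channel) throws IOException {
        // Size the buffer from what is left after the current position.
        long remaining = Math.max(0L, channel.size() - channel.position());
        if (remaining > MAX_ARRAY_LENGTH) {
            throw new OutOfMemoryError("Remaining bytes too large for array: " + remaining);
        }
        ByteBuffer buf = ByteBuffer.allocate((int) remaining);
        // Assumes a blocking channel: read() returns -1 at end of file.
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                break;
            }
        }
        // If the file was truncated meanwhile, return only what was read.
        return buf.hasRemaining()
                ? Arrays.copyOf(buf.array(), buf.position())
                : buf.array();
    }
}
```

Example use: try (SeekableByteChannel ch = Files.newByteChannel(path)) { byte[] data = ChannelReadAll.readAllBytes(ch); }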
Requested that the submitter verify the fix by downloading the latest version of JDK 17 from https://jdk.java.net/17/
04-06-2021
Changeset: da4dfde7 Author: Brian Burkhalter <bpb@openjdk.org> Date: 2021-05-17 19:58:41 +0000 URL: https://git.openjdk.java.net/jdk/commit/da4dfde71a176d2b8401782178e854d4c924eba1
17-05-2021
The proposed change will also fail if the file being read increases in size during the operation. It should however work if the file is truncated.
30-04-2021
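One way to tolerate both cases from the 30-04-2021 comment is to treat the file size only as a hint: fill the exact-size buffer first, then keep reading until end of stream and append anything extra. The sketch below is an assumption-laden illustration, not the change that was committed; SizeHintRead and readWithSizeHint are hypothetical names, and overflow of the combined length is not checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
final class SizeHintRead {
    /** Reads to end of stream; expectedSize is only a hint for the first buffer. */
    static byte[] readWithSizeHint(InputStream in, int expectedSize) throws IOException {
        byte[] buf = new byte[expectedSize];
        int count = 0;
        int n;
        // Fill the exact-size buffer, stopping early at end of stream.
        while (count < buf.length && (n = in.read(buf, count, buf.length - count)) >= 0) {
            count += n;
        }
        if (count < buf.length) {
            // File was truncated: return only the bytes actually read.
            return Arrays.copyOf(buf, count);
        }
        // Buffer is full: the file may have grown, so drain whatever is left.
        ByteArrayOutputStream tail = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while ((n = in.read(chunk)) >= 0) {
            tail.write(chunk, 0, n);
        }
        if (tail.size() == 0) {
            return buf; // common case: size hint was exact, no extra copy
        }
        // Sketch only: count + tail.size() is not checked for int overflow.
        byte[] result = Arrays.copyOf(buf, count + tail.size());
        System.arraycopy(tail.toByteArray(), 0, result, count, tail.size());
        return result;
    }
}
```

In the common case where the size is exact, the only extra cost is one read call that reports end of stream.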
Note that the proposed version of readAllBytes() is incorrect if the FileInputStream is not at its beginning as the position of the stream is not taken into account. The correct number of bytes to be read would be getChannel().size() - getChannel().position().
29-04-2021
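The position issue from the 29-04-2021 comment can be sketched as follows: size the buffer from getChannel().size() - getChannel().position() rather than from the file size alone. This is a minimal illustration under stated assumptions, not the committed fix; RemainingBytes and readRemaining are hypothetical names, and growth of the file during the read is not handled.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
final class RemainingBytes {
    // Same limit as MAX_ARRAY_LENGTH in the benchmark.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /** Reads from the stream's current position to the reported end of the file. */
    static byte[] readRemaining(FileInputStream in) throws IOException {
        FileChannel channel = in.getChannel();
        // Bytes left after the current position, not the whole file size.
        long remaining = channel.size() - channel.position();
        if (remaining <= 0) {
            // Nothing left according to the reported size (or no usable size):
            // fall back to the stream's own readAllBytes().
            return in.readAllBytes();
        }
        if (remaining > MAX_ARRAY_LENGTH) {
            throw new OutOfMemoryError("Remaining bytes too large for array: " + remaining);
        }
        byte[] buf = new byte[(int) remaining];
        int count = 0;
        while (count < buf.length) {
            int n = in.read(buf, count, buf.length - count);
            if (n < 0) {
                break; // end of file reached early, e.g. after truncation
            }
            count += n;
        }
        return count == buf.length ? buf : Arrays.copyOf(buf, count);
    }
}
```

With a 10-byte file and 4 bytes already consumed from the stream, this allocates a 6-byte array instead of a 10-byte one that then has to be trimmed.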
Moved to JDK to investigate the enhancement.
06-04-2021

Relates :	JDK-8193832 - Performance of InputStream.readAllBytes() could be improved
Relates :	JDK-8268435 - (ch) ChannelInputStream could override readAllBytes
Relates :	JDK-8156715 - TrustStoreManager does not buffer keystore input stream