JDK-8321271 : Add OutputStream.write(ByteBuffer)
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.io
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2023-12-04
  • Updated: 2024-09-09
Related Reports
Relates :  
Relates :  
Description
Adding OutputStream.write(ByteBuffer) could improve the following cases:

a) OpenJDK's implementations of InputStream.transferTo(OutputStream) have to copy the buffers, as passing byte arrays to untrusted OutputStream implementations can create security issues: https://github.com/openjdk/jdk/pull/16879 , https://github.com/openjdk/jdk/pull/16893

b) Channels.newChannel(outputStream) has to copy the data, however, it could pass the incoming ByteBuffer without copies: https://github.com/openjdk/jdk/blob/d23f4f12adf1ea26b8c340efe2c3854e50b68301/src/java.base/share/classes/java/nio/channels/Channels.java#L390

c) Passing naked byte[] to untrusted OutputStream implementations is not secure, so moving to ByteBuffer API + read-only byte buffers should improve the security.

----

ByteBuffer has .asReadOnlyBuffer(), and it should be reasonably safe to pass read-only byte buffers without copying them, so if OutputStream.write(ByteBuffer) was there, then implementation of ByteArrayInputStream.transferTo could be improved as out.write(ByteBuffer.wrap(buf, ...));

It would avoid copying the data, and it might enable more efficient implementation when sending the data to a file or network.

The default implementation of OutputStream.write(ByteBuffer buffer) could be something like if (buffer.hasArray()) { write(buffer.array(), ...); } else { write(copy the contents of the buffer to a temp array); }

---

A broader idea might be "OutputStream extends WritableByteChannel", however, Channel has isOpen() method, and I do not see what a default implementation in OutputStream could look like.

Alan Bateman (see https://github.com/openjdk/jdk/pull/16879#discussion_r1413859006): "Interesting but I don't think that is workable as WBC specifies that writes executing serially and also specifies the exception when attempting to write to a closed channel, both of which are not compatible with OutputStream."
Comments
[~alanb] I agree with that, but I think what Vladimir tries is not to rush into, but to work with the core-libs to towards a final solution. So would you agree that it makes sense that Vladimir turns this issue into a PR? We could continue our discussion *there*, so it is directly attached to the actual source code he proposes. [~vsitnikov] Discussions in JIIRA are rather painful as we cannot separate the discussion threads, so I ask the questions again as separate threads in the Github PR once you opened it.
17-12-2023

Adding ByteBuffer methods to InputStream, OutputStream and other java.io classes would be a significant change. For cases where the backing memory may be off-heap or on-heap then MemorySegment may be better for newer APIs. So definitely not something to rush into without taking a wider view.
16-12-2023

Markus KARG raised several concerns at https://github.com/openjdk/jdk/pull/16879#discussion_r1428830013, and I will respond here to segregate the comments. > Markus: I would like to know the performance of custom streams Could you please suggest which custom stream you want to add to the benchmark? I thought GZIPOutputStream, DataOutputStream could qualify for "custom implementations". > Markus: We also need to discuss whether we like the design choice that de facto the public API (not just the implementation) of the legacy IO domain (OutputStream) Right you are. However, there's `Reader implements Readable` (since 1.5) which links `Reader` and `CharBuffer`. > worth the effort to change all implementations Many interfaces already support `ByteBuffers`. As you might see in my changes, `CRC`, `FileOutputStream`, `DeflaterOutputStream` already support BB. > if we could turn parts of it into some generic `byte[] readOnly(byte[])` utility. 1) We need `readOnly(byte[], offset, length)` (a read-only slice of an array) as we probably do not want malicious implementations to peek into the region outside of what we explicitly allow. 2) We need a way to obliterate the slice after we finish writing it, so if malicious code keeps the reference to the slice, then it should not be able to read the data sometime later. There's Frozen Arrays JEP (see https://openjdk.org/jeps/8261007), however, it does not seem to address obliteration, so frozen arrays won't alone won't be enough for .transferTo. I think bytebuffers allow that without much hassle while slice support in JVM would be challenging. > Is it really so huge that it is worth the additional trouble? Frankly speaking, I think it is less troublesome to have `write(ByteBuffer)` method than special-casing implementations. The approach with "isTrusted" allows efficient implementations outside of OpenJDK while "isTrusted" is confined to OpenJDK streams only, and even within OpenJDK, streams like BufferedOutputStream, DataOutputStream can't easily be trusted. > I would like to know how much faster the solution with ByteBuffer is compared to Arrays.copyOfRange(): Is it really so huge that it is worth the additional trouble? Markus, do transferToBufferedBAOS results answer your question? The case there is as follows: BufferedOutputStream can't be trusted, so ByteArrayInputStream have to use copyOf for transferring the data. Is it the benchmark you were asking for? > I would like to know how much slower the solution with ByteBuffer is compared to skipping Arrays.copyOfRange() for trusted cases (as you removed the skipping). I do not understand. In all my benchmarks, the "trusted cases" were activated as they currently are in OpenJDK. The other can't be trusted. I believe I made a fair comparison. transferToBAOS: ByteArrayInputStream -> ByteArrayOutputStream. It is currently trusted, and the performance for both "regular" and "bytebuffer" implementations are virtually the same. transferToBufferedBAOS: ByteArrayInputStream -> BufferedOutputSteram(ByteArrayOutputStream). BufferedOutputSteram can't be trusted as it might pass naked buffer to the malicious stream, so regular transfer to involves copying. transferToFile: ByteArrayInputStream -> FileOutputSteram. It is currently trusted, and the performance is virtually the same for both cases. transferToDataFile: ByteArrayInputStream -> DataOutputStream(FileOutputSteram). DataOutputStream can't be trusted, so the regular implementation is slower than plain ByteArrayInputStream -> FileOutputSteram, while ByteBuffer-based implementation does not have extra overhead. transferToGZIP: ByteArrayInputStream -> GZIPOutputStream(ByteArrayOutputStream). GZIPOutputStream can't be trusted, so regular implementation has to copy arrays.
16-12-2023

The suggested approach yields 5x-2x latency improvements for 10Kib-1024K .transferTo(BufferedOutputStream(ByteArrayOutputStream)) transfers.
14-12-2023

I have created a draft implementation and executed several benchmarks. I am not sure how formatting works in bugs.openjkd.org, so I pasted the results at https://gist.github.com/vlsi/7f4411515a4f2dbb0925fffde92ccb1d The key finding is that the suggested approach improves cases like .transferTo(BufferedOutputStream(..)), .transferTo(GZIPOutputStream), and .transferTo(DataOutputStream(FileOutputStream)), and it can support the same performance for stream implementations outside of JDK. In other words, the suggested approach does not require the target output stream to be trusted for array copy avoidance.
14-12-2023