Bug ID: JDK-4219771 java.io.Data{Input,Output}Stream: Improve performance of {read,write}UTF

Type: Enhancement
Component: core-libs
Sub-Component: java.io
Affected Version: 1.1.7,1.2.0

Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic,windows_nt
CPU: generic,x86

Submitted: 1999-03-12
Updated: 2013-11-01
Resolved: 1999-05-18

Versions (Unresolved/Resolved/Fixed)

Other: 1.3.0 kestrel (Fixed)

Related Reports

Duplicate :	JDK-4187127 - java.io.DataInputStream.readUTF is slow
Duplicate :	JDK-4194583 - java.io.DataInputStream.readUTF/DataOutputStream.writeUTF operate on single byte
Duplicate :	JDK-4147390 - java.io.DataInputStream.readUTF should avoid copying char[] data
Duplicate :	JDK-4191365 - java.io.DataOutputStream.writeUTF could be faster

Description

This bug report is a combination of several reports that describe (at least)
three problems with the readUTF/writeUTF methods:

Problem 1:

    DataOutputStream.writeUTF runs through its argument String twice using
    String.charAt.  Instead, it could use String.getChars to get the chars into
    a local char[] and then walk that (twice).  This would save two calls to
    String.charAt per char, and would avoid the explicit range checks that
    String.charAt does.  One hopes this change would also allow a good compiler
    to elide the subscript range checks on the array accesses, generating
    generally better code.

    One might want to avoid the extra char[] if the argument String were very
    long, to avoid running out of memory.

    peter.kessler@Eng 1998-11-19
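
A minimal sketch of the first of the two passes under this suggestion, assuming
the modified UTF-8 byte counts documented for DataOutputStream.writeUTF (one
byte for U+0001..U+007F, three bytes above U+07FF, two bytes otherwise,
including U+0000).  The names UtfLengthSketch and utfLength are hypothetical;
this is not the JDK source or the fix that was eventually integrated.

```java
public class UtfLengthSketch {
    // Counts the bytes that the modified UTF-8 encoding of s would occupy.
    // Hypothetical helper for illustration only.
    static int utfLength(String s) {
        int len = s.length();
        char[] chars = new char[len];
        // One bulk copy replaces len separate String.charAt calls.
        s.getChars(0, len, chars, 0);

        int utflen = 0;
        for (int i = 0; i < len; i++) {
            int c = chars[i];
            if (c >= 0x0001 && c <= 0x007F) {
                utflen += 1;          // plain ASCII, except U+0000
            } else if (c > 0x07FF) {
                utflen += 3;
            } else {
                utflen += 2;          // U+0000 and U+0080..U+07FF
            }
        }
        return utflen;
    }
}
```

The second pass (emitting the bytes) can walk the same char[], so the String is
copied once rather than indexed twice.  For very long arguments the extra
char[] is the memory cost the report warns about.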

Problem 2:

    DataInputStream.readUTF is using a public String constructor, which forces
    the char array to always be reallocated and copied.  It is frequently the
    case that the char array created by readUTF will be of exactly the right
    length, and could be used as-is, without copying, if only there were a
    privileged way for it to create a String.

    In an RMI server that I have, which involves serialization via RMI and then
    serialization to a file for persistent storage, over 4 KB of heap is
    allocated and then discarded by readUTF per single RMI call to the server.
    It would be very beneficial to eliminate this overhead.

    rws@east
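
The copy described here can be observed from ordinary code.  The sketch below
(class and method names are hypothetical) builds a String with the public
String(char[], int, int) constructor and then mutates the source array; the
String is unaffected, which shows that the constructor took its own copy of
the characters.

```java
public class StringCopySketch {
    // Returns true if the String kept its own copy of the characters.
    // Hypothetical helper for illustration only.
    static boolean constructorCopies() {
        char[] buf = {'a', 'b', 'c'};
        String s = new String(buf, 0, buf.length); // public constructor copies buf
        buf[0] = 'X';                              // mutate the caller's array afterwards
        return s.equals("abc");                    // still "abc", so a copy was made
    }
}
```

The defensive copy is what keeps String immutable for arbitrary callers, which
is why the report asks for a privileged construction path rather than a change
to the public constructor.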

Problem 3:

    DataOutputStream.writeUTF computes single bytes of the resulting UTF and
    writes them to the stream one byte at a time.  That's okay for buffered
    output streams, but for ByteArrayOutputStreams (or unbuffered streams) that
    results in a synchronized method call per byte (at least).

    Similarly for DataInputStream.readUTF reading from a ByteArrayInputStream.

    pbk@eng
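
To make the per-byte cost concrete, the following sketch counts how many write
calls a ByteArrayOutputStream receives when the same data is written one byte
at a time versus in a single bulk write.  CountingSink, perByteCalls and
bulkCalls are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayOutputStream;

public class WriteCallCount {
    // ByteArrayOutputStream subclass that counts the write calls it receives.
    static class CountingSink extends ByteArrayOutputStream {
        int calls = 0;

        @Override
        public synchronized void write(int b) {
            calls++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            calls++;
            super.write(b, off, len);
        }
    }

    // One synchronized call per byte, as in the behaviour the report describes.
    static int perByteCalls(byte[] data) {
        CountingSink sink = new CountingSink();
        for (byte b : data) {
            sink.write(b);
        }
        return sink.calls;
    }

    // One synchronized call for the whole array.
    static int bulkCalls(byte[] data) {
        CountingSink sink = new CountingSink();
        sink.write(data, 0, data.length);
        return sink.calls;
    }
}
```

For an n-byte encoding the first path makes n synchronized calls and the second
makes one, at the price of assembling the bytes in a temporary array first.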

-- mr@eng 1999/4/19

Comments

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: kestrel-beta
FIXED IN: kestrel kestrel-beta
INTEGRATED IN: kestrel kestrel-beta

14-06-2004

SUGGESTED FIX

Problem 2:

    One option is to use a native method to create the String.  Another option
    might be to keep a char[] buffer around as a field of DataInputStream and
    reuse it (allocating a new one if the UTF length is larger than the current
    buffer, possibly throwing it away if it becomes too large, or possibly
    holding a soft reference to it).

    [Bob Scheifler 1998-06-10]

    There is a String constructor that takes a StringBuffer that would achieve
    the desired effect of not reallocating the char array a second time.  In

        public final static String readUTF(DataInput in) throws IOException {
            int utflen = in.readUnsignedShort();

    change

        char str[] = new char[utflen];

    to

        StringBuffer str = new StringBuffer(utflen);

    All accesses to str[] need to be changed to StringBuffer.setCharAt().
    Since StringBuffer is final, there is a chance that setCharAt can be
    inlined to avoid the overhead of many function calls.  The final return
    would be

        new String(str);

    This constructor converts a StringBuffer into a String efficiently as long
    as no one ever modifies str, which they will not be able to.

    [joseph.fialli@East 1998-06-10]

    But StringBuffer is synchronized, an undesirable performance penalty.

    [bob.scheifler@East 1998-06-10]

Problem 3:

    The suggested fix is to assemble the result of writeUTF into a local byte[]
    and then do a single write(byte[], ...).  That copies the bytes an extra
    time (with System.arraycopy in ByteArrayOutputStream), but it avoids a lot
    of synchronized calls.

    The parallel fix in DataInputStream.readUTF is, after you figure out how
    long the incoming UTF is, to do a readFully(byte[],...) to a local byte[]
    and then extract bytes from that array rather than calling
    readUnsignedByte() for each byte.

    peter.kessler@Eng 1998-12-03

A general rewrite from an engineer at IBM is enclosed in the comments section.

03-12-1998
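
The Problem 3 suggestion (one bulk write on output, one readFully on input) can
be sketched as follows.  This is an illustration under stated assumptions, not
the IBM rewrite or the code that was integrated: the class BulkUtfSketch and
its methods are hypothetical, the wire format is the modified UTF-8 documented
for DataOutput.writeUTF, and the decoder's validation of malformed input is
deliberately simplified.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BulkUtfSketch {
    // Encodes s into a local byte[] and hands it to the stream in one call.
    static void writeUTF(String s, DataOutputStream out) throws IOException {
        int len = s.length();
        char[] chars = new char[len];
        s.getChars(0, len, chars, 0);
        int utflen = 0;
        for (int i = 0; i < len; i++) {
            int c = chars[i];
            utflen += (c >= 0x0001 && c <= 0x007F) ? 1 : (c > 0x07FF) ? 3 : 2;
        }
        if (utflen > 65535) {
            throw new UTFDataFormatException("encoded string too long");
        }
        byte[] buf = new byte[utflen + 2];
        int n = 0;
        buf[n++] = (byte) (utflen >>> 8);   // two-byte big-endian length prefix
        buf[n++] = (byte) utflen;
        for (int i = 0; i < len; i++) {
            int c = chars[i];
            if (c >= 0x0001 && c <= 0x007F) {
                buf[n++] = (byte) c;
            } else if (c > 0x07FF) {
                buf[n++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                buf[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            } else {
                buf[n++] = (byte) (0xC0 | ((c >> 6) & 0x1F));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        out.write(buf, 0, n);               // single write instead of n
    }

    // Reads the whole encoding with one readFully, then decodes from the array.
    static String readUTF(DataInputStream in) throws IOException {
        int utflen = in.readUnsignedShort();
        byte[] buf = new byte[utflen];
        in.readFully(buf, 0, utflen);
        char[] chars = new char[utflen];    // never more chars than bytes
        int n = 0;
        int i = 0;
        while (i < utflen) {
            int b = buf[i] & 0xFF;
            if (b < 0x80) {
                chars[n++] = (char) b;
                i += 1;
            } else if ((b & 0xE0) == 0xC0 && i + 1 < utflen) {
                chars[n++] = (char) (((b & 0x1F) << 6) | (buf[i + 1] & 0x3F));
                i += 2;
            } else if ((b & 0xF0) == 0xE0 && i + 2 < utflen) {
                chars[n++] = (char) (((b & 0x0F) << 12)
                        | ((buf[i + 1] & 0x3F) << 6) | (buf[i + 2] & 0x3F));
                i += 3;
            } else {
                throw new UTFDataFormatException("malformed input around byte " + i);
            }
        }
        return new String(chars, 0, n);     // still copies; see Problem 2
    }

    // Hypothetical self-check: the sketch's bytes match the JDK's writeUTF
    // and the sketch's readUTF recovers the original string.
    static boolean agreesWithJdk(String s) {
        try {
            ByteArrayOutputStream mine = new ByteArrayOutputStream();
            writeUTF(s, new DataOutputStream(mine));
            ByteArrayOutputStream jdk = new ByteArrayOutputStream();
            new DataOutputStream(jdk).writeUTF(s);
            byte[] bytes = mine.toByteArray();
            String back = readUTF(new DataInputStream(new ByteArrayInputStream(bytes)));
            return Arrays.equals(bytes, jdk.toByteArray()) && back.equals(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch allocates a fresh byte[] and char[] per call for clarity.  Reusing
buffers held in fields, as suggested under Problem 2, would reduce allocation
further but raises the retention questions mentioned there.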

EVALUATION

See code in comments section.

-- mr@eng 1999/4/19

04-07-0185