JDK-4219771 : java.io.Data{Input,Output}Stream: Improve performance of {read,write}UTF

Details
Type:
Enhancement
Submit Date:
1999-03-12
Status:
Resolved
Updated Date:
2013-11-01
Project Name:
JDK
Resolved Date:
1999-05-18
Component:
core-libs
OS:
windows_nt,generic
Sub-Component:
java.io
CPU:
x86,generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
1.1.7,1.2.0
Fixed Versions:
1.3.0 (kestrel)

Related Reports
Duplicate:
Duplicate:
Duplicate:
Duplicate:

Sub Tasks

Description
This bug report is a combination of several reports that describe (at least)
three problems with the readUTF/writeUTF methods:

Problem 1:

    DataOutputStream.writeUTF runs through its argument String twice using
    String.charAt.  Instead, it could use String.getChars to get the chars into
    a local char[] and then walk that (twice).  This would save two calls to
    String.charAt per char, and would avoid the explicit range checks that
    String.charAt does.  One hopes this change would allow a good compiler to
    elide the subscript range checks on the array access, etc. generating
    generally better code.

    One might want to avoid the extra char[] if the argument String were very
    long, to avoid running out of memory.

    peter.kessler@Eng 1998-11-19
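
    The two approaches can be compared with a minimal sketch. The class and
    method names below are hypothetical and are not taken from the JDK
    sources; the sketch only computes the encoded length (the first of the
    two passes) for the modified UTF-8 format that writeUTF produces.

```java
final class UtfLengthSketch {
    /** Counts modified UTF-8 bytes by calling charAt once per char. */
    static int utfLengthWithCharAt(String s) {
        int utflen = 0;
        for (int i = 0; i < s.length(); i++) {
            utflen += encodedSize(s.charAt(i));
        }
        return utflen;
    }

    /** Same count, but copies the chars out once and walks a local array. */
    static int utfLengthWithGetChars(String s) {
        int len = s.length();
        char[] chars = new char[len];
        s.getChars(0, len, chars, 0);
        int utflen = 0;
        for (int i = 0; i < len; i++) {
            utflen += encodedSize(chars[i]);
        }
        return utflen;
    }

    /** Bytes needed for one char in the DataOutput modified UTF-8 format. */
    private static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2; // includes U+0000, which takes two bytes
        return 3;
    }
}
```

    Both methods return the same value; the second trades one temporary
    char[] of s.length() elements for the per-char method calls, which is
    the memory concern raised above for very long strings.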

Problem 2:

    DataInputStream.readUTF is using a public String constructor, which forces
    the char array to always be reallocated and copied.  It is frequently the
    case that the char array created by readUTF will be of exactly the right
    length, and could be used as-is, without copying, if only there were a
    privileged way for it to create a String.

    In an RMI server that I have, which involves serialization via RMI and then
    serialization to a file for persistent storage, over 4 KB of heap is
    allocated and then discarded by readUTF per single RMI call to the server.
    It would be very beneficial to eliminate this overhead.

    rws@east

Problem 3:

    DataOutputStream.writeUTF computes single bytes of the resulting UTF and
    writes them to the stream one byte at a time.  That's okay for buffered
    output streams, but for ByteArrayOutputStreams (or unbuffered streams) that
    results in a synchronized method call per byte (at least).

    Similarly for DataInputStream.readUTF reading from a ByteArrayInputStream.

    pbk@eng
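
    The cost described here can be made visible by counting calls instead of
    timing them. The following is a small sketch with hypothetical helper
    classes (CallCountingStream and WriteCallDemo are not JDK classes); it
    writes the same bytes once per byte and once in bulk and reports how many
    write calls reached the underlying stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: counts how many write calls reach the sink. */
final class CallCountingStream extends OutputStream {
    private final OutputStream sink;
    int calls;

    CallCountingStream(OutputStream sink) {
        this.sink = sink;
    }

    @Override
    public void write(int b) throws IOException {
        calls++;
        sink.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        calls++;
        sink.write(b, off, len);
    }
}

final class WriteCallDemo {
    /** Writes data one byte at a time and returns the number of calls made. */
    static int callsWhenWritingPerByte(byte[] data) throws IOException {
        CallCountingStream out = new CallCountingStream(new ByteArrayOutputStream());
        for (byte b : data) {
            out.write(b);
        }
        return out.calls;
    }

    /** Writes data with a single bulk call and returns the number of calls made. */
    static int callsWhenWritingInBulk(byte[] data) throws IOException {
        CallCountingStream out = new CallCountingStream(new ByteArrayOutputStream());
        out.write(data, 0, data.length);
        return out.calls;
    }
}
```

    For an n-byte payload the first method makes n calls and the second makes
    one. When the sink is a ByteArrayOutputStream, each of those calls is a
    synchronized method.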

-- mr@eng 1999/4/19

                                    

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
kestrel-beta

FIXED IN:
kestrel
kestrel-beta

INTEGRATED IN:
kestrel
kestrel-beta


                                     
2004-06-14
SUGGESTED FIX

Problem 2:

    One option is to use a native method to create the String.

    Another option might be to keep a char[] buffer around as a field of
    DataInputStream and reuse it (allocating a new one if the UTF length is
    larger than the current buffer, possibly throwing it away if it becomes too
    large, or possibly holding a soft reference to it).

    [Bob Scheifler 1998-06-10]
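
    A minimal sketch of the buffer-reuse option, assuming a hypothetical
    helper class and an arbitrary retention cap (neither comes from the JDK
    sources). It covers only the grow, reuse, and discard policy, not the
    soft-reference variant.

```java
final class ReusableCharBuffer {
    // Hypothetical cap: arrays larger than this are handed out but not kept.
    private static final int MAX_RETAINED = 8192;

    private char[] cached = new char[80];

    /** Returns a scratch array with at least utflen elements. */
    char[] acquire(int utflen) {
        if (utflen <= cached.length) {
            return cached; // reuse the existing buffer
        }
        char[] fresh = new char[utflen];
        if (utflen <= MAX_RETAINED) {
            cached = fresh; // keep moderately sized buffers for later calls
        }
        return fresh;
    }
}
```

    A reader would decode into the returned array and then build the result
    with new String(chars, 0, count). That constructor still copies, so this
    option removes only the scratch allocation, and the helper as written is
    not safe for concurrent use.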

    There is a String constructor that takes a StringBuffer that would achieve
    the desired effect of not reallocating the char array a second time.


	public final static String readUTF(DataInput in) throws IOException {
	    int utflen = in.readUnsignedShort();
    Change
	    char str[] = new char[utflen];
    To
	    StringBuffer str = new StringBuffer(utflen);

    All accesses to str[] need to be changed to StringBuffer.setCharAt().
    Since StringBuffer is final, there is a chance that setCharAt can be
    inlined to avoid the overhead of many method calls.

    And the final return would be

	    new String(str);

    This constructor converts a StringBuffer into a String efficiently as long
    as no one ever modifies str, which they will not be able to do.

    [joseph.fialli@East 1998-06-10]

    But StringBuffer is synchronized, an undesirable performance penalty.

    [bob.scheifler@East 1998-06-10]
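
    For illustration, the StringBuffer variant might look like the sketch
    below. The class and method names are hypothetical, and only single-byte
    characters are handled to keep it short. Note that the StringBuffer(int)
    constructor sets the capacity, not the length, so setLength must be
    called before setCharAt can be used. Whether new String(StringBuffer)
    actually avoids a copy depends on the JDK implementation. Later JDKs
    (5.0 and up) also provide the unsynchronized StringBuilder, which speaks
    to the synchronization concern.

```java
final class StringBufferDecodeSketch {
    /**
     * Decodes bytes that are all single-byte (0x01..0x7F) encodings.
     * Multi-byte sequences are rejected to keep the sketch short.
     */
    static String decodeSingleByteChars(byte[] bytes) {
        // The constructor argument is a capacity; length() starts at 0,
        // so setCharAt would throw until the length is set explicitly.
        StringBuffer str = new StringBuffer(bytes.length);
        str.setLength(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] <= 0) {
                throw new IllegalArgumentException("not a single-byte char at " + i);
            }
            str.setCharAt(i, (char) bytes[i]);
        }
        return new String(str);
    }
}
```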

Problem 3:

    The suggested fix is to assemble the result of writeUTF into a local byte[]
    and then do a single write(byte[], ...).  That copies the bytes an extra 
    time (with System.arraycopy in ByteArrayOutputStream), but it avoids a lot 
    of synchronized calls.

    The parallel fix in DataInputStream.readUTF is, after you figure out how long 
    the incoming UTF is, to do a readFully(byte[],...) to a local byte[] and 
    then extract bytes from that array rather than calling readUnsignedByte() 
    for each byte.

    peter.kessler@Eng 1998-12-03
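
    The two halves of this suggestion can be sketched as follows. This is an
    illustration under stated assumptions, not the code that was integrated:
    BulkUtfSketch and its method names are hypothetical, and the decoder
    skips the continuation-byte validation a production readUTF would do.

```java
import java.io.DataInput;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UTFDataFormatException;

final class BulkUtfSketch {
    /** Encodes s (length prefix included) into one byte[] and writes it once. */
    static void writeUtfInBulk(String s, OutputStream out) throws IOException {
        int len = s.length();
        char[] chars = new char[len];
        s.getChars(0, len, chars, 0);

        int utflen = 0; // first pass: encoded length
        for (char c : chars) {
            utflen += (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF) ? 2 : 3;
        }
        if (utflen > 65535) {
            throw new UTFDataFormatException("encoded string too long: " + utflen);
        }

        byte[] buf = new byte[utflen + 2];
        int n = 0;
        buf[n++] = (byte) (utflen >>> 8);
        buf[n++] = (byte) utflen;
        for (char c : chars) { // second pass: encode into the local array
            if (c >= 0x0001 && c <= 0x007F) {
                buf[n++] = (byte) c;
            } else if (c <= 0x07FF) {
                buf[n++] = (byte) (0xC0 | (c >> 6));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            } else {
                buf[n++] = (byte) (0xE0 | (c >> 12));
                buf[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        out.write(buf, 0, n); // single bulk write
    }

    /** Reads the length, pulls all bytes with one readFully, then decodes. */
    static String readUtfInBulk(DataInput in) throws IOException {
        int utflen = in.readUnsignedShort();
        byte[] bytes = new byte[utflen];
        in.readFully(bytes, 0, utflen); // single bulk read

        char[] chars = new char[utflen]; // never more chars than bytes
        int count = 0; // bytes consumed
        int n = 0;     // chars produced
        while (count < utflen) {
            int b = bytes[count] & 0xFF;
            if (b < 0x80) {
                chars[n++] = (char) b;
                count += 1;
            } else if ((b & 0xE0) == 0xC0 && count + 1 < utflen) {
                chars[n++] = (char) (((b & 0x1F) << 6) | (bytes[count + 1] & 0x3F));
                count += 2;
            } else if ((b & 0xF0) == 0xE0 && count + 2 < utflen) {
                chars[n++] = (char) (((b & 0x0F) << 12)
                        | ((bytes[count + 1] & 0x3F) << 6)
                        | (bytes[count + 2] & 0x3F));
                count += 3;
            } else {
                throw new UTFDataFormatException("malformed input around byte " + count);
            }
        }
        return new String(chars, 0, n);
    }
}
```

    One way to check the sketch is to compare the bytes from writeUtfInBulk
    with those from DataOutputStream.writeUTF for the same string, and to
    read them back through a DataInputStream wrapped around a
    ByteArrayInputStream.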

A general rewrite from an engineer at IBM is enclosed in the comments section.
                                     
1998-12-03
EVALUATION

See code in comments section.  -- mr@eng 1999/4/19
                                     


