Bug ID: JDK-4954570 Misleading description of java.io.ObjectInputStream.available().

Type: Bug
Component: core-libs
Sub-Component: java.io:serialization
Affected Version: 1.4.2

Priority: P3
Status: Closed
Resolution: Not an Issue
OS: linux
CPU: x86

Submitted: 2003-11-14
Updated: 2003-12-05
Resolved: 2003-12-05


Name: jl125535			Date: 11/14/2003


URL OF PROBLEM DOCUMENTATION :
http://java.sun.com/j2se/1.4.2/docs/api/java/io/ObjectInputStream.html#available()
http://developer.java.sun.com/developer/bugParade/bugs/4032352.html

A DESCRIPTION OF THE PROBLEM :
	The behavior of java.io.ObjectInputStream.available() does
not match the documentation in the API, "Returns the number
of bytes that can be read without blocking."  I have read of
other developers having trouble with this method, and I lost
significant time on this issue myself (trying first to figure
out what I had done wrong).
	Could the API documentation for this method be enhanced or
corrected to include useful information such as that
mentioned in bug #4032352, in place of the misleading
statements currently there?


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1.  Read the API for java.io.ObjectInputStream.available().
2.  Compare that to the bug #4032352.
(Incident Review ID: 179376) 
======================================================================

EVALUATION

The behavior of ObjectInputStream.available() may indeed be confusing when compared with that of other input streams. However, strictly speaking, the javadoc for ObjectInputStream.available() is accurate. Some background on the object serialization protocol is useful in explaining why.

In the Java object serialization protocol, data is written by the serialization stream in one of two modes: "normal" and "block data". Data written in normal mode is simply written directly to the underlying stream, without transformation. Normal mode is used for writing data whose layout is dictated by the serialization protocol: for instance, metadata such as typecodes identifying the type of the next element (string, array, etc.) in the stream, class descriptors, and default serializable field data.

Conversely, block data mode is used for writing "custom" data in an order and layout dictated by application code: for instance, the sequence of primitive and object values written by a class-defined writeObject or writeExternal method. Since this custom data may be of arbitrary length, the serialization stream must somehow indicate where the data terminates (and normal data resumes). Block data mode accomplishes this by bracketing chunks of the data in blocks, where each block includes a header indicating its length. A single span of custom data (e.g., written by a single class-defined writeObject method) may generate multiple blocks of data, if it's long enough.

During deserialization, ObjectInputStream reads in data differently depending on whether or not it's formatted as block data. Normal data is read directly from the stream, whereas block data is read in blocks. Data from the current block is buffered inside of ObjectInputStream, and individual data reads fetch data from the buffer (unless the current block is exhausted, at which point the next block is read in).
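
The block framing described above can be observed directly in the serialized bytes. The following is a minimal sketch, not part of the original report: the class and method names (BlockDataDump, serializeTwoInts, toHex) are made up for illustration. It writes two ints as top-level primitive data and dumps the resulting stream in hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class BlockDataDump {
    /** Serializes two ints as top-level primitive data and returns the raw stream bytes. */
    static byte[] serializeTwoInts() {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes any buffered block data to 'raw'.
        try (ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeInt(1);
            out.writeInt(2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex(serializeTwoInts()));
    }
}
```

The dump should begin with the four-byte stream header (AC ED 00 05), followed by the block header: the typecode 77 (ObjectStreamConstants.TC_BLOCKDATA) and a length byte of 08, then the eight bytes of int data. The two header bytes are part of the raw stream but are never returned to application code as data.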

One key point to note is that block data mode ends up *always* being in effect whenever application code is in a position to invoke a primitive or object read method on ObjectInputStream, since such invocations would only come from within a class-defined readObject/readExternal method, or from a "top-level" read on the ObjectInputStream (the ObjectInputStream initially has block data mode enabled, since the top-level objects and primitive data in the stream can themselves be viewed as a sequence of "custom" application-written data).

At this point it's worth reviewing the javadoc for ObjectInputStream.available(), which states: "Returns the number of bytes that can be read without blocking."

Consider the position of ObjectInputStream when application code has invoked its available() method during deserialization. As noted above, it will be in block data mode. It will contain some amount N (perhaps 0) of readable data in its internal buffer for the current data block. The next element (after the data block) in the stream may be another data block, or it could be something else--for example, an object, or a typecode marking the end of custom data. At this point, ObjectInputStream can only guarantee that N bytes are readable without blocking. It cannot peek at the next element/block in the stream, since that would involve issuing a read to the underlying stream that could block. Therefore, available() returns N. Although N is in most cases different from the total number of raw bytes available from the underlying stream, it is in fact the maximum number of bytes that ObjectInputStream can guarantee it can read without potentially blocking on I/O. Therefore, the javadoc is accurate.

Another way of looking at this is to consider what would occur if ObjectInputStream.available() were instead to return the total number of raw bytes available from the underlying stream. In most cases, this would be more than the number of bytes of "custom" data available to be read. A read for that amount of data from the ObjectInputStream would return less data than expected, or even EOF. Thus, ObjectInputStream.available() would violate the spec for InputStream.available(), and be completely useless for application code, since the amount of raw data available from the underlying stream has no bearing on how much can be immediately read from the current span of data blocks.
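
The difference between the raw byte count and the readable block data can be illustrated with a small experiment. This is a sketch under stated assumptions rather than a definitive test: the class name AvailableDemo and its run() method are hypothetical, an in-memory ByteArrayInputStream stands in for the underlying stream, and the exact numbers reported may vary between JDK implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class AvailableDemo {
    /** Returns {rawAvailable, objectStreamAvailable, firstRead, secondRead}. */
    static int[] run() {
        try {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(raw)) {
                out.writeInt(1);          // 8 bytes of primitive (block) data ...
                out.writeInt(2);
                out.writeObject("hello"); // ... followed by an object, which is not block data
            }

            ByteArrayInputStream source = new ByteArrayInputStream(raw.toByteArray());
            ObjectInputStream in = new ObjectInputStream(source);

            // Raw bytes left in the underlying stream once the stream header is consumed.
            int rawAvailable = source.available();
            // Bytes of block data that ObjectInputStream is willing to promise.
            int objAvailable = in.available();

            // Ask for as many bytes as the underlying stream claimed to have.
            byte[] buf = new byte[rawAvailable];
            int firstRead = in.read(buf, 0, buf.length);
            // The next element is an object, so there is no more primitive data to read.
            int secondRead = in.read(buf, 0, buf.length);

            return new int[] {rawAvailable, objAvailable, firstRead, secondRead};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("underlying available(): " + r[0]);
        System.out.println("ObjectInputStream available(): " + r[1]);
        System.out.println("first read returned: " + r[2]);
        System.out.println("second read returned: " + r[3]);
    }
}
```

The underlying stream should report more bytes than ObjectInputStream.available() does, because the raw count includes the block header and the serialized string. A read sized to the raw count should return only the eight bytes of int data, and the following read should return -1 because the next element in the stream is an object rather than block data. This is the "less data than expected, or even EOF" outcome described above.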

11-06-2004