JDK-4716483 : server-side unmarshalling failure can cause connection reset to client
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.rmi
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: solaris_8
  • CPU: sparc
  • Submitted: 2002-07-17
  • Updated: 2013-04-12
Description
If an exception occurs while unmarshalling a remote call's arguments, UnicastServerRef.dispatch marshals the exception back to the client and then returns, without consuming any more of the argument data than had been consumed at the point of the exception.  Assuming that the exception was marshalled successfully, TCPTransport.handleMessages will then attempt to read another message from the JRMP connection-- a dubious thing to do in this situation (see 4415668), because it will be reading from the input stream in the state it was left in at the time of the unmarshalling exception, somewhere in the middle of a partially unmarshalled argument, so in all likelihood the attempt will fail quickly with a protocol error.  The eventual reaction to this protocol error will cause the socket to be fully closed (for both output and input).

If the client was marshalling a very large amount of argument data, such that it is still performing socket writes for the argument data after the server has fully closed its socket, then the server side's TCP implementation will send a TCP reset to the client upon receipt of such a write.  If the client attempts another write after receiving this TCP reset, the client side's TCP implementation will cause that write operation to fail with a "Connection reset by peer" indication (or equivalent).  This will cause the client-side RMI implementation to consider the remote call to have failed with a java.rmi.MarshalException wrapping the IOException for the "Connection reset by peer".
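The reset scenario above can be reproduced outside of RMI with plain sockets.  In this hypothetical sketch (class and buffer sizes are made up for illustration), the server accepts a connection and closes it immediately while the client is still writing a large amount of data, so a later client write fails with a reset-style exception -- the client never sees whatever the server might have sent first:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // Read nothing and close while the client is mid-write;
                    // unread incoming data then makes TCP send a reset.
                } catch (IOException ignored) {
                }
            });
            acceptor.start();

            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                byte[] chunk = new byte[64 * 1024];
                try {
                    // ~64 MB, far beyond any socket buffer, standing in for
                    // a very large marshalled argument stream.
                    for (int i = 0; i < 1024; i++) {
                        out.write(chunk);
                    }
                    System.out.println("no failure observed");
                } catch (IOException e) {
                    // Typically SocketException: "Connection reset" or
                    // "Broken pipe" -- the server-side cause is lost.
                    System.out.println("client write failed: "
                            + e.getClass().getSimpleName());
                }
            }
            acceptor.join();
        }
    }
}
```

The client-side RMI implementation is in the same position as this client: all it observes is the failed write, not the exception the server marshalled earlier.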

The client-side RMI implementation will never bother to read the original unmarshalling exception that had caused the problem and that the server-side RMI implementation had so nicely sent for the client's benefit-- thus making the original exception, which caused the ultimate failure, difficult to debug.

This difficulty has been the cause of numerous problems reported by users, such as on the RMI-USERS and JINI-USERS lists and the rmi-comments alias.

On Linux, the client-side failure might look like this:

java.rmi.MarshalException: error marshalling arguments; nested exception is:
        java.net.SocketException: Connection reset by peer: socket write error

and on Windows, it might look like this:

java.rmi.MarshalException: error marshalling arguments; nested exception is:
	java.net.SocketException: Software caused connection abort: socket write error

or something similar.
The same problem can occur if the server side throws a NoSuchObjectException upon reading the invocation's object ID, which happens before the arguments even start to be unmarshalled-- in that case, none of the argument data will be consumed until it is bogusly interpreted as the next transport-level message, and the server side will close the connection.  If the argument data is very large, the same problem described above will occur, with the client seeing an apparent network problem instead of the root-cause NoSuchObjectException.

This NoSuchObjectException case may seem qualitatively more problematic than the unmarshalling failure case, because in certain situations NoSuchObjectException can be more of an "expected" remote invocation failure mode, as with various schemes involving remote objects that can go away and be restarted.  Also, a remote invocation that throws NoSuchObjectException can be considered "safe" to retry without violating at-most-once execution semantics, but MarshalException and UnmarshalException, in general, cannot-- therefore, in the NoSuchObjectException case, this bug makes a safe-to-retry failure appear to be an unsafe-to-retry failure, which is unfortunate.

Comments
WORK AROUND Note that the "sun.rmi.server.exceptionTrace" system property does not currently enable output of server-side stack traces for NoSuchObjectException (that seems like a bug). NoSuchObjectException stack traces can be output using the "java.rmi.server.logCalls" system property or by setting the level of the "sun.rmi.server.call" Logger to Level.FINE (or lower). The latter approach has the advantage of not also logging non-exceptional remote calls.
30-11-2005
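The Logger-based form of this workaround can be sketched as follows (a minimal standalone example; in a real server this would run before exporting any remote objects, and the logger reference should be retained so the setting is not garbage-collected):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CallLogConfig {
    // Keep a strong reference so the configured level survives GC.
    static final Logger CALL_LOGGER = Logger.getLogger("sun.rmi.server.call");

    public static void main(String[] args) {
        // Raise the server-side RMI call logger to FINE so that
        // NoSuchObjectException stack traces are logged, without
        // enabling the broader java.rmi.server.logCalls output.
        CALL_LOGGER.setLevel(Level.FINE);
        System.out.println("sun.rmi.server.call level: " + CALL_LOGGER.getLevel());
    }
}
```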

EVALUATION Yes, this is a problem that frequently confuses users. The desired solution is not yet clear. ###@###.### 2002-10-08
08-10-2002

SUGGESTED FIX Presumably the client-side RMI implementation cannot be burdened with checking for an exception result from the server during argument marshalling at a frequency high enough to reliably fix this problem.  One could imagine having the client-side RMI implementation check, after getting a SocketException from marshalling, whether there is an exception from the server waiting to be read from the connection.  That doesn't work if the exception's representation is very large, though (large stack traces, many chained exceptions, etc.).  [In fact, this points to another defect in the JRMP implementation today: both sides could become deadlocked if the server tries to marshal a very large exception while the client is still trying to marshal a large amount of argument data-- both sides would be blocked forever trying to write data that isn't being consumed by the other side.  That would be bad and should also be fixed.]  So any alternative would seem to mean that the client should be permitted to write the entirety of its argument data without failure.  That, in turn, means that the server-side RMI implementation should consume data from the client before fully closing its socket, so that the client can finish sending all of its argument data without blocking.  But where should this happen, and to what extent?  Because of the partially unmarshalled state that the connection is in (and because the JRMP stream protocol provides no framing around calls, which is a primary cause of the problem here), it does not seem possible to consume the rest of the client's argument data for the call with any sophistication (i.e. knowing when to stop).  But it also seems slightly dangerous, from a network robustness perspective, for the server-side RMI implementation to just consume an unbounded amount of uninterpreted data.
TCPTransport.handleMessages could handle this data consumption where it currently closes the server-side socket upon protocol error, although that would seem to cover more cases than is necessary. If 4415668 were fixed so that UnicastServerRef.dispatch somehow conspires with its callers to indicate cases when TCPTransport.handleMessages should not attempt to read another JRMP message, it would seem those are the same cases in which arbitrary client data should be blindly consumed. This consumption could conceivably occur in UnicastServerRef.dispatch, Transport.serviceCall, or TCPTransport.handleMessages... ###@###.### 2002-07-17
17-07-2002
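The "blindly consume before closing" idea in the suggested fix might be sketched like this (the method name and the byte limit are hypothetical, invented here to illustrate the bounded-drain trade-off discussed above; this is not code from the RMI implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Drain {
    // Consume and discard up to maxBytes of leftover client data before
    // fully closing the server-side socket, so the client can finish
    // writing its arguments and then read the marshalled exception.
    // The bound guards against consuming unlimited uninterpreted data.
    static long drain(InputStream in, long maxBytes) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        while (total < maxBytes) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, maxBytes - total));
            if (n < 0) {
                break;  // client closed its side; safe to close ours now
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a connection with leftover argument data.
        InputStream leftover = new ByteArrayInputStream(new byte[100_000]);
        System.out.println("drained " + drain(leftover, 64_000) + " bytes");
    }
}
```

Since JRMP provides no per-call framing, the drain cannot know where the call's argument data ends; the limit only caps the damage, it does not make the consumption precise.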

WORK AROUND Set the system property -Dsun.rmi.server.exceptionTrace=true (or -Djava.rmi.server.logCalls=true) on the server VM to get the server-side exception trace output to System.err. ###@###.### 2002-07-17
17-07-2002
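Either property from this workaround can also be set programmatically before exporting any remote objects, which is equivalent to the -D command-line flags (a minimal sketch; on a real server these are normally passed on the command line instead):

```java
public class ExceptionTraceCheck {
    public static void main(String[] args) {
        // Equivalent to launching the server VM with
        //   -Dsun.rmi.server.exceptionTrace=true
        // Must be set before any remote objects are exported.
        System.setProperty("sun.rmi.server.exceptionTrace", "true");
        System.out.println(System.getProperty("sun.rmi.server.exceptionTrace"));
    }
}
```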