JDK-4039553 : Serialization of large strings fails
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io:serialization
  • Affected Version: 1.1,1.1.2
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_2.5.1,windows_95
  • CPU: x86,sparc
  • Submitted: 1997-03-18
  • Updated: 1997-08-18
  • Resolved: 1997-08-18
Related Reports
Duplicate :  
Description

Name: sgC58550			Date: 03/17/97


When serializing a hashtable containing large strings (100k etc),
the following exception is generated:

java.io.UTFDataFormatException
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:310)
        at java.io.ObjectOutputStream.writeUTF(ObjectOutputStream.java:1032)
        at java.io.ObjectOutputStream.outputString(ObjectOutputStream.java:480)
        at java.io.ObjectOutputStream.checkSpecialClasses(ObjectOutputStream.java:301)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:182)
        at java.util.Hashtable.writeObject(Hashtable.java:406)
        at java.io.ObjectOutputStream.outputObject(ObjectOutputStream.java:629)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:225)

The documentation states that writeUTF stores the length of the
string in 2 bytes. This limits the encoded form of a string to
65535 bytes, well below the size of the strings involved here.
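
The limit can be observed directly on DataOutputStream.writeUTF, without going through serialization. The following is a minimal sketch; the class name WriteUtfLimit and the helper fits are illustrative only and do not appear in the original report. It uses ASCII characters so that each character encodes to exactly one byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class WriteUtfLimit {
    // Returns true if writeUTF accepts an ASCII string of the given length.
    static boolean fits(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a'); // ASCII: one encoded byte per character
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(new String(chars));
            return true;
        } catch (UTFDataFormatException e) {
            // Encoded form is larger than the 2-byte length prefix can describe.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("65535 chars fit: " + fits(65535));
        System.out.println("65536 chars fit: " + fits(65536));
    }
}
```

A string of 65535 ASCII characters should be accepted, while one more character should trigger the UTFDataFormatException seen in the stack trace above.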

This is clearly a bug in serialization, but it could also be
argued that the implementation of writeUTF is rather naive :)

company -  , email - ###@###.###
======================================================================


The following program shows the problem.
If the encoded String is 64 kB or larger, writeObject
throws an exception:

> java t >/dev/null
write-exception: java.io.UTFDataFormatException

------- t.java -------

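import java.io.IOException;
import java.io.ObjectOutputStream;

public class t {
   public static void main(String arg[]) {
      // Build a 65536-character ASCII string: 256*64 copies of "test".
      String s="", a="test";
      for (int i=0; i<256*64; i++)
         s=s+a;
      try {
         // On the affected releases this throws UTFDataFormatException.
         new ObjectOutputStream(System.out).writeObject(s);
      } catch (IOException e) {
         System.err.println("write-exception: "+e);
      }
   }
}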

company -  , email - ###@###.###
=======================================================================

Comments
EVALUATION Indeed, strings in UTF format are restricted to an encoded length of 65535 bytes. Since each character may take 1, 2 or 3 bytes to encode, the number of characters that will fit is not predictable. This is a duplicate of BugId 4025564. It should be fixed when the UTF representation can be redefined to handle longer strings.
11-06-2004
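
To make the 1, 2 or 3 byte rule concrete, the encoded length can be computed ahead of time. This is a sketch based on the encoding rules documented for DataOutput.writeUTF; the class name ModifiedUtfLength and the method encodedLength are hypothetical names chosen for this example.

```java
// Hypothetical helper class, for illustration only.
class ModifiedUtfLength {
    // Bytes writeUTF needs for the string body, excluding the 2-byte length prefix.
    static long encodedLength(String s) {
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;              // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                n += 2;              // NUL and U+0080..U+07FF
            } else {
                n += 3;              // U+0800..U+FFFF, including surrogates
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String s = args.length > 0 ? args[0] : "test";
        long len = encodedLength(s);
        System.out.println(len + " bytes, fits in writeUTF: " + (len <= 65535));
    }
}
```

Under these rules a string of only 21846 characters can already exceed the limit if every character needs 3 bytes, which is why the character count alone does not predict failure.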

WORK AROUND Name: sgC58550 Date: 03/17/97 Ways of fixing bug: Either write a new method DataOutputStream.writeLongUTF and use that when serializing large strings, or rewrite serialization to store large strings in its own format, similar to writeUTF but using 4 bytes for the length field.
======================================================================
11-06-2004
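
The 4-byte length idea could look roughly like the sketch below. Note that writeLongUTF is only the name proposed in this report, not an existing DataOutputStream method; here it is written as a static helper in a hypothetical class LongUtf. The sketch also assumes standard UTF-8 via String.getBytes, which differs from the modified UTF-8 that writeUTF produces, so the two formats are not interchangeable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class LongUtf {
    // Writes a 4-byte length followed by the standard UTF-8 bytes of s.
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back a string written by writeLongUTF.
    static String readLongUTF(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Convenience wrapper: encode to memory and decode again.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeLongUTF(new DataOutputStream(buf), s);
            return readLongUTF(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25000; i++) sb.append("test"); // 100000 characters
        String big = sb.toString();
        System.out.println("round trip ok: " + roundTrip(big).equals(big));
    }
}
```

A class that needs this today would call such helpers from its own writeObject and readObject methods rather than relying on default String serialization.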

PUBLIC COMMENTS Serialization uses UTF format to store strings. The encoded string may not exceed 65535 bytes. For longer strings, the application will need to write the strings as byte arrays.
10-06-2004
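
For the hashtable case from the original report, the byte-array approach might be sketched as follows. The class ByteArrayWorkaround and its methods are hypothetical, the choice of UTF-8 as the byte encoding is an assumption, and the code uses generics and try-with-resources, so it targets a much later Java release than the 1.1 versions this report was filed against.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

// Hypothetical helper class, for illustration only.
class ByteArrayWorkaround {
    // Serializes a table whose large values are stored as byte arrays, not Strings.
    static byte[] serialize(Hashtable<String, byte[]> table) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(table);
        }
        return buf.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Hashtable<String, byte[]> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Hashtable<String, byte[]>) in.readObject();
        }
    }

    // Stores value under key as bytes, serializes, deserializes, and decodes it again.
    static String roundTrip(String key, String value) {
        try {
            Hashtable<String, byte[]> table = new Hashtable<>();
            table.put(key, value.getBytes(StandardCharsets.UTF_8));
            byte[] restored = deserialize(serialize(table)).get(key);
            return new String(restored, StandardCharsets.UTF_8);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The keys stay as ordinary Strings here on the assumption that they are short; only the large values need to be converted.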