JDK-8028041 : Serialized Form description of j.l.String is not consistent with the implementation
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io:serialization
  • Affected Version: 8
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2013-11-08
  • Updated: 2017-05-17
  • Resolved: 2013-11-08
Fix versions:
  • JDK 8: 8 b117 (Fixed)
  • Other: port-stage-ppc-aix (Fixed)
In the description of the serialized form for the j.l.String class we have: 
"Class String is special cased within the Serialization Stream Protocol. A String instance is written initially into an ObjectOutputStream in the following format: 
       TC_STRING (utf String)
The String is written by method DataOutput.writeUTF. A new handle is generated to refer to all future references to the string instance within the stream."

But in fact, if the UTF length of the string is greater than 0xFFFF, the String instance is written in the following format:
TC_LONGSTRING (utf String).
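
The discrepancy can be observed directly from the stream bytes. The following is only a minimal sketch, not taken from the JCK test: the class name StringTypeCodeDemo and its helper methods are hypothetical, and it assumes the type code is the first byte after the 4-byte stream header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; the name and helpers are illustrative only.
public class StringTypeCodeDemo {

    /** Serializes s and returns the type code byte that follows the stream header. */
    static byte typeCodeOf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Bytes 0..3 hold STREAM_MAGIC (0xACED) and STREAM_VERSION (0x0005).
        return bytes.toByteArray()[4];
    }

    /** Builds a string of n copies of c (avoids String.repeat, which needs Java 11). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // UTF length 0xFFFF still fits the 2-byte length, so TC_STRING (0x74) is used.
        System.out.printf("65535 chars: 0x%02X%n", typeCodeOf(repeat('a', 0xFFFF)));
        // One byte more and the stream switches to TC_LONGSTRING (0x7C).
        System.out.printf("65536 chars: 0x%02X%n", typeCodeOf(repeat('a', 0x10000)));
    }
}
```

The strings here are pure ASCII, so the UTF length equals the character count; for other characters the threshold applies to the encoded length, not to String.length().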
Right, agreed.

Section 6.4.1, Rules of the Grammar, defines the format of the stream precisely:

       newString:
         TC_STRING newHandle (utf)
         TC_LONGSTRING newHandle (long-utf)

The prose is used to understand the grammar but seems to avoid/omit redundant descriptions of all of the details.

I think it would be very good to have the corresponding CCC, because the API spec was changed. By the way, according to the HG Updates URL, the spec for the Serialized Form of String now contains only a link to the Java Object Serialization Specification. In the corresponding part we have the following assertions:

"The representation of String objects consists of length information followed by the contents of the string encoded in modified UTF-8. The modified UTF-8 encoding is the same as used in the Java Virtual Machine and in the java.io.DataInput and DataOutput interfaces; it differs from standard UTF-8 in the representation of supplementary characters and of the null character. The form of the length information depends on the length of the string in modified UTF-8 encoding. If the modified UTF-8 encoding of the given String is less than 65536 bytes in length, the length is written as 2 bytes representing an unsigned 16-bit integer. Starting with the Java 2 platform, Standard Edition, v1.3, if the length of the string in modified UTF-8 encoding is 65536 bytes or more, the length is written in 8 bytes representing a signed 64-bit integer. The typecode preceding the String in the serialization stream indicates which format was used to write the String."

I think the previous spec text, with a clarification of exactly which typecode is used to serialize a String object, could be kept in the new version of the specification alongside this link. The JOSS says only "The typecode preceding the String in the serialization stream indicates which format was used to write the String", and it is not apparent from this that the typecode is TC_STRING or TC_LONGSTRING (0x74 and 0x7C respectively). What do you think about it?
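
To make the two length formats quoted above concrete, here is a rough sketch that reads the length field back from a serialized String. The class name StringLengthHeaderDemo and the method encodedLength are hypothetical, and the sketch assumes the String is the first object written to the stream, so its typecode sits directly after the 4-byte header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; the name and helpers are illustrative only.
public class StringLengthHeaderDemo {

    /** Returns the modified UTF-8 length recorded in the stream for s. */
    static long encodedLength(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            in.readShort(); // STREAM_MAGIC
            in.readShort(); // STREAM_VERSION
            byte typeCode = in.readByte();
            switch (typeCode) {
                case ObjectStreamConstants.TC_STRING:
                    return in.readUnsignedShort(); // 2-byte unsigned length
                case ObjectStreamConstants.TC_LONGSTRING:
                    return in.readLong();          // 8-byte signed length
                default:
                    throw new IllegalStateException("Unexpected typecode: " + typeCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a string of n copies of c. */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("abc"));              // 3
        System.out.println(encodedLength("\u0000"));           // 2: null char takes two bytes
        System.out.println(encodedLength("\uD83D\uDE00"));     // 6: each surrogate takes three bytes
        System.out.println(encodedLength(repeat('a', 70000))); // 70000, read from the 8-byte field
    }
}
```

The second and third calls illustrate the two points where modified UTF-8 differs from standard UTF-8, which is why the threshold is defined on the encoded length rather than on the number of characters.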

Since the specified behavior did not change, a CCC was not proposed. Is one necessary?

I cannot find a CCC regarding this change. To be sure: it has not been processed yet, right?

I've added the 8-critical-watch to this bug as it is a P2/conformance bug just reported by the JCK folks. It is a 12-year-old issue, so I don't think it is really critical for 8.

Yes, it's a javadoc-only issue (http://download.java.net/jdk8/docs/api/serialized-form.html#java.lang.String).

From what I can tell, when support for TC_LONGSTRING was added (in Java SE 1.3), the javadoc for serialPersistentFields wasn't updated. So this looks like a docs-only issue.