JDK-8319228 : Improve robustness of String constructors with mutable array inputs
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 22
  • Submitted: 2023-11-01
  • Updated: 2023-11-27
  • Resolved: 2023-11-27
Description
Summary
-------
Warn against modifying array and CharSequence arguments passed to String constructors and to methods of StringBuilder and Appendable; the results are unspecified if an argument is modified while the constructor or method is executing.

Problem
-------

The documentation for constructing strings from arrays and for appending CharSequences does not warn against modifying those arguments while the call is in progress.
The affected java.lang classes are String, StringBuilder, and Appendable.


Solution
--------

Add a warning to the constructors of String and to the methods of StringBuilder and Appendable stating that modifying an array or CharSequence argument during the call produces unspecified results.
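
As an illustration of how calling code might stay within the specified behavior, the sketch below guards a shared `char[]` with a lock so that no writer can touch the array while a `String` is being constructed from it. The class `GuardedBuffer` and its methods are hypothetical names invented for this example; they are not part of the JDK or of this CSR.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the JDK: serializes writers and
// String construction on the same lock so the array cannot change
// while the constructor is reading it.
final class GuardedBuffer {
    private final Object lock = new Object();
    private final char[] chars;

    GuardedBuffer(int size) {
        this.chars = new char[size];
    }

    // Writers take the lock before modifying the array.
    void fill(char c) {
        synchronized (lock) {
            Arrays.fill(chars, c);
        }
    }

    // The String is constructed while holding the same lock, so the
    // "modified during string construction" case cannot occur here.
    String snapshot() {
        synchronized (lock) {
            return new String(chars);
        }
    }

    public static void main(String[] args) {
        GuardedBuffer buffer = new GuardedBuffer(4);
        buffer.fill('a');
        String first = buffer.snapshot();
        buffer.fill('b');           // later modification does not affect 'first'
        System.out.println(first);  // prints "aaaa"
    }
}
```

Any other scheme that keeps writers out for the duration of the call (for example, handing the constructor a privately owned copy) would serve the same purpose; the lock is just one option.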

Specification
-------------

java.lang.String:
 
     /**
      * Allocates a new {@code String} so that it represents the sequence of
      * characters currently contained in the character array argument. The
      * contents of the character array are copied; subsequent modification of
      * the character array does not affect the newly created string.
      *
    + * <p> The contents of the string are unspecified if the character array
    + * is modified during string construction.
    + *
      * @param  value
      *         The initial value of the string
      */
     public String(char[] value) {...}
    
     /**
      * Allocates a new {@code String} that contains characters from a subarray
      * of the character array argument. The {@code offset} argument is the
      * index of the first character of the subarray and the {@code count}
      * argument specifies the length of the subarray. The contents of the
      * subarray are copied; subsequent modification of the character array does
      * not affect the newly created string.
      *
    + * <p> The contents of the string are unspecified if the character array
    + * is modified during string construction.
    + *
      * @param  value
      *         Array that is the source of characters
      *
      * @param  offset
      *         The initial offset
      *
      * @param  count
      *         The length
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code count} is negative, or
      *          {@code offset} is greater than {@code value.length - count}
      */
     public String(char[] value, int offset, int count) {...}
    
     /**
      * Allocates a new {@code String} that contains characters from a subarray
      * of the <a href="Character.html#unicode">Unicode code point</a> array
      * argument.  The {@code offset} argument is the index of the first code
      * point of the subarray and the {@code count} argument specifies the
      * length of the subarray.  The contents of the subarray are converted to
      * {@code char}s; subsequent modification of the {@code int} array does not
      * affect the newly created string.
      *
    + * <p> The contents of the string are unspecified if the codepoints array
    + * is modified during string construction.
    + *
      * @param  codePoints
      *         Array that is the source of Unicode code points
      *
      * @param  offset
      *         The initial offset
      *
      * @param  count
      *         The length
      *
      * @throws  IllegalArgumentException
      *          If any invalid Unicode code point is found in {@code
      *          codePoints}
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code count} is negative, or
      *          {@code offset} is greater than {@code codePoints.length - count}
      *
      * @since  1.5
      */
     public String(int[] codePoints, int offset, int count) {...} 
    
     /**
      * Allocates a new {@code String} constructed from a subarray of an array
      * of 8-bit integer values.
      *
      * <p> The {@code offset} argument is the index of the first byte of the
      * subarray, and the {@code count} argument specifies the length of the
      * subarray.
      *
      * <p> Each {@code byte} in the subarray is converted to a {@code char} as
      * specified in the {@link #String(byte[],int) String(byte[],int)} constructor.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @deprecated This method does not properly convert bytes into characters.
      * As of JDK&nbsp;1.1, the preferred way to do this is via the
      * {@code String} constructors that take a {@link Charset}, charset name,
      * or that use the {@link Charset#defaultCharset() default charset}.
      *
      * @param  ascii
      *         The bytes to be converted to characters
      *
      * @param  hibyte
      *         The top 8 bits of each 16-bit Unicode code unit
      *
      * @param  offset
      *         The initial offset
      * @param  count
      *         The length
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code count} is negative, or
      *          {@code offset} is greater than {@code ascii.length - count}
      *
      * @see  #String(byte[], int)
      * @see  #String(byte[], int, int, java.lang.String)
      * @see  #String(byte[], int, int, java.nio.charset.Charset)
      * @see  #String(byte[], int, int)
      * @see  #String(byte[], java.lang.String)
      * @see  #String(byte[], java.nio.charset.Charset)
      * @see  #String(byte[])
      */
     @Deprecated(since="1.1")
     public String(byte[] ascii, int hibyte, int offset, int count) {...}
    
     /**
      * Allocates a new {@code String} containing characters constructed from
      * an array of 8-bit integer values. Each character c in the
      * resulting string is constructed from the corresponding component
      * b in the byte array such that:
      *
      * ...
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @deprecated  This method does not properly convert bytes into
      * characters.  As of JDK&nbsp;1.1, the preferred way to do this is via the
      * {@code String} constructors that take a {@link Charset}, charset name,
      * or that use the {@link Charset#defaultCharset() default charset}.
      *
      * @param  ascii
      *         The bytes to be converted to characters
      *
      * @param  hibyte
      *         The top 8 bits of each 16-bit Unicode code unit
      *
      * @see  #String(byte[], int, int, java.lang.String)
      * @see  #String(byte[], int, int, java.nio.charset.Charset)
      * @see  #String(byte[], int, int)
      * @see  #String(byte[], java.lang.String)
      * @see  #String(byte[], java.nio.charset.Charset)
      * @see  #String(byte[])
      */
     @Deprecated(since="1.1")
     public String(byte[] ascii, int hibyte) {...}
    
     /**
      * Constructs a new {@code String} by decoding the specified subarray of
      * bytes using the specified charset.  The length of the new {@code String}
      * is a function of the charset, and hence may not be equal to the length
      * of the subarray.
      *
      * <p> The behavior of this constructor when the given bytes are not valid
      * in the given charset is unspecified.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @param  offset
      *         The index of the first byte to decode
      *
      * @param  length
      *         The number of bytes to decode
      *
      * @param  charsetName
      *         The name of a supported {@linkplain java.nio.charset.Charset
      *         charset}
      *
      * @throws  UnsupportedEncodingException
      *          If the named charset is not supported
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code length} is negative, or
      *          {@code offset} is greater than {@code bytes.length - length}
      *
      * @since  1.1
      */
     public String(byte[] bytes, int offset, int length, String charsetName)
             throws UnsupportedEncodingException {...}
     
     /**
      * Constructs a new {@code String} by decoding the specified subarray of
      * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
      * The length of the new {@code String} is a function of the charset, and
      * hence may not be equal to the length of the subarray.
      *
      * <p> This method always replaces malformed-input and unmappable-character
      * sequences with this charset's default replacement string.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @param  offset
      *         The index of the first byte to decode
      *
      * @param  length
      *         The number of bytes to decode
      *
      * @param  charset
      *         The {@linkplain java.nio.charset.Charset charset} to be used to
      *         decode the {@code bytes}
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code length} is negative, or
      *          {@code offset} is greater than {@code bytes.length - length}
      *
      * @since  1.6
      */
     public String(byte[] bytes, int offset, int length, Charset charset) {...}
    
     /**
      * Constructs a new {@code String} by decoding the specified array of bytes
      * using the specified {@linkplain java.nio.charset.Charset charset}.  The
      * length of the new {@code String} is a function of the charset, and hence
      * may not be equal to the length of the byte array.
      *
      * <p> The behavior of this constructor when the given bytes are not valid
      * in the given charset is unspecified.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @param  charsetName
      *         The name of a supported {@linkplain java.nio.charset.Charset
      *         charset}
      *
      * @throws  UnsupportedEncodingException
      *          If the named charset is not supported
      *
      * @since  1.1
      */
     public String(byte[] bytes, String charsetName)
             throws UnsupportedEncodingException {...}
    
     /**
      * Constructs a new {@code String} by decoding the specified array of
      * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
      * The length of the new {@code String} is a function of the charset, and
      * hence may not be equal to the length of the byte array.
      *
      * <p> This method always replaces malformed-input and unmappable-character
      * sequences with this charset's default replacement string.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @param  charset
      *         The {@linkplain java.nio.charset.Charset charset} to be used to
      *         decode the {@code bytes}
      *
      * @since  1.6
      */
     public String(byte[] bytes, Charset charset) {...}
    
     /**
      * Constructs a new {@code String} by decoding the specified subarray of
      * bytes using the {@link Charset#defaultCharset() default charset}.
      * The length of the new {@code String} is a function of the charset,
      * and hence may not be equal to the length of the subarray.
      *
      * <p> The behavior of this constructor when the given bytes are not valid
      * in the default charset is unspecified.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @param  offset
      *         The index of the first byte to decode
      *
      * @param  length
      *         The number of bytes to decode
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code offset} is negative, {@code length} is negative, or
      *          {@code offset} is greater than {@code bytes.length - length}
      *
      * @since  1.1
      */
     public String(byte[] bytes, int offset, int length) {...}
    
     /**
      * Constructs a new {@code String} by decoding the specified array of bytes
      * using the {@link Charset#defaultCharset() default charset}. The length
      * of the new {@code String} is a function of the charset, and hence may not
      * be equal to the length of the byte array.
      *
      * <p> The behavior of this constructor when the given bytes are not valid
      * in the default charset is unspecified.  The {@link
      * java.nio.charset.CharsetDecoder} class should be used when more control
      * over the decoding process is required.
      *
    + * <p> The contents of the string are unspecified if the byte array
    + * is modified during string construction.
    + *
      * @param  bytes
      *         The bytes to be decoded into characters
      *
      * @since  1.1
      */
     public String(byte[] bytes) {...}
    
     /**
      * Allocates a new string that contains the sequence of characters
      * currently contained in the string builder argument. The contents of the
      * string builder are copied; subsequent modification of the string builder
      * does not affect the newly created string.
      *
    + * <p> The contents of the string are unspecified if the {@code StringBuilder}
    + * is modified during string construction.
    + *
      * <p> This constructor is provided to ease migration to {@code StringBuilder}. 
      * Obtaining a string from a string builder via the {@code toString} 
      * method is likely to run faster and is generally preferred.
      *
      * @param   builder
      *          A {@code StringBuilder}
      *
      * @since  1.5
      */
     public String(StringBuilder builder) {...}
    
     /**
      * Returns the string representation of the {@code char} array
      * argument. The contents of the character array are copied; subsequent
      * modification of the character array does not affect the returned
      * string.
      *
    + * <p> The contents of the string are unspecified if the character array
    + * is modified during string construction.
    + *
      * @param   data     the character array.
      * @return  a {@code String} that contains the characters of the
      *          character array.
      */
     public static String valueOf(char[] data) {...}
    
     /**
      * Returns the string representation of a specific subarray of the
      * {@code char} array argument.
      * <p>
      * The {@code offset} argument is the index of the first
      * character of the subarray. The {@code count} argument
      * specifies the length of the subarray. The contents of the subarray
      * are copied; subsequent modification of the character array does not
      * affect the returned string.
      *
    + * <p> The contents of the string are unspecified if the character array
    + * is modified during string construction.
    + *
      * @param   data     the character array.
      * @param   offset   initial offset of the subarray.
      * @param   count    length of the subarray.
      * @return  a {@code String} that contains the characters of the
      *          specified subarray of the character array.
      * @throws    IndexOutOfBoundsException if {@code offset} is
      *          negative, or {@code count} is negative, or
      *          {@code offset+count} is larger than
      *          {@code data.length}.
      */
     public static String valueOf(char[] data, int offset, int count) {...}
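
The javadoc above already guarantees that modification after the call returns does not affect the string; only modification during the call is unspecified. The following sketch exercises that existing guarantee sequentially for a few of the listed constructors and for `valueOf`. The class `CopySemanticsDemo` and its method names are made up for illustration, and the example says nothing about the concurrent case.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: thin wrappers around the documented APIs.
final class CopySemanticsDemo {
    // Converts the whole code point array using String(int[], int, int).
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Decodes the whole byte array using String(byte[], Charset).
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Copies a subarray using String.valueOf(char[], int, int).
    static String fromChars(char[] data, int offset, int count) {
        return String.valueOf(data, offset, count);
    }

    public static void main(String[] args) {
        int[] codePoints = {0x48, 0x1F600, 0x69};
        String s = fromCodePoints(codePoints);
        codePoints[0] = 0x58;            // modified after construction completed
        // U+1F600 needs a surrogate pair, so 3 code points become 4 chars.
        System.out.println(s.length());  // prints 4

        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        // Two UTF-8 bytes decode to the single char U+00E9.
        System.out.println(fromUtf8(utf8).length());  // prints 1

        char[] data = {'a', 'b', 'c', 'd'};
        String sub = fromChars(data, 1, 2);
        data[1] = 'z';                   // does not affect 'sub'
        System.out.println(sub);         // prints "bc"
    }
}
```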
 
java.lang.StringBuilder:

     /**
      * Appends a subsequence of the specified {@code CharSequence} to this
      * sequence.
      * <p>
      * Characters of the argument {@code s}, starting at
      * index {@code start}, are appended, in order, to the contents of
      * this sequence up to the (exclusive) index {@code end}. The length
      * of this sequence is increased by the value of {@code end - start}.
      * <p>
      * Let <i>n</i> be the length of this character sequence just prior to
      * execution of the {@code append} method. Then the character at
      * index <i>k</i> in this character sequence becomes equal to the
      * character at index <i>k</i> in this sequence, if <i>k</i> is less than
      * <i>n</i>; otherwise, it is equal to the character at index
      * <i>k+start-n</i> in the argument {@code s}.
      * <p>
      * If {@code s} is {@code null}, then this method appends
      * characters as if the s parameter was a sequence containing the four
      * characters {@code "null"}.
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
      *
      * @param   s the sequence to append.
      * @param   start   the starting index of the subsequence to be appended.
      * @param   end     the end index of the subsequence to be appended.
      * @return  a reference to this object.
      * @throws     IndexOutOfBoundsException if
      *             {@code start} is negative, or
      *             {@code start} is greater than {@code end} or
      *             {@code end} is greater than {@code s.length()}
      */
     @Override
     public StringBuilder append(CharSequence s, int start, int end) {...}
 
     /**
      * Inserts the specified {@code CharSequence} into this sequence.
      * <p>
      * The characters of the {@code CharSequence} argument are inserted,
      * in order, into this sequence at the indicated offset, moving up
      * any characters originally above that position and increasing the length
      * of this sequence by the length of the argument s.
      * <p>
      * The result of this method is exactly the same as if it were an
      * invocation of this object's
      * {@link #insert(int,CharSequence,int,int) insert}(dstOffset, s, 0, s.length())
      * method.
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
      *
      * <p>If {@code s} is {@code null}, then the four characters
      * {@code "null"} are inserted into this sequence.
      *
      * @param      dstOffset   the offset.
      * @param      s the sequence to be inserted
      * @return     a reference to this object.
      * @throws     IndexOutOfBoundsException  if the offset is invalid.
      */
     public StringBuilder insert(int dstOffset, CharSequence s) {...}

     /**
      * Inserts a subsequence of the specified {@code CharSequence} into
      * this sequence.
      * <p>
      * The subsequence of the argument {@code s} specified by
      * {@code start} and {@code end} are inserted,
      * in order, into this sequence at the specified destination offset, moving
      * up any characters originally above that position. The length of this
      * sequence is increased by {@code end - start}.
      * <p>
      * The character at index <i>k</i> in this sequence becomes equal to:
      * <ul>
      * <li>the character at index <i>k</i> in this sequence, if
      * <i>k</i> is less than {@code dstOffset}
      * <li>the character at index <i>k</i>{@code +start-dstOffset} in
      * the argument {@code s}, if <i>k</i> is greater than or equal to
      * {@code dstOffset} but is less than {@code dstOffset+end-start}
      * <li>the character at index <i>k</i>{@code -(end-start)} in this
      * sequence, if <i>k</i> is greater than or equal to
      * {@code dstOffset+end-start}
      * </ul><p>
      * The {@code dstOffset} argument must be greater than or equal to
      * {@code 0}, and less than or equal to the {@linkplain #length() length}
      * of this sequence.
      * <p>The start argument must be nonnegative, and not greater than
      * {@code end}.
      * <p>The end argument must be greater than or equal to
      * {@code start}, and less than or equal to the length of s.
      *
      * <p>If {@code s} is {@code null}, then this method inserts
      * characters as if the s parameter was a sequence containing the four
      * characters {@code "null"}.
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
      *
      * @param      dstOffset   the offset in this sequence.
      * @param      s       the sequence to be inserted.
      * @param      start   the starting index of the subsequence to be inserted.
      * @param      end     the end index of the subsequence to be inserted.
      * @return     a reference to this object.
      * @throws     IndexOutOfBoundsException  if {@code dstOffset}
      *             is negative or greater than {@code this.length()}, or
      *              {@code start} or {@code end} are negative, or
      *              {@code start} is greater than {@code end} or
      *              {@code end} is greater than {@code s.length()}
      */
     public StringBuilder insert(int dstOffset, CharSequence s,
                                         int start, int end) {...} 

     /**
      * Appends {@code count} copies of the specified {@code CharSequence} {@code cs}
      * to this sequence.
      * <p>
      * The length of this sequence increases by {@code count} times the
      * {@code CharSequence} length.
      * <p>
      * If {@code cs} is {@code null}, then the four characters
      * {@code "null"} are repeated into this sequence.
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
      *
      * @param cs     a {@code CharSequence}
      * @param count  number of times to copy
      *
      * @return  a reference to this object.
      *
      * @throws IllegalArgumentException  if {@code count} is negative
      *
      * @since 21
      */
     public StringBuilder repeat(CharSequence cs, int count) {...}
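
One way a caller might avoid passing a mutable `CharSequence` to these methods is to take an immutable snapshot first and pass that instead. The sketch below assumes the caller can call `toString()` on the source at a point where no other thread is writing to it; the class `AppendSnapshot` and the method `build` are hypothetical names used only for this example.

```java
// Hypothetical demo class: snapshots a mutable source before using it
// as the CharSequence argument of insert and append.
final class AppendSnapshot {
    static String build(StringBuilder sharedSource) {
        // Assumption: no other thread writes to sharedSource during toString().
        // The resulting String is immutable, so the calls below cannot
        // observe later changes to sharedSource.
        CharSequence stable = sharedSource.toString();

        StringBuilder target = new StringBuilder("[]");
        target.insert(1, stable);        // insert(int, CharSequence)
        target.append(stable, 0, 2);     // append(CharSequence, int, int)
        target.insert(0, stable, 1, 3);  // insert(int, CharSequence, int, int)
        return target.toString();
    }

    public static void main(String[] args) {
        StringBuilder source = new StringBuilder("abc");
        System.out.println(build(source));  // prints "bc[abc]ab"
    }
}
```

The snapshot costs one extra copy of the source characters, which may or may not be acceptable depending on the size of the sequence.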

java.lang.Appendable:

     /**
      * Appends the specified character sequence to this {@code Appendable}.
      *
      * <p> Depending on which class implements the character sequence
      * {@code csq}, the entire sequence may not be appended.  For
      * instance, if {@code csq} is a {@link java.nio.CharBuffer} then
      * the subsequence to append is defined by the buffer's position and limit.
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
      *
      * @param  csq
      *         The character sequence to append.  If {@code csq} is
      *         {@code null}, then the four characters {@code "null"} are
      *         appended to this Appendable.
      *
      * @return  A reference to this {@code Appendable}
      *
      * @throws  IOException
      *          If an I/O error occurs
      */
     Appendable append(CharSequence csq) throws IOException;
 
     /**
      * Appends a subsequence of the specified character sequence to this
      * {@code Appendable}.
      *
      * <p> An invocation of this method of the form {@code out.append(csq, start, end)}
      * when {@code csq} is not {@code null}, behaves in
      * exactly the same way as the invocation
      *
      * <pre>
      *     out.append(csq.subSequence(start, end)) </pre>
      *
    + * <p>
    + * The contents are unspecified if the {@code CharSequence}
    + * is modified during the method call or an exception is thrown
    + * when accessing the {@code CharSequence}.
    + *
      * @param  csq
      *         The character sequence from which a subsequence will be
      *         appended.  If {@code csq} is {@code null}, then characters
      *         will be appended as if {@code csq} contained the four
      *         characters {@code "null"}.
      *
      * @param  start
      *         The index of the first character in the subsequence
      *
      * @param  end
      *         The index of the character following the last character in the
      *         subsequence
      *
      * @return  A reference to this {@code Appendable}
      *
      * @throws  IndexOutOfBoundsException
      *          If {@code start} or {@code end} are negative, {@code start}
      *          is greater than {@code end}, or {@code end} is greater than
      *          {@code csq.length()}
      *
      * @throws  IOException
      *          If an I/O error occurs
      */
     Appendable append(CharSequence csq, int start, int end) throws IOException;
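
The two behaviors described in the `Appendable` javadoc above (a `CharBuffer` contributes only the characters between its position and limit, and the three-argument form acts like appending `csq.subSequence(start, end)`) can be sketched as follows. `StringBuilder` is used as the `Appendable` implementation; the class `AppendableDemo` and its methods are hypothetical names for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;

// Hypothetical demo class using StringBuilder through the Appendable interface.
final class AppendableDemo {
    // Appends a CharBuffer; only the characters between the buffer's
    // position and limit are part of the sequence.
    static String appendRemaining(CharBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        Appendable out = sb;
        try {
            out.append(buffer);
        } catch (IOException e) {
            // StringBuilder does not perform I/O, but the interface declares it.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Appends the subsequence [start, end) of csq.
    static String appendRange(CharSequence csq, int start, int end) {
        StringBuilder sb = new StringBuilder();
        Appendable out = sb;
        try {
            out.append(csq, start, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharBuffer buffer = CharBuffer.wrap("hello world");
        buffer.position(6);
        System.out.println(appendRemaining(buffer));              // prints "world"
        System.out.println(appendRange("hello world", 0, 5));     // prints "hello"
    }
}
```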
 
Comments
Moving updated request to Approved.
27-11-2023

Moving back to Provisional; please add one or more reviewers before Finalizing the CSR.
27-11-2023

Following the recommendations in the review comments, the terminology has been changed from "indeterminate" to "unspecified" to describe the behavior of string constructors under race conditions. The PR and CSR have been updated.
21-11-2023

Moving to Approved.
13-11-2023

Moving to Provisional, not Approved.
04-11-2023