JDK-8276238 : Clarify "default charset" descriptions in String class
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 18
  • Submitted: 2021-11-01
  • Updated: 2021-11-02
  • Resolved: 2021-11-02
Description
Summary
-------

Clarify the meaning of "default charset" in several places in the `java.lang.String` class documentation.

Problem
-------

Several places in the documentation read "platform's default charset." Users may take this to mean the charset given by `native.encoding`, which was the case before JEP 400, but the default charset is now `UTF-8`.
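
The distinction can be observed on a running JVM. The following is a minimal sketch, not part of the proposed change; the class name `CharsetReport` and the method `describe` are hypothetical, and the `native.encoding` system property is assumed to be absent (returning `null`) on older JDK releases.

```java
import java.nio.charset.Charset;

class CharsetReport {
    // Builds a short report of the charset-related settings of the running JVM.
    static String describe() {
        // The charset used by String(byte[]) and String.getBytes().
        Charset defaultCharset = Charset.defaultCharset();
        // The host environment's encoding; assumed to be null on older JDKs
        // that do not define this property.
        String nativeEncoding = System.getProperty("native.encoding");
        return "default charset: " + defaultCharset.name()
                + ", native.encoding: " + nativeEncoding;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The two values may differ or coincide depending on the JDK release, the host locale, and any `file.encoding` setting, which is why the documentation should not equate them.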

Solution
--------

Clarify the phrase "platform's default charset" in the `java.lang.String` class and method descriptions by replacing it with a link to `java.nio.charset.Charset#defaultCharset()`, so that the definition of "default charset" is clearly understood.
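
As an illustration of what the linked definition implies for callers, the sketch below compares the `String` methods that use the default charset with their explicit-charset counterparts. It is an example only, not part of the proposed change; the class name `DefaultCharsetDemo` and its helper methods are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    // Decodes using the default charset, i.e. Charset.defaultCharset().
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Decodes using an explicitly supplied charset, independent of any default.
    static String decodeExplicit(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Checks that getBytes() yields the same bytes as encoding with
    // Charset.defaultCharset() passed explicitly.
    static boolean encodesLikeDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // The result of the first call depends on the default charset in effect.
        System.out.println(decodeWithDefault(utf8));
        // The result of the second call does not.
        System.out.println(decodeExplicit(utf8, StandardCharsets.UTF_8));
        System.out.println(encodesLikeDefault("caf\u00e9"));
    }
}
```

Code that must behave identically across environments can pass a `Charset` explicitly rather than rely on the default.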

Specification
-------------

    diff a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
    --- a/src/java.base/share/classes/java/lang/String.java
    +++ b/src/java.base/share/classes/java/lang/String.java
    @@ -364,13 +364,12 @@
          * <p> Each {@code byte} in the subarray is converted to a {@code char} as
          * specified in the {@link #String(byte[],int) String(byte[],int)} constructor.
          *
          * @deprecated This method does not properly convert bytes into characters.
          * As of JDK&nbsp;1.1, the preferred way to do this is via the
    -     * {@code String} constructors that take a {@link
    -     * java.nio.charset.Charset}, charset name, or that use the platform's
    -     * default charset.
    +     * {@code String} constructors that take a {@link Charset}, charset name,
    +     * or that use the {@link Charset#defaultCharset() default charset}.
          *
          * @param  ascii
          *         The bytes to be converted to characters
          *
          * @param  hibyte
    @@ -426,13 +425,12 @@
          *                         | (<b><i>b</i></b> &amp; 0xff))
          * </pre></blockquote>
          *
          * @deprecated  This method does not properly convert bytes into
          * characters.  As of JDK&nbsp;1.1, the preferred way to do this is via the
    -     * {@code String} constructors that take a {@link
    -     * java.nio.charset.Charset}, charset name, or that use the platform's
    -     * default charset.
    +     * {@code String} constructors that take a {@link Charset}, charset name,
    +     * or that use the {@link Charset#defaultCharset() default charset}.
          *
          * @param  ascii
          *         The bytes to be converted to characters
          *
          * @param  hibyte
    @@ -1383,13 +1381,13 @@
             this(bytes, 0, bytes.length, charset);
         }
     
         /**
          * Constructs a new {@code String} by decoding the specified subarray of
    -     * bytes using the platform's default charset.  The length of the new
    -     * {@code String} is a function of the charset, and hence may not be equal
    -     * to the length of the subarray.
    +     * bytes using the {@link Charset#defaultCharset() default charset}.
    +     * The length of the new {@code String} is a function of the charset,
    +     * and hence may not be equal to the length of the subarray.
          *
          * <p> The behavior of this constructor when the given bytes are not valid
          * in the default charset is unspecified.  The {@link
          * java.nio.charset.CharsetDecoder} class should be used when more control
          * over the decoding process is required.
    @@ -1413,13 +1411,13 @@
             this(bytes, offset, length, Charset.defaultCharset());
         }
     
         /**
          * Constructs a new {@code String} by decoding the specified array of bytes
    -     * using the platform's default charset.  The length of the new {@code
    -     * String} is a function of the charset, and hence may not be equal to the
    -     * length of the byte array.
    +     * using the {@link Charset#defaultCharset() default charset}. The length
    +     * of the new {@code String} is a function of the charset, and hence may not
    +     * be equal to the length of the byte array.
          *
          * <p> The behavior of this constructor when the given bytes are not valid
          * in the default charset is unspecified.  The {@link
          * java.nio.charset.CharsetDecoder} class should be used when more control
          * over the decoding process is required.
    @@ -1691,11 +1689,12 @@
          *     dstBegin + (srcEnd-srcBegin) - 1
          * </pre></blockquote>
          *
          * @deprecated  This method does not properly convert characters into
          * bytes.  As of JDK&nbsp;1.1, the preferred way to do this is via the
    -     * {@link #getBytes()} method, which uses the platform's default charset.
    +     * {@link #getBytes()} method, which uses the {@link Charset#defaultCharset()
    +     * default charset}.
          *
          * @param  srcBegin
          *         Index of the first character in the string to copy
          *
          * @param  srcEnd
    @@ -1778,11 +1777,12 @@
             return encode(charset, coder(), value);
          }
     
         /**
          * Encodes this {@code String} into a sequence of bytes using the
    -     * platform's default charset, storing the result into a new byte array.
    +     * {@link Charset#defaultCharset() default charset}, storing the result
    +     * into a new byte array.
          *
          * <p> The behavior of this method when this string cannot be encoded in
          * the default charset is unspecified.  The {@link
          * java.nio.charset.CharsetEncoder} class should be used when more control
          * over the encoding process is required.



Comments
Thanks, [~darcy]. I scanned the jdk source manually when I was working on JEP400, but somehow it was missed. Possibly bad regex.
02-11-2021

Moving to Approved for JDK 18. I assume some kind of automated search was used to flush out uses of "default charset" in the docs.
02-11-2021

Thanks. Modified as suggested.
01-11-2021

These changes are good. I think the provided solution can be characterized as more than just removing the word "platform's". By adding the link, you're also deferring the definition of "default charset" to the specification of Charset.defaultCharset().
01-11-2021