JDK-6393232 : (spec) String methods which take Charset should specify behaviour for invalid bytes
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2006-03-02
  • Updated: 2017-05-16
  • Resolved: 2006-03-23
JDK 6
6 b78 (Fixed)
Related Reports
Relates :  
Description
In bug 5005831, the following new methods were added:

  public byte[] getBytes(Charset charset);
  public String(byte bytes[], Charset charset);
  public String(byte bytes[], int offset, int length, Charset charset);

In each case, we did not specify the behaviour on invalid input. That behaviour is determined by the Charset parameter (invalid sequences are replaced with the charset's default replacement), so we should specify it.
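The replacement behaviour described above can be observed with a short program. This is a sketch, not part of the original report: the class InvalidInputDemo and its helper methods are hypothetical, and StandardCharsets requires Java 7 or later (newer than the JDK 6 release this report targets; on JDK 6, Charset.forName("UTF-8") would serve instead). It decodes a byte array containing 0xFF, which can never occur in well-formed UTF-8, and encodes a character that US-ASCII cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class InvalidInputDemo {
    // Decodes with String(byte[], Charset); malformed input is replaced, not reported.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Encodes with String.getBytes(Charset); unmappable characters are replaced.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xFF is never valid in UTF-8, so it is malformed input.
        byte[] malformed = { 'a', (byte) 0xFF, 'b' };
        String decoded = decode(malformed, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("a\uFFFDb")); // true

        // U+00E9 (e with acute accent) has no mapping in US-ASCII.
        byte[] encoded = encode("caf\u00E9", StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(encoded)); // [99, 97, 102, 63]
    }
}
```

Neither call throws: the malformed byte decodes to U+FFFD and the unmappable character encodes to '?' (63).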

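Replace this sentence in all of the new methods (the constructors use the equivalent wording "The behavior of this constructor when the given bytes are not valid in the given charset is unspecified."):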

   * <p> The behavior of this method when the string cannot be encoded in the
   * given charset is unspecified.

With this one:

   * <p> This method always replaces malformed-input and unmappable-character
   * sequences with this charset's default replacement XXX.

where XXX =
   "string"     in String(byte[], Charset) and String(byte[], int, int, Charset)
   "byte array" in String.getBytes(Charset)

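The "default replacement string" and "default replacement byte array" in the proposed wording correspond, as we understand it, to the replacements that a freshly created decoder and encoder for the charset report through CharsetDecoder.replacement() and CharsetEncoder.replacement(). A minimal sketch for US-ASCII follows; the class and method names are hypothetical, StandardCharsets needs Java 7 or later, and the encoder replacement may differ for other charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultReplacements {
    // Replacement a new decoder substitutes for invalid bytes (bytes -> String).
    static String decoderReplacement(Charset charset) {
        return charset.newDecoder().replacement();
    }

    // Replacement a new encoder substitutes for unencodable chars (String -> bytes).
    static byte[] encoderReplacement(Charset charset) {
        return charset.newEncoder().replacement();
    }

    public static void main(String[] args) {
        Charset ascii = StandardCharsets.US_ASCII;
        // Prints 65533, i.e. U+FFFD REPLACEMENT CHARACTER.
        System.out.println((int) decoderReplacement(ascii).charAt(0));
        // Prints [63], i.e. the single byte '?'.
        System.out.println(Arrays.toString(encoderReplacement(ascii)));
    }
}
```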
Comments
SUGGESTED FIX

415c415
< * Constructs a new <tt>String</tt> by decoding the specified array of
---
> * Constructs a new <tt>String</tt> by decoding the specified subarray of
418c418
< * hence may not be equal to the length of the byte array.
---
> * hence may not be equal to the length of the subarray.
420,421c420,421
< * <p> The behavior of this constructor when the given bytes are not valid
< * in the given charset is unspecified. The {@link
---
> * <p> This method always replaces malformed-input and unmappable-character
> * sequences with this charset's default replacement string. The {@link
486,487c486,487
< * <p> The behavior of this constructor when the given bytes are not valid
< * in the given charset is unspecified. The {@link
---
> * <p> This method always replaces malformed-input and unmappable-character
> * sequences with this charset's default replacement string. The {@link
899,902c899,902
< * <p> The behavior of this method when the string cannot be encoded in the
< * given charset is unspecified. The {@link
< * java.nio.charset.CharsetEncoder} class should be used when more control
< * over the encoding process is required.
---
> * <p> This method always replaces malformed-input and unmappable-character
> * sequences with this charset's default replacement byte array. The
> * {@link java.nio.charset.CharsetEncoder} class should be used when more
> * control over the encoding process is required.
08-03-2006
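The suggested Javadoc points to CharsetEncoder (and, for decoding, its counterpart CharsetDecoder) when more control is required. A sketch, assuming the caller wants invalid input rejected rather than replaced: the class StrictCoding and its methods are hypothetical, StandardCharsets needs Java 7 or later, and CodingErrorAction.REPORT is already the default for a newly created encoder or decoder, so it is set explicitly here only for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictCoding {
    // Like String(byte[], Charset), but throws on invalid input instead of replacing it.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // default, made explicit
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Like String.getBytes(Charset), but throws on unmappable characters.
    static byte[] encodeStrict(String s, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()]; // buffer is already flipped for reading
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        try {
            encodeStrict("caf\u00E9", StandardCharsets.US_ASCII);
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            // An UnmappableCharacterException is expected here.
            System.out.println("rejected: " + e);
        }
    }
}
```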

EVALUATION
Yes. We should also use the term "subarray" in the new constructor that takes an offset and length, consistent with its use in other methods that take a byte array with an index and length.
02-03-2006
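The "subarray" wording can be illustrated with the offset/length constructor. A minimal sketch (hypothetical class and method names; StandardCharsets needs Java 7 or later): because U+00E9 takes two bytes in UTF-8, decoding a 3-byte subarray yields a 2-character string, which is why the length of the new String "may not be equal to the length of the subarray".

```java
import java.nio.charset.StandardCharsets;

public class SubarrayDecode {
    // Decodes bytes[offset, offset + length) as UTF-8 via String(byte[], int, int, Charset).
    static String decodeSubarray(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes: 'h', 0xC3 0xA9 (the two-byte encoding of U+00E9), 'l', 'l', 'o'.
        byte[] bytes = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);
        String head = decodeSubarray(bytes, 0, 3);
        System.out.println(head.equals("h\u00E9")); // true
        System.out.println(head.length());          // 2, not the subarray length 3
    }
}
```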