JDK-8262187 : CharsetEncoder.maxBytesPerChar() and CharsetDecoder.maxCharsPerByte() return float instead of int
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 8,11,15
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2021-02-23
  • Updated: 2021-03-01
Description
A DESCRIPTION OF THE PROBLEM :
The methods {{CharsetEncoder.maxBytesPerChar()}} and {{CharsetDecoder.maxCharsPerByte()}} both return a result representing a maximum. One would expect such a value to be an integer, yet (for some reason) the methods return a {{float}} instead of an {{int}}.

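Even worse, for the intended use case of computing the "worst-case size of the output buffer" (as described by the documentation), {{float}} causes precision loss for large input buffers: a {{float}} has a 24-bit significand, so lengths above 2^24 cannot all be represented exactly.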
The JDK itself suffers from this, having to introduce the private method {{String.scale(int, float)}}.
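
The precision loss can be reproduced with a minimal sketch. The names below ({{FloatScaleDemo}}, {{naiveSize}}, {{widenedSize}}) are hypothetical and not part of the JDK; widening to {{double}} is shown only as one assumed way a helper could avoid the problem, not as a copy of the JDK's private method.

```java
// Hypothetical demo class; none of these names come from the JDK.
class FloatScaleDemo {

    // Naive sizing: the int length is converted to float before the multiply,
    // so lengths above 2^24 may be rounded to a neighbouring value.
    static int naiveSize(int len, float maxBytesPerChar) {
        return (int) (len * maxBytesPerChar);
    }

    // Widening to double represents every int length exactly before the multiply.
    static int widenedSize(int len, float maxBytesPerChar) {
        return (int) (len * (double) maxBytesPerChar);
    }

    public static void main(String[] args) {
        int len = (1 << 24) + 1; // 16_777_217, not representable as a float
        System.out.println(naiveSize(len, 1.0f));   // prints 16777216, one too few
        System.out.println(widenedSize(len, 1.0f)); // prints 16777217
    }
}
```

With a factor of 1.0 the naive result is one smaller than the input length, so a buffer sized this way would be too small by one element.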

Ideally these methods would be replaced with ones returning an {{int}} value, though that would break backward compatibility.
Maybe for now it would be best to:
1. Deprecate the methods; or at least add a big warning that the caller should cast the result to {{int}} before performing any further calculations with it
2. Verify in the CharsetEncoder / CharsetDecoder constructor that the {{float}} value actually represents an {{int}} value ({{value == (int) value}})
3. Consider adding more useful and efficient alternative methods, see also JDK-8230531 and JDK-8231434
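
The check proposed in item 2 could be sketched as follows. The class {{IntegralFactorCheck}} and the method {{isIntegral}} are hypothetical names used for illustration only; the JDK constructors perform no such check as far as this report states. Note that such a check would reject any coder that reports a fractional value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the proposed check: value == (int) value.
class IntegralFactorCheck {

    // True only when the float has no fractional part and fits in an int.
    // NaN and infinities yield false because the cast saturates or maps to 0.
    static boolean isIntegral(float value) {
        return value == (int) value;
    }

    public static void main(String[] args) {
        System.out.println(isIntegral(3.0f)); // prints true
        System.out.println(isIntegral(1.5f)); // prints false

        // Inspect what real coders report on the running JDK.
        float utf8 = StandardCharsets.UTF_8.newEncoder().maxBytesPerChar();
        float utf16 = StandardCharsets.UTF_16.newDecoder().maxCharsPerByte();
        System.out.println(utf8 + " integral? " + isIntegral(utf8));
        System.out.println(utf16 + " integral? " + isIntegral(utf16));
    }
}
```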




Comments
JDK-4949631 String.getBytes() does not work on some strings larger than 16MB
01-03-2021

The use of "float" instead of "double" for these methods was indeed an ancient design blunder, causing HelloWorld itself to fail when the greeting string was longer than 2^24 and the caller was not aware of possible loss of precision when using float. There is probably a better design, but no one has tried, and as usual it's harder than it looks. I don't see the advantage of "int" over "double". There are existing coders that have clearly correct fractional values, most obviously the so-called double-byte character sets.
01-03-2021

Additional Information from submitter:

To clarify: The main issue is that these methods return `float`. Their return value indicates a maximum number of primitive `byte` or `char` values. Representing this number as `float` therefore makes no sense, since there cannot be, for example, a String with 4.5 chars.

The only use case where a `float` return type might be useful is a charset that produces an additional suffix or prefix only after X bytes or chars. In that case only multiplying a length >= X by maxBytesPerChar() would yield an additional char. However, this is not a documented use case, and it is questionable whether charsets returning a maximum with a fractional part (if such charsets exist) actually use it like this.

This also raises the question of how the caller of maxBytesPerChar() or maxCharsPerByte() should handle a result with a fractional part. Should they round down (implicit when casting to `int`) or round up? If they have to round up, using the methods becomes even more cumbersome.

A scaling method as suggested by Alan Bateman above would very likely help; in fact it might even solve JDK-8231434 as well. It might nonetheless make sense to deprecate maxBytesPerChar() and maxCharsPerByte() in that case.
01-03-2021
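
The rounding question raised in the comment above can be illustrated with a small sketch. The class `RoundingDemo` and both methods are hypothetical, and the factor 1.5f is an arbitrary example value, not one taken from a real charset.

```java
// Hypothetical sketch contrasting the two rounding choices for a fractional factor.
class RoundingDemo {

    // Rounds down: the cast to int truncates the fractional part.
    static int roundDown(int len, float factor) {
        return (int) (len * (double) factor);
    }

    // Rounds up, so a fractional worst case never undersizes the buffer.
    // Throws instead of silently saturating when the result exceeds int range.
    static int roundUp(int len, float factor) {
        double size = Math.ceil(len * (double) factor);
        if (size > Integer.MAX_VALUE) {
            throw new ArithmeticException("scaled length exceeds int range");
        }
        return (int) size;
    }

    public static void main(String[] args) {
        System.out.println(roundDown(3, 1.5f)); // prints 4
        System.out.println(roundUp(3, 1.5f));   // prints 5
    }
}
```

For sizing a worst-case buffer, rounding up is the conservative choice, since rounding down can produce a capacity one element short.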

Changing this to an enhancement as it's not a bug. The submitter may be looking for a scale method to help size a buffer.
23-02-2021

Documentation for CharsetEncoder.maxBytesPerChar() and CharsetDecoder.maxCharsPerByte():
https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CharsetEncoder.html#maxBytesPerChar--
https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CharsetDecoder.html#maxCharsPerByte--
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/CharsetEncoder.html#maxBytesPerChar()
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/CharsetDecoder.html#maxCharsPerByte()
https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/nio/charset/CharsetEncoder.html#maxBytesPerChar()
https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/nio/charset/CharsetDecoder.html#maxCharsPerByte()
23-02-2021