JDK-6826329 : (str) Fastpath for new String(bytes..) and String#getBytes(..) for ASCII + ISO-8859-1
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 7
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2009-04-03
  • Updated: 2021-06-26
Description
A DESCRIPTION OF THE REQUEST :
String#getBytes(..) and new String(bytes..) internally use slow charset encoders and decoders (CharsetEncoder / CharsetDecoder) that are newly instantiated on each call.

Additionally:
At first glance, a user might think that String#getBytes(byte[] buf, Charset cs) is faster than String#getBytes(byte[] buf, String csn), on the assumption that the Charset is created internally from csn on every call.
As this is only true for the first call, there should be a *note* in the JavaDoc comparing the cost of those methods. The JavaDoc of the (byte[] ...) constructors should get the same note.
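
For reference, the two lookup styles discussed above look like this with the public API (String#getBytes(Charset) and String#getBytes(String), as they exist in java.lang.String). This is only a minimal sketch; the class name GetBytesOverloads and its helper methods are made up for illustration. It shows that both calls yield the same bytes and says nothing about their relative cost.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
final class GetBytesOverloads {

    // Encode using a Charset object (no checked exception).
    static byte[] byCharset(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encode using a charset name; the name is resolved by the library.
    static byte[] byName(String s) {
        try {
            return s.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is a standard charset that every JVM must support.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        // Both overloads are expected to produce identical output.
        System.out.println(Arrays.equals(byCharset(s), byName(s)));
    }
}
```

Which of the two is cheaper depends on the JDK version and on internal caching, which is why a JavaDoc note would help.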


JUSTIFICATION :
Assuming that ASCII and ISO-8859-1 account for a high percentage of the usage of those methods, especially in CORBA applications, we should have a fast shortcut in class String.

  See also:
http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636319
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636323



EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Fastpath for ASCII + ISO-8859-1 for methods and constructors like:
String#getBytes(..) and new String(bytes..)
Alternatives:
String#getASCIIBytes(..)
String#getISO8859_1Bytes(..)

ACTUAL -
byte[] getBytes(Charset charset)
internally instantiates a CharsetEncoder, which is much slower, especially for short strings.


---------- BEGIN SOURCE ----------
A simple example:

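public class String {
    ...
    // mask selects the target charset, e.g. 0x7F for ASCII or 0xFF for ISO-8859-1
    int getBytes(byte[] buf, byte mask) {
        int m = mask & 0xFF; // widen without sign extension
        int j = 0;
        for (int i = 0; i < values.length; i++, j++) {
            char c = values[i];
            if ((c | m) == m) { // c fits into the target charset
                buf[j] = (byte) c;
                continue;
            }
            // a surrogate pair is replaced by a single replacement byte
            if (Character.isHighSurrogate(c) && i + 1 < values.length
                    && Character.isLowSurrogate(values[i + 1]))
                i++;
            buf[j] = '?'; // or default replacement
        }
        return j;
    }
    ...
}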

---------- END SOURCE ----------
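
The idea above can also be tried outside of java.lang.String. The following is a minimal, standalone sketch under stated assumptions: the class SingleByteFastPath, its encode method and the two mask constants are hypothetical names, not JDK API, and the '?' replacement byte is assumed to match the default replacement of the built-in encoders. It compares the result against String#getBytes(Charset) rather than measuring speed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK: single-byte encoding fastpath.
final class SingleByteFastPath {
    static final int ASCII_MASK = 0x7F;   // assumption: 7-bit US-ASCII
    static final int LATIN1_MASK = 0xFF;  // assumption: 8-bit ISO-8859-1

    static byte[] encode(String s, int mask) {
        int len = s.length();
        byte[] buf = new byte[len];
        int j = 0;
        for (int i = 0; i < len; i++, j++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {      // c is representable in one byte
                buf[j] = (byte) c;
                continue;
            }
            // Consume the low half of a surrogate pair so that the pair
            // yields one replacement byte instead of two.
            if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            buf[j] = '?';                  // assumed replacement byte
        }
        // Surrogate pairs shrink the output, so trim if needed.
        return j == len ? buf : Arrays.copyOf(buf, j);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \uD83D\uDE00";
        byte[] fast = encode(s, LATIN1_MASK);
        byte[] std = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(fast, std));
    }
}
```

Unlike the in-class version, this sketch allocates the result array itself and goes through charAt, so it only illustrates the masking logic; a real fastpath would work directly on the String's internal storage.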