Bug ID: JDK-8148937 (str) Adapt StringJoiner for Compact Strings

Versions (Unresolved/Resolved/Fixed)

JDK 17: 17 b14 (Fixed)

Current StringJoiner code tries to build the char[] storage by itself:

    @Override
    public String toString() {
        ...
        final String delimiter = this.delimiter;
        final char[] chars = new char[len + addLen];
        int k = getChars(prefix, chars, 0);
        if (size > 0) {
            k += getChars(elts[0], chars, k);
            for (int i = 1; i < size; i++) {
                k += getChars(delimiter, chars, k);
                k += getChars(elts[i], chars, k);
            }
        }
        k += getChars(suffix, chars, k);
        return jla.newStringUnsafe(chars);
    }

This seems to be a performance optimization, but it clashes with Compact Strings, which now has to re-compress the resulting char[] array into a byte[]. We may want to extend this mechanism by determining the coders of the arguments, creating a byte[] with the appropriate coder, and then using a private String constructor that does not re-compress.
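As a rough illustration of the "figure out the coders first" idea, here is a minimal sketch using only public APIs. It is not the JDK change itself: the class CoderAwareJoin and its helper names are hypothetical, and because the private no-copy String constructor is not reachable from user code, the LATIN1 branch falls back to new String(bytes, ISO_8859_1), which still copies the array once.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: joins strings after first
// working out whether every part fits the one-byte (LATIN1) coder.
final class CoderAwareJoin {
    private CoderAwareJoin() {}

    /** Returns true if every char of s fits in a single byte. */
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) return false;
        }
        return true;
    }

    static String join(String prefix, String delimiter, String suffix, String... elts) {
        // Pass 1: compute the total length and the common coder.
        // (Length overflow is not handled in this sketch.)
        int len = prefix.length() + suffix.length();
        boolean latin1 = isLatin1(prefix) && isLatin1(delimiter) && isLatin1(suffix);
        for (int i = 0; i < elts.length; i++) {
            if (i > 0) len += delimiter.length();
            len += elts[i].length();
            latin1 = latin1 && isLatin1(elts[i]);
        }

        // Pass 2: fill storage that already matches the chosen coder.
        if (latin1) {
            byte[] bytes = new byte[len];
            int k = putLatin1(prefix, bytes, 0);
            for (int i = 0; i < elts.length; i++) {
                if (i > 0) k = putLatin1(delimiter, bytes, k);
                k = putLatin1(elts[i], bytes, k);
            }
            putLatin1(suffix, bytes, k);
            // Assumption: inside the JDK a private constructor would adopt
            // bytes without copying; this public constructor copies once.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
        char[] chars = new char[len];
        int k = putChars(prefix, chars, 0);
        for (int i = 0; i < elts.length; i++) {
            if (i > 0) k = putChars(delimiter, chars, k);
            k = putChars(elts[i], chars, k);
        }
        putChars(suffix, chars, k);
        return new String(chars);
    }

    // Copies s into dst as single bytes; returns the next free offset.
    private static int putLatin1(String s, byte[] dst, int off) {
        for (int i = 0; i < s.length(); i++) {
            dst[off + i] = (byte) s.charAt(i);
        }
        return off + s.length();
    }

    // Copies s into dst as UTF-16 chars; returns the next free offset.
    private static int putChars(String s, char[] dst, int off) {
        s.getChars(0, s.length(), dst, off);
        return off + s.length();
    }
}
```

For example, CoderAwareJoin.join("[", ", ", "]", "a", "b", "c") takes the byte[] branch, while any part containing a character above U+00FF sends the whole join down the char[] branch. The extra scan in pass 1 is the cost of knowing the coder up front; whether it pays off would have to be measured.
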

Changeset: 000012a3
Author: Sergey Tsypanov <sergei.tsypanov@yandex.ru>
Committer: Claes Redestad <redestad@openjdk.org>
Date: 2021-03-17 13:34:58 +0000
URL: https://git.openjdk.java.net/jdk/commit/000012a3
17-03-2021
We may need to resolve or work around JDK-8149758 to get absolutely no regression.
12-02-2016
I was thinking about this: http://cr.openjdk.java.net/~shade/8148937/webrev.01/ It seems to improve performance on large Strings: http://cr.openjdk.java.net/~shade/8148937/notes.txt
12-02-2016
I'm one of those people who considered having multiple representations for String back in the Dark Ages, but always eventually chickened out. The primary advantage of compressed storage is for long-lived under-used data in the JDK, not for short-lived builders like StringJoiner (and probably StringBuilder). I'm skeptical that adding support for LATIN1 strings in StringJoiner will be an improvement. There's a stronger case if we know we are building a LATIN1 string and if we are willing to add a no-copy constructor for a String from a byte[].
04-02-2016
Not sure if we want to revert to StringBuilders for this. Instead, I think we might want to amend the current JavaLangAccess-style code to accept coders too. I'll take a look if that is viable.
04-02-2016
Hi, I'm responsible for this optimization. newStringUnsafe does not seem to be very popular in the JDK, and Compact Strings may be one of the reasons. Looking at StringBuilder in jdk9, I see that it can maintain a UTF-16 or LATIN1 representation, so that is an advantage of StringBuilder over building your own char[] that didn't exist in the past. StringBuilder will still need to do a final copy that StringJoiner will not, so the current implementation will be hard to beat - even harder if there are non-LATIN1 characters.
03-02-2016

Relates :	JDK-8149758 - Small arraycopy of non-constant length is slower than individual load/stores
Relates :	JDK-8148936 - Adapt UUID.toString() to Compact Strings