JDK-8327247 : C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 11,17,21,22,23
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2024-03-04
  • Updated: 2024-08-15
  • Resolved: 2024-04-26
JDK 23: 23 b21 (Fixed)
Sub Tasks
JDK-8331134 :  
Description
ADDITIONAL SYSTEM INFORMATION :
Reproduced with JDK 11–22 EA (Corretto latest nightly build):
1. MacOS 14.x, Intel x64
2. MacOS 11.x, Apple M1
3. Linux (Ubuntu 16.04.7), Intel x64
4. Windows 10, Intel x64

A DESCRIPTION OF THE PROBLEM :
The C2 compilation memory and time needed to optimise a `v1 + " " + v2 + " " + ... + " " + vn` expression grow exponentially with each added field and reach 2GB/20s when the number of fields is maxed out (i.e. increasing it further causes C2 compilation to fail). This may be related to JEP 280. The JEP does mention possible performance regressions in its "Risks" section, but at the same time it states "Startup time and time to performance do not regress beyond reasonable levels" as one of its success goals.

REGRESSION : Last worked in version 8u401

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the program from "Source code for an executable test case" on JDK 22 with the "-XX:CompileCommand=MemStat,Test::toString,print" command line option. C2 compilation reports ~1GB of RAM used (2GB on MacOS with an Intel CPU). Alternatively, any JDK >= 11 can be used, but observing compilation memory usage then requires enabling NMT and checking it frequently.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The compiler uses an amount of memory similar to JDK 8 (i.e. much less than 100MB).
ACTUAL -
Compilation reports below.
MacOS (Intel) with 123 fields (increasing the number further causes C2 compilation to fail):
> total       NA         RA          result  #nodes  time    type  #rc  thread              method
> 2099999784  111517488  1974482496  ok      230825  20.841  c2    2    0x00007fb3a7027600  Test::toString(()Ljava/lang/String;)

MacOS (Apple) with 123 fields:
> total       NA         RA          result  #nodes  time    type  #rc  thread              method
> 867302816   64827120   787501632   ok      157349  4,776   c2    2    0x0000000139048400  Test::toString(()Ljava/lang/String;)

Ubuntu 16.04 (Intel) with 123 fields:
> total       NA         RA          result  #nodes  time    type  #rc  thread              method
> 937798144   67281720   855640704   ok      131653  27.425  c2    2    0x00007faa5414d700  Test::toString(()Ljava/lang/String;)
If the number of fields is increased to 181, the total caps at 1.7GB (increasing the number of fields further causes C2 compilation to fail).

Windows (Intel) with 123 fields:
> total       NA         RA          result  #nodes  time    type  #rc  thread              method
> 941489704   68394472   858186784   ok      125189  8.510   c2    2    0x00000150f1034170  Test::toString(()Ljava/lang/String;)
If the number of fields is increased to 181, the total caps at 1.7GB (increasing the number of fields further causes C2 compilation to fail).

---------- BEGIN SOURCE ----------
public class Test {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int sink = 0;
        while (System.currentTimeMillis() - start < 60000) {
            sink += new Test().toString().length();
        }
        System.out.println(sink);
    }

    private String
            f0="1",  f1="1",  f2="1",  f3="1",  f4="1",  f5="1",  f6="1",  f7="1",  f8="1",  f9="1",
            f10="1", f11="1", f12="1", f13="1", f14="1", f15="1", f16="1", f17="1", f18="1", f19="1",
            f20="1", f21="1", f22="1", f23="1", f24="1", f25="1", f26="1", f27="1", f28="1", f29="1",
            f30="1", f31="1", f32="1", f33="1", f34="1", f35="1", f36="1", f37="1", f38="1", f39="1",
            f40="1", f41="1", f42="1", f43="1", f44="1", f45="1", f46="1", f47="1", f48="1", f49="1",
            f50="1", f51="1", f52="1", f53="1", f54="1", f55="1", f56="1", f57="1", f58="1", f59="1",
            f60="1", f61="1", f62="1", f63="1", f64="1", f65="1", f66="1", f67="1", f68="1", f69="1",
            f70="1", f71="1", f72="1", f73="1", f74="1", f75="1", f76="1", f77="1", f78="1", f79="1",
            f80="1", f81="1", f82="1", f83="1", f84="1", f85="1", f86="1", f87="1", f88="1", f89="1",
            f90="1", f91="1", f92="1", f93="1", f94="1", f95="1", f96="1", f97="1", f98="1", f99="1",
           f100="1",f101="1",f102="1",f103="1",f104="1",f105="1",f106="1",f107="1",f108="1",f109="1",
           f110="1",f111="1",f112="1",f113="1",f114="1",f115="1",f116="1",f117="1",f118="1",f119="1",
           f120="1",f121="1",f122="1";

    @Override
    public String toString() {
        return     f0+","+  f1+","+  f2+","+  f3+","+  f4+","+  f5+","+  f6+","+  f7+","+  f8+","+  f9+","
                + f10+","+ f11+","+ f12+","+ f13+","+ f14+","+ f15+","+ f16+","+ f17+","+ f18+","+ f19+","
                + f20+","+ f21+","+ f22+","+ f23+","+ f24+","+ f25+","+ f26+","+ f27+","+ f28+","+ f29+","
                + f30+","+ f31+","+ f32+","+ f33+","+ f34+","+ f35+","+ f36+","+ f37+","+ f38+","+ f39+","
                + f40+","+ f41+","+ f42+","+ f43+","+ f44+","+ f45+","+ f46+","+ f47+","+ f48+","+ f49+","
                + f50+","+ f51+","+ f52+","+ f53+","+ f54+","+ f55+","+ f56+","+ f57+","+ f58+","+ f59+","
                + f60+","+ f61+","+ f62+","+ f63+","+ f64+","+ f65+","+ f66+","+ f67+","+ f68+","+ f69+","
                + f70+","+ f71+","+ f72+","+ f73+","+ f74+","+ f75+","+ f76+","+ f77+","+ f78+","+ f79+","
                + f80+","+ f81+","+ f82+","+ f83+","+ f84+","+ f85+","+ f86+","+ f87+","+ f88+","+ f89+","
                + f90+","+ f91+","+ f92+","+ f93+","+ f94+","+ f95+","+ f96+","+ f97+","+ f98+","+ f99+","
                +f100+","+f101+","+f102+","+f103+","+f104+","+f105+","+f106+","+f107+","+f108+","+f109+","
                +f110+","+f111+","+f112+","+f113+","+f114+","+f115+","+f116+","+f117+","+f118+","+f119+","
                +f120+","+f121+","+f122;
    }
}
---------- END SOURCE ----------
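
The measurements above vary the number of fields (123 and 181). Editing the test case by hand for each count is error-prone, so the following is a minimal sketch of a generator that prints an equivalent Test class for a given field count. The class name ConcatTestGenerator and its generate method are hypothetical helpers and not part of the original report. The generated fields and toString() expression each end up on one long line instead of the wrapped layout used above.

```java
// Hypothetical helper (not from the original report): prints the source of a
// Test class with n String fields that are all concatenated in toString().
final class ConcatTestGenerator {

    static String generate(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        StringBuilder fields = new StringBuilder();
        StringBuilder expr = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                fields.append(", ");
                expr.append(" + \",\" + ");
            }
            fields.append('f').append(i).append(" = \"1\"");
            expr.append('f').append(i);
        }
        // Same driver loop as the reproducer: call toString() for 60 seconds
        // so that C2 eventually compiles it.
        return String.join("\n",
            "public class Test {",
            "    public static void main(String[] args) {",
            "        long start = System.currentTimeMillis();",
            "        int sink = 0;",
            "        while (System.currentTimeMillis() - start < 60000) {",
            "            sink += new Test().toString().length();",
            "        }",
            "        System.out.println(sink);",
            "    }",
            "",
            "    private String " + fields + ";",
            "",
            "    @Override",
            "    public String toString() {",
            "        return " + expr + ";",
            "    }",
            "}",
            "");
    }

    public static void main(String[] args) {
        // Defaults to the 123 fields used in the report.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 123;
        System.out.print(generate(n));
    }
}
```

Redirecting the output into Test.java gives a file that can be compiled and run with the MemStat option from the steps above.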

CUSTOMER SUBMITTED WORKAROUND :
1. Disable compilation of the affected method at runtime (-XX:CompileCommand=exclude,...); this requires a time-consuming search for such methods.
2. Compile with the "-XDstringConcat=inline" javac option.
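
The second workaround makes javac emit StringBuilder chains instead of invokedynamic call sites for every concatenation it compiles. Where only a handful of methods are affected, a source-level alternative may be to write the StringBuilder chain by hand in those methods. The following is a minimal sketch with three fields. The class name ManualConcat is hypothetical, and the effect on C2 memory use has not been measured here.

```java
// Hypothetical example class (not from the original report).
final class ManualConcat {
    private String f0 = "1", f1 = "1", f2 = "1";

    @Override
    public String toString() {
        // Explicit append calls compile to ordinary invokevirtual instructions,
        // so javac emits no invokedynamic call site for this expression.
        return new StringBuilder()
                .append(f0).append(',')
                .append(f1).append(',')
                .append(f2)
                .toString();
    }
}
```

The result is the same string that the `+` expression would produce, so callers are unaffected by the change.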

FREQUENCY : always



Comments
Changeset: 5e2ced4b
Author: Claes Redestad <redestad@openjdk.org>
Date: 2024-04-26 12:36:55 +0000
URL: https://git.openjdk.org/jdk/commit/5e2ced4b9e1c9953e459dc152076520e5ef9d76c
26-04-2024

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/18953
Date: 2024-04-25 14:15:56 +0000
25-04-2024

Ok, I could take a stab at this in the near term. [~shade] while the bytecode-based implementations could be revived, IIRC those strategies also came with a few rather problematic downsides such as generating a unique class per call-site. Which is quite costly on typical applications that have a lot of small-arity concats and more of a long tail of high arity expressions. However, as this issue is really one that manifests only for very complex expressions with a huge number of arguments - expressions that are likely rare and with low degree of sharing - then perhaps for those cases generating a class per concatenation might actually be a good trade-off. Assuming a threshold parameter to use a bytecode strategy for expressions with an arity above the threshold, and the current known-best strategy for values below it. Thinking that might be a good starting point for exploring solutions here, at least. A good opportunity to try out the new classfile API, too.
03-04-2024
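
The threshold idea in the comment above can be illustrated with a small sketch: pick a strategy per call site based on its arity. Everything named here is hypothetical, including the class ThresholdConcat, the Strategy enum and the cut-off value of 16, which is arbitrary. The comment proposes no concrete value, and this is not the code from the eventual changeset.

```java
// Illustrative sketch only: all names and the threshold value are assumptions.
final class ThresholdConcat {

    // Arbitrary cut-off chosen for this example.
    static final int HIGH_ARITY_THRESHOLD = 16;

    enum Strategy {
        METHOD_HANDLE_CHAIN, // shared, method-handle-based strategy for small arities
        GENERATED_CLASS      // one generated class per call site for large arities
    }

    // Expressions above the threshold are assumed to be rare and rarely
    // shared, so paying for a dedicated class may be acceptable for them.
    static Strategy strategyFor(int arity) {
        return arity > HIGH_ARITY_THRESHOLD
                ? Strategy.GENERATED_CLASS
                : Strategy.METHOD_HANDLE_CHAIN;
    }
}
```

With this cut-off, a typical three-argument concatenation keeps the shared strategy, while the toString() method from the test case would be routed to the per-call-site one.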

Sorry, I don't have time to look at it. Maybe Claes is interested in optimizing this, as he did SCF optimization work before, see JDK-8278540. This might prompt some optimizations generically on j.l.invoke side. Separately, I wish we did leave at least one bytecode-based, not MH-based implementation strategy in SCF, so that it would give us another workaround here past JDK 16, see JDK-8245455.
12-03-2024

[~shade] Would you have time to take a look at this and see if there's something that could be optimized from the Indify String Concat side?
12-03-2024

This is due to "JEP 280: Indify String Concatenation" JDK-8085796 creating a large number of method handles that are then inlined and compiled by C2:
  @ 45 java.lang.invoke.LambdaForm$MH/0x00007f798f148000::invoke (45 bytes) force inline by annotation
  @ 20 java.lang.invoke.LambdaForm$MH/0x00007f798f147c00::invoke (32 bytes) force inline by annotation
  @ 28 java.lang.invoke.DirectMethodHandle$Holder::invokeStatic (21 bytes) force inline by annotation
  @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation
  @ 17 java.lang.StringConcatHelper::prepend (17 bytes) inline (hot)
  @ 3 java.lang.StringConcatHelper::prepend (37 bytes) inline (hot)
  @ 2 java.lang.String::length (11 bytes) inline (hot)
  @ 6 java.lang.String::coder (15 bytes) inline (hot)
  @ 21 java.lang.String::getBytes (44 bytes) inline (hot)
  @ 1 java.lang.String::coder (15 bytes) inline (hot)
  @ 22 java.lang.System::arraycopy (0 bytes) (intrinsic)
  @ 11 java.lang.StringConcatHelper::prepend (37 bytes) inline (hot)
  @ 2 java.lang.String::length (11 bytes) inline (hot)
  @ 6 java.lang.String::coder (15 bytes) inline (hot)
  @ 21 java.lang.String::getBytes (44 bytes) inline (hot)
  @ 1 java.lang.String::coder (15 bytes) inline (hot)
  @ 22 java.lang.System::arraycopy (0 bytes) (intrinsic)
12-03-2024

With the test case, I've noticed that we initialize a huge number of cloned identical constants at [1] in final graph reshaping which might not be necessary and could be shared. [1] https://github.com/openjdk/jdk/blob/a6dc4bc2b83c7240e573ac43f9b7a10191c58ed3/src/hotspot/share/opto/compile.cpp#L3257
08-03-2024

Initial ILW = Suspiciously large footprint of C2 arena which might be expected with compact strings (needs further investigation), edge case with extreme string concatenation, disable indy string concatenation with javac flag -XDstringConcat=inline = MLH = P4
08-03-2024

While MemStat shows a usage of roughly 940MB, disabling indy string concatenation brings the memory usage down to ~33MB. This suggests that it could be related to compact strings.
08-03-2024