JDK-8278540 : Poor string concatenation performance on initial invoke
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 17,18
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: arm
  • Submitted: 2021-12-08
  • Updated: 2022-09-19
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Description
ADDITIONAL SYSTEM INFORMATION :
32-bit armv7 (Zynq 7020)

A DESCRIPTION OF THE PROBLEM :
The first time string concatenation is performed with a sufficiently complex set of operands (e.g. multiple floating point concatenations), it takes a significant amount of time.  Later calls are much faster, but the inconsistent performance of the first call is an issue in real-time scenarios on embedded platforms.  In testing, the initial call can take 100s of ms, while later calls take a fraction of the time, while a manual StringBuilder approach only takes 20 ms.

It was possible to work around this before JDK-8245455 was merged, because BC_SB could be selected as an alternative string concatenation strategy, and BC_SB has much better performance on the first invocation, and no worse performance on later invocations.  While the performance of the default string concatenation strategy has improved substantially from JDK 11 to JDK 17, the performance of it in JDK 17 is still insufficient for real-time applications (on first time invoke).

REGRESSION : Last worked in version 14

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test case on a slower platform, particularly an embedded platform.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Running this on Java 11 with -Djava.lang.invoke.stringConcat=BC_SB on the above platform results in:
Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
 fast time: 0.019999980926513672
 slow time: 0.029999971389770508
 slow time (again): 9.999275207519531E-4
 slow time (after changing): 9.999275207519531E-4

Notice the very similar times between fast time and slow time.  The first "slow time" call takes about 50% longer than the manual string builder, and has reasonable performance after that.
ACTUAL -
Running this on Java 17 with the default string concat strategy on the same platform results in:
Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
 fast time: 0.017999887466430664
 slow time: 0.23000001907348633
 slow time (again): 0.0
 slow time (after changing): 9.999275207519531E-4

Notice the very poor "slow time" performance (over 230 ms vs 30 ms in the Java 11 BC_SB case).  While this is a much better result than Java 11 (which had "slow time" of over 750 ms!), but it is still an order of magnitude worse than the BC_SB strategy, and the performance on later calls is basically the same.

---------- BEGIN SOURCE ----------
public class Slow {

  private static double m_a = 1.0;
  private static double m_b = 2.0;
  private static double m_c = 3.0;
  private static double m_d = 4.0;
  private static double m_e = 5.0;
  private static double m_f = 6.0;
  private static double m_g = 7.0;
  private static double m_h = 8.0;
  private static double m_i = 9.0;

  private static String m_output;

  public static double slowString() {
    double start = System.currentTimeMillis() / 1000.0;

    String fred;
    fred = "Slow me down" + ", m_a: " + m_a + ", m_b: " + m_b + ", m_b: " + m_c + ", m_b: " + m_d + ", m_b: " + m_e
        + ", m_b: " + m_f + ", m_b: " + m_g + ", m_b: " + m_h + ", m_b: " + m_i;

    double end = System.currentTimeMillis() / 1000.0;

    m_output += fred + "\n";
    return end - start;
  }

  public static double fastString() {
    double start = System.currentTimeMillis() / 1000.0;

    StringBuilder joe = new StringBuilder(256);
    joe.append("Don't slow me down");
    joe.append(", m_a: ").append(m_a);
    joe.append(", m_b: ").append(m_b);
    joe.append(", m_c: ").append(m_c);
    joe.append(", m_d: ").append(m_d);
    joe.append(", m_e: ").append(m_e);
    joe.append(", m_f: ").append(m_f);
    joe.append(", m_g: ").append(m_g);
    joe.append(", m_h: ").append(m_h);
    joe.append(", m_i: ").append(m_i);

    double end = System.currentTimeMillis() / 1000.0;

    m_output += joe.toString() + "\n";

    return end - start;
  }

  public static void main(String[] args) {
    m_output = "";
    double f = fastString();
    double t = slowString();
    double t2 = slowString();
    m_e = 5.5;
    m_f = 4.2;
    m_c = 2.0;
    double t3 = slowString();

    m_output += " fast time: " + f + "\n";
    m_output += " slow time: " + t + "\n";
    m_output += " slow time (again): " + t2 + "\n";
    m_output += " slow time (after changing): " + t3 + "\n";

    System.out.println(m_output);
  }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Prior to JDK-8245455, -Djava.lang.invoke.stringConcat=BC_SB could be used as a workaround.  Since that has been merged, there is no workaround.

FREQUENCY : always



Comments
I can reproduce similar 10x overhead compared to 11 / BC_SB for the provided test case. Main driver for the extra overhead is, unsurprisingly, the spinning of LambdaForm classes. But one observation is that quite a few of the LFs in this case will are never invoked directly. I did a crude experiment to try and disable this for such LFs we know will never be invoked directly, filtering out part of the unused LFs, and that reduces "slow time" cost by 25%: https://github.com/openjdk/jdk/pull/8734 I think we can do better than this and should look at making LF preparation completely lazy.
16-05-2022

(I must have missed any notification about having been assigned this, sorry for the late reply) The "best" workaround to eliminate such startup issues entirely might be to compile affected classes with -XDstringConcat=inline and revert to pre-JDK 9 StringBuilder chains. It might be an undocumented option, but so is/was the BC_SB. BC_SB has other issues at scale since it forces the generation of a class per string concatenation call-site. I'll have a look at this specific issue if there's something we can do to accelerate the first-time invocation.
13-05-2022

The observations on Windows 10: Outputs of JDK 11 Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0 fast time: 9.999275207519531E-4 slow time: 0.0010001659393310547 slow time (again): 0.0 slow time (after changing): 0.0 Outputs of JDK 17: Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0 fast time: 0.0010001659393310547 slow time: 0.010999917984008789 slow time (again): 0.0 slow time (after changing): 0.0 Outputs of JDK 18ea+24: Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0 Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0 fast time: 9.999275207519531E-4 slow time: 0.010999917984008789 slow time (again): 0.0 slow time (after changing): 0.0
10-12-2021

JDK-8245455 was introduced in JDK 15 b25.
10-12-2021