JDK-8221760 : String concatenation performance issue starting from 11u
  • Type: Bug
  • Component: performance
  • Sub-Component: hotspot
  • Affected Version: 11,12,13
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-04-01
  • Updated: 2022-10-13
The Version table provides details about the release in which this issue/RFE will be addressed.
  • Other: tbd (Unresolved)
Related Reports
Relates :  
Description
(Summary is provisional until the reason is found)

Take this benchmark:

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3, jvmArgsAppend = {"-Xms2g", "-Xmx2g", "-XX:+UseParallelGC"})
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConcatSO {
    @Benchmark
    public String test() {
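        // Non-final locals are not compile-time constants, so javac cannot fold
        // this expression: target 8 compiles it to a StringBuilder chain,
        // target 9+ to an invokedynamic call site (JEP 280).
        String s1 = "STRING ONE";
        String s2 = "STRING TWO";
        return "abc " + s1 + " def " + s2;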
    }
}

...and run it with different JDKs and compilation targets:

Benchmark      Mode  Cnt   Score   Error  Units

# target=8, 8u191
ConcatSO.test  avgt   15  14.071 ± 0.112  ns/op

# target=8, 11.0.2
ConcatSO.test  avgt   15  12.438 ± 0.114  ns/op

# target=9, 9 GA
ConcatSO.test  avgt   15  12.681 ± 0.135  ns/op

# target=9, 11.0.2
ConcatSO.test  avgt   15  14.211 ± 0.086  ns/op  ; <---- !!!

# target=11, 11.0.2
ConcatSO.test  avgt   15  14.169 ± 0.069  ns/op

# target=11, 11.0.2, -Djava.lang.invoke.stringConcat=BC_SB
ConcatSO.test  avgt   15  12.477 ± 0.077  ns/op
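For scale, here are the size of the "!!!" regression (target=9 on 11.0.2 vs. 9 GA) and the gap that BC_SB closes on 11.0.2 (target=11 default vs. BC_SB). Both are computed from the mean scores above, with the error bars ignored:

$$\frac{14.211 - 12.681}{12.681} \approx 0.121 \quad (\approx 12\%\ \text{slower})$$

$$\frac{14.169 - 12.477}{12.477} \approx 0.136 \quad (\approx 14\%\ \text{slower})$$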

Look at the line marked "!!!": something in the runtime has regressed, making 11u slower than 9 GA even with the same bytecode. Since the BC_SB strategy recovers the performance, the culprit is probably a change in java.lang.invoke.
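To make the two code shapes concrete, here is a minimal sketch of both (class and method names are hypothetical). The first is the StringBuilder append chain that javac emits for target 8, which is roughly what the BC_SB strategy generates at runtime. The second is the target 9+ path: an invokedynamic call site bootstrapped by StringConcatFactory.makeConcatWithConstants, which the sketch calls by hand instead of going through javac. It requires JDK 9 or later. It illustrates the mechanism only and is not a benchmark.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

public class ConcatShapes {
    // Target 8 shape (roughly what BC_SB spins at runtime): a plain append chain.
    static String viaStringBuilder(String s1, String s2) {
        return new StringBuilder()
                .append("abc ").append(s1)
                .append(" def ").append(s2)
                .toString();
    }

    // Target 9+ shape: the same bootstrap javac links the invokedynamic to,
    // invoked manually here. In the recipe, each U+0001 character marks an
    // argument slot; the surrounding text is baked in as constants.
    private static final MethodHandle INDY_CONCAT = bootstrap();

    private static MethodHandle bootstrap() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, String.class);
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat", type, "abc \u0001 def \u0001");
            return site.dynamicInvoker();
        } catch (StringConcatException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String viaIndy(String s1, String s2) {
        try {
            // invokeExact needs the exact (String, String)String call type.
            return (String) INDY_CONCAT.invokeExact(s1, s2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String s1 = "STRING ONE";
        String s2 = "STRING TWO";
        System.out.println(viaStringBuilder(s1, s2)); // abc STRING ONE def STRING TWO
        System.out.println(viaIndy(s1, s2));          // abc STRING ONE def STRING TWO
    }
}
```

Both methods produce the same string. Which code the JIT actually sees on the indy path depends on the concatenation strategy selected at link time, and that is where the BC_SB comparison above points.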

Comments
Hi [~shade], could you confirm whether this is related to vzeroupper on your own systems? Thanks, Eric
02-12-2019

[~thartmann] Whoops, thanks.
23-07-2019

Hi [~shade], AFAICT this is due to JDK-8178811, for which Intel reported gains on other benchmarks. Those benchmarks probably all use AVX, whereas this microbenchmark does not, yet it still gets the vzeroupper call setup. Try running with -XX:UseAVX=0 and see whether you can confirm this on your systems.
23-07-2019
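When repeating the -XX:UseAVX=0 experiment, it helps to confirm from inside the forked JVM that the flag actually took effect. Below is a minimal sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean from the jdk.management module. The class and helper names are hypothetical. The helper treats an unknown flag as "not available" because UseAVX is only defined on x86 builds.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class VmFlagCheck {
    // Hypothetical helper: returns the current value of a HotSpot VM flag, or
    // empty if the platform bean is missing or this JVM does not define the flag.
    static Optional<String> vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when no flag with the given name exists.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Run e.g. with: java -XX:UseAVX=0 VmFlagCheck
        System.out.println("UseAVX = " + vmOption("UseAVX").orElse("<not available>"));
    }
}
```

The same helper can be pointed at other flags discussed in this thread, for example ThreadLocalHandshakes on JDK builds that define it.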

Looking at the perfasm output differences between 9u and 11u, there seem to be three micro-issues that eat up cycles, including ThreadLocalHandshakes and calls to arraycopy that execute vzeroupper. Profiles:
http://cr.openjdk.java.net/~shade/8221760/jdk9.perfasm
http://cr.openjdk.java.net/~shade/8221760/jdk13.perfasm
01-04-2019