JDK-8226396 : C2: Redundant address computations for array accesses
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,13,14
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-06-19
  • Updated: 2019-11-26
Other: tbd (Unresolved)
Description
public static void multArray(float[] a, float[] b, float[] m, int len) {
    for(int i = 0; i < len; i++) {
        m[i*3  ] = a[i*3]*b[i*3];
        m[i*3+1] = a[i*3+1]*b[i*3+1];
        m[i*3+2] = a[i*3+2]*b[i*3+2];
    }
}
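
The JMH harness behind TestArrayPerf.testArray is not included in this report. As a minimal sketch, assuming each array holds at least 3*len elements, the kernel can be exercised with a small driver like the one below. The class name MultArrayDemo, the run helper, and the input values are hypothetical and only serve as an illustration; this is a correctness check, not a benchmark.

```java
import java.util.Arrays;

public class MultArrayDemo {
    // Kernel copied from the report above.
    public static void multArray(float[] a, float[] b, float[] m, int len) {
        for (int i = 0; i < len; i++) {
            m[i*3  ] = a[i*3]*b[i*3];
            m[i*3+1] = a[i*3+1]*b[i*3+1];
            m[i*3+2] = a[i*3+2]*b[i*3+2];
        }
    }

    // Hypothetical helper: builds inputs of length 3*len and runs the kernel once.
    public static float[] run(int len) {
        float[] a = new float[3 * len];
        float[] b = new float[3 * len];
        float[] m = new float[3 * len];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;      // illustrative input values
            b[i] = 2.0f;
        }
        multArray(a, b, m, len);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4)));
    }
}
```

Each iteration touches three consecutive elements of every array, so the loop issues nine array accesses whose indices all derive from i*3.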

# VM version: JDK 1.8.0_212, OpenJDK 64-Bit Server VM, 25.212-b04
Benchmark                 Mode  Cnt   Score   Error  Units
TestArrayPerf.testArray  thrpt   20  80.675 ± 0.299  ops/s

# VM version: JDK 14-internal, Java HotSpot(TM) 64-Bit Server VM, 14-internal+0-2019-06-13-0958212.vlivanov.null
Benchmark                 Mode  Cnt   Score   Error  Units
TestArrayPerf.testArray  thrpt   20  63.299 ± 0.126  ops/s

Comments
Okay, thanks for the clarification.
27-09-2019

> but that ConvI2L optimization was already in there before 9, right?
[~thartmann] Yes, the transformation has been there for a long time, but it wasn't applied in the test case until 9 (C2 wasn't able to prove the operations don't overflow?). Starting with 9, some of the address computations for array accesses got transformed, but not all. That led to duplicated computations and more generated code, which caused the slowdown.
27-09-2019

[~vlivanov] but that ConvI2L optimization was already in there before 9, right? http://hg.openjdk.java.net/jdk9/jdk9/hotspot/annotate/a61af66fc99e/src/share/vm/opto/connode.cpp#l820
25-09-2019

# VM version: JDK 1.8.0_212, Java HotSpot(TM) 64-Bit Server VM, 25.212-b34
TestArrayPerf.testArray  thrpt   50  77.675 ± 0.857  ops/s

....[Hottest Region 1]..............................................................................
C2, level 4, org.benchmark.TestArrayPerf::multArray, version 579 (138 bytes)

  2.16%  0x00007f5461516a40: add    %r9d,%ebx
  2.58%  0x00007f5461516a43: cmp    %esi,%ebx
  0.00%  0x00007f5461516a45: jae    0x00007f5461516b83
  2.53%  0x00007f5461516a4b: vmovss 0x10(%rax,%rbx,4),%xmm0
  5.61%  0x00007f5461516a51: cmp    %edi,%ebx
  0.00%  0x00007f5461516a53: jae    0x00007f5461516bb4
  2.50%  0x00007f5461516a59: vmulss 0x10(%rdx,%rbx,4),%xmm0,%xmm0
  8.84%  0x00007f5461516a5f: cmp    %ebp,%ebx
  0.00%  0x00007f5461516a61: jae    0x00007f5461516be8
  2.88%  0x00007f5461516a67: vmovss %xmm0,0x10(%rcx,%rbx,4)
  2.73%  0x00007f5461516a6d: mov    %ebx,%r11d
  2.57%  0x00007f5461516a70: inc    %r11d
  2.47%  0x00007f5461516a73: mov    %ebx,%r10d
  2.51%  0x00007f5461516a76: add    $0x2,%r10d
  2.55%  0x00007f5461516a7a: cmp    %esi,%r10d
  0.00%  0x00007f5461516a7d: jae    0x00007f5461516c22
  2.28%  0x00007f5461516a83: movslq %ebx,%rbx
  2.48%  0x00007f5461516a86: vmovss 0x14(%rax,%rbx,4),%xmm1
  5.64%  0x00007f5461516a8c: cmp    %edi,%r10d
  0.00%  0x00007f5461516a8f: jae    0x00007f5461516c5a
  2.62%  0x00007f5461516a95: vmulss 0x14(%rdx,%rbx,4),%xmm1,%xmm1
  8.92%  0x00007f5461516a9b: cmp    %ebp,%r10d
  0.00%  0x00007f5461516a9e: jae    0x00007f5461516c96
  3.03%  0x00007f5461516aa4: vmovss %xmm1,0x14(%rcx,%rbx,4)
  2.79%  0x00007f5461516aaa: vmovss 0x18(%rdx,%rbx,4),%xmm1
  6.09%  0x00007f5461516ab0: vmulss 0x18(%rax,%rbx,4),%xmm1,%xmm0
  8.62%  0x00007f5461516ab6: vmovss %xmm0,0x18(%rcx,%rbx,4)
  2.81%  0x00007f5461516abc: mov    %r9d,%ebx
  2.46%  0x00007f5461516abf: shl    %ebx
  2.51%  0x00007f5461516ac1: add    $0x2,%ebx
  2.67%  0x00007f5461516ac4: inc    %r9d
  2.56%  0x00007f5461516ac7: cmp    %r8d,%r9d
  0.00%  0x00007f5461516aca: jl     0x00007f5461516a40
....................................................................................................
 95.40%  <total for region 1>
19-06-2019

# VM version: JDK 14-internal, Java HotSpot(TM) 64-Bit Server VM, 14-internal+0-2019-06-13-0958212.vlivanov.null
TestArrayPerf.testArray  thrpt   50  63.547 ± 0.354  ops/s

....[Hottest Region 1]..............................................................................
c2, level 4, org.benchmark.TestArrayPerf::multArray, version 661 (204 bytes)

  1.75%  0x00007ff1e46ce7c0: mov    %r11d,%r9d
  1.99%  0x00007ff1e46ce7c3: add    %ebx,%r9d
  2.13%  0x00007ff1e46ce7c6: cmp    %r10d,%r9d
  0.01%  0x00007ff1e46ce7c9: jae    0x00007ff1e46ce92a
  2.12%  0x00007ff1e46ce7cf: movslq %ebx,%rbx
  1.85%  0x00007ff1e46ce7d2: movslq %r11d,%r14
  1.91%  0x00007ff1e46ce7d5: add    %rbx,%r14
  2.08%  0x00007ff1e46ce7d8: vmovss 0x10(%rsi,%r14,4),%xmm0
  6.72%  0x00007ff1e46ce7df: cmp    %eax,%r9d
  0.00%  0x00007ff1e46ce7e2: jae    0x00007ff1e46ce959
  1.80%  0x00007ff1e46ce7e8: vmulss 0x10(%rdx,%r14,4),%xmm0,%xmm0
  7.88%  0x00007ff1e46ce7ef: cmp    %r13d,%r9d
  0.00%  0x00007ff1e46ce7f2: jae    0x00007ff1e46ce989
  2.58%  0x00007ff1e46ce7f8: vmovss %xmm0,0x10(%rcx,%r14,4)
  2.12%  0x00007ff1e46ce7ff: mov    %r9d,%ebx
  1.96%  0x00007ff1e46ce802: add    $0x2,%ebx
  2.06%  0x00007ff1e46ce805: mov    %r9d,%edi
  2.00%  0x00007ff1e46ce808: inc    %edi
  2.08%  0x00007ff1e46ce80a: cmp    %r10d,%ebx
         0x00007ff1e46ce80d: jae    0x00007ff1e46ce9b6
  2.08%  0x00007ff1e46ce813: vmovss 0x14(%rsi,%r14,4),%xmm0
  7.24%  0x00007ff1e46ce81a: cmp    %eax,%ebx
  0.02%  0x00007ff1e46ce81c: jae    0x00007ff1e46ce9de
  2.16%  0x00007ff1e46ce822: vmulss 0x14(%rdx,%r14,4),%xmm0,%xmm0
  6.81%  0x00007ff1e46ce829: cmp    %r13d,%ebx
         0x00007ff1e46ce82c: jae    0x00007ff1e46cea0a
  1.93%  0x00007ff1e46ce832: vmovss %xmm0,0x14(%rcx,%r14,4)
  2.34%  0x00007ff1e46ce839: movslq %r9d,%r9
  1.89%  0x00007ff1e46ce83c: vmovss 0x18(%rdx,%r9,4),%xmm0
  6.61%  0x00007ff1e46ce843: vmulss 0x18(%rsi,%r9,4),%xmm0,%xmm1
 10.20%  0x00007ff1e46ce84a: vmovss %xmm1,0x18(%rcx,%r9,4)
  2.27%  0x00007ff1e46ce851: mov    %r11d,%ebx
  1.98%  0x00007ff1e46ce854: shl    %ebx
  2.11%  0x00007ff1e46ce856: add    $0x2,%ebx
  1.90%  0x00007ff1e46ce859: inc    %r11d
  2.08%  0x00007ff1e46ce85c: cmp    %ebp,%r11d
  0.01%  0x00007ff1e46ce85f: jl     0x00007ff1e46ce7c0
....................................................................................................
 94.92%  <total for region 1>
19-06-2019

With the problematic transformation disabled:
TestArrayPerf.testArray  thrpt   20  68.253 ± 0.182  ops/s

....[Hottest Region 1]..............................................................................
c2, level 4, org.benchmark.TestArrayPerf::multArray, version 662 (172 bytes)

  2.15%  0x00007f0dbc6cf6c0: add    %r10d,%r9d
  2.26%  0x00007f0dbc6cf6c3: cmp    %esi,%r9d
         0x00007f0dbc6cf6c6: jae    0x00007f0dbc6cf826
  2.18%  0x00007f0dbc6cf6cc: vmovss 0x10(%r14,%r9,4),%xmm0
  6.84%  0x00007f0dbc6cf6d3: cmp    %edi,%r9d
         0x00007f0dbc6cf6d6: jae    0x00007f0dbc6cf855
  2.18%  0x00007f0dbc6cf6dc: vmulss 0x10(%rdx,%r9,4),%xmm0,%xmm0
 10.81%  0x00007f0dbc6cf6e3: cmp    %ebp,%r9d
         0x00007f0dbc6cf6e6: jae    0x00007f0dbc6cf885
  2.44%  0x00007f0dbc6cf6ec: vmovss %xmm0,0x10(%rcx,%r9,4)
  2.73%  0x00007f0dbc6cf6f3: mov    %r9d,%ebx
  2.28%  0x00007f0dbc6cf6f6: inc    %ebx
  1.99%  0x00007f0dbc6cf6f8: add    $0x2,%r9d
  2.46%  0x00007f0dbc6cf6fc: cmp    %esi,%r9d
         0x00007f0dbc6cf6ff: jae    0x00007f0dbc6cf8b5
  2.44%  0x00007f0dbc6cf705: vmovss 0x10(%r14,%rbx,4),%xmm1
  6.53%  0x00007f0dbc6cf70c: cmp    %edi,%r9d
         0x00007f0dbc6cf70f: jae    0x00007f0dbc6cf8de
  2.22%  0x00007f0dbc6cf715: vmulss 0x10(%rdx,%rbx,4),%xmm1,%xmm1
  9.96%  0x00007f0dbc6cf71b: cmp    %ebp,%r9d
  0.00%  0x00007f0dbc6cf71e: jae    0x00007f0dbc6cf90a
  2.56%  0x00007f0dbc6cf724: vmovss %xmm1,0x10(%rcx,%rbx,4)
  2.28%  0x00007f0dbc6cf72a: vmovss 0x10(%rdx,%r9,4),%xmm1
 10.16%  0x00007f0dbc6cf731: vmulss 0x10(%r14,%r9,4),%xmm1,%xmm0
  9.73%  0x00007f0dbc6cf738: vmovss %xmm0,0x10(%rcx,%r9,4)
  2.50%  0x00007f0dbc6cf73f: mov    %r10d,%r9d
  2.11%  0x00007f0dbc6cf742: shl    %r9d
  2.16%  0x00007f0dbc6cf745: add    $0x2,%r9d
  1.97%  0x00007f0dbc6cf749: inc    %r10d
  2.33%  0x00007f0dbc6cf74c: cmp    %r13d,%r10d
         0x00007f0dbc6cf74f: jl     0x00007f0dbc6cf6c0
....................................................................................................
 95.53%  <total for region 1>
19-06-2019

Starting with JDK 9, C2 performs the ConvI2L(AddI(x, y)) => AddL(ConvI2L(x), ConvI2L(y)) transformation for some accesses, which leads to more code in the tight loop:

  0x00007ff1e46ce7cf: movslq %ebx,%rbx
  0x00007ff1e46ce7d2: movslq %r11d,%r14
  0x00007ff1e46ce7d5: add    %rbx,%r14
19-06-2019
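
For readers unfamiliar with the node names, the two shapes in ConvI2L(AddI(x, y)) => AddL(ConvI2L(x), ConvI2L(y)) can be sketched at the Java source level as follows. This is only an illustration of the arithmetic, not C2's IR or its implementation; the class and method names are hypothetical. The two forms agree only when the 32-bit addition does not overflow, which is presumably why the compiler has to prove the value range before rewriting.

```java
public class ConvI2LDemo {
    // Shape of ConvI2L(AddI(x, y)): add in 32 bits, then sign-extend once.
    public static long addThenWiden(int x, int y) {
        return (long) (x + y);
    }

    // Shape of AddL(ConvI2L(x), ConvI2L(y)): sign-extend both inputs, then add in 64 bits.
    public static long widenThenAdd(int x, int y) {
        return (long) x + (long) y;
    }

    public static void main(String[] args) {
        // No overflow: both forms produce the same value.
        System.out.println(addThenWiden(3, 4) + " " + widenThenAdd(3, 4));
        // 32-bit overflow: the forms differ, so the rewrite would be invalid here.
        System.out.println(addThenWiden(Integer.MAX_VALUE, 1) + " "
                + widenThenAdd(Integer.MAX_VALUE, 1));
    }
}
```

The second form needs two sign extensions (movslq) plus a 64-bit add where the first needs a 32-bit add and one sign extension, which matches the extra instructions quoted above when only some of the accesses are rewritten.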