JDK-8208163 : [lworld] -XX:+EnableValhalla causes ~2x times performance regression on String(char[]) constructor.
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: repo-valhalla
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2018-07-24
  • Updated: 2020-06-30
  • Resolved: 2020-06-30
Version table:
  • Other: repo-valhalla (Resolved)
Related Reports
Duplicate :  
Description
Running with -XX:+EnableValhalla causes a ~2x performance regression in the String(char[]) constructor. The size of the regression depends on the string length.
Some other large benchmarks show a 2%-10% performance regression caused by this issue.

Comments
The check quoted in the earlier comment (the EnableValhalla branch in GraphKit::new_array()) was introduced by this changeset: http://hg.openjdk.java.net/valhalla/valhalla/diff/7feb5ae0f3dc/src/hotspot/share/opto/graphKit.cpp (8205340: [lworld] Re-enable _hashCode, _identityHashCode and _newArray intrinsics; author: roland; date: Thu, 28 Jun 2018 16:50:42 +0200).
10-08-2018

Proposed fix: http://cr.openjdk.java.net/~iklam/valhalla/8208163_string_charary_ctor_slowdown.v01/
I did a couple of runs of Sergey's StringMicros3 with the fix applied and found no difference between -XX:-EnableValhalla and -XX:+EnableValhalla:
  -XX:-EnableValhalla: 88.030 ±(99.9%) 7.767 ns/op [Average]
  -XX:+EnableValhalla: 80.675 ±(99.9%) 2.771 ns/op [Average]
  -XX:-EnableValhalla: 92.785 ±(99.9%) 10.219 ns/op [Average]
  -XX:+EnableValhalla: 91.391 ±(99.9%) 9.824 ns/op [Average]
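The following is an editorial sketch and was not part of the original comment. As a rough sanity check of the numbers above, the Java snippet below tests whether each -/+ pair of JMH ±(99.9%) intervals overlaps. The class ScoreOverlap and its overlaps() method are hypothetical names. Interval overlap is only a heuristic, not a formal significance test.

```java
// Hypothetical helper: checks whether two JMH "score ±(99.9%) error" intervals overlap.
public class ScoreOverlap {
    static boolean overlaps(double scoreA, double errA, double scoreB, double errB) {
        double lo = Math.max(scoreA - errA, scoreB - errB); // larger of the two lower bounds
        double hi = Math.min(scoreA + errA, scoreB + errB); // smaller of the two upper bounds
        return lo <= hi;
    }

    public static void main(String[] args) {
        // Run 1: [80.263, 95.797] vs [77.904, 83.446]
        System.out.println(overlaps(88.030, 7.767, 80.675, 2.771));
        // Run 2: [82.566, 103.004] vs [81.567, 101.215]
        System.out.println(overlaps(92.785, 10.219, 91.391, 9.824));
    }
}
```

Both pairs overlap, which is consistent with the "no difference" reading of these runs.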
10-08-2018

I wrote a simplified test case:
=====================================
public class Hello {
    public String getString() {
        return new String(carray);
    }

    private static final char[] carray = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
    };

    public static void main(String[] args) throws Throwable {
        new String(carray);
        (new Hello()).getString();
    }
}
=====================================
When getString() is compiled, PhaseMacroExpand::expand_allocate_array() is called:
  • With -XX:-EnableValhalla, init->is_complete_with_arraycopy() is true and OptoRuntime::new_array_nozero_Java() is used.
  • With -XX:+EnableValhalla, init->is_complete_with_arraycopy() is false and OptoRuntime::new_array_Java() is used.
This matches what Sergey observed in his earlier comment.
---------------------
The above is caused by LibraryCallKit::inline_string_copy():
    LibraryCallKit::inline_string_copy() {
      ....
      if (alloc != NULL) {
        if (alloc->maybe_set_complete(&_gvn)) {  // returns false when EnableValhalla
---------------------
That, in turn, is caused by the following code in GraphKit::new_array(), where elem_klass == NULL:
    Node* GraphKit::new_array(...) {
      ...
      } else if (EnableValhalla && (!layout_con || elem_klass == NULL ||
                 (elem_klass->is_java_lang_Object() && !ary_type->klass_is_exact()))) {
        InitializeNode* init = alloc->initialization();
        init->set_unknown_value();  // <-- this is called
      }
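The causal chain described above can be summarized as a small Java model. This is a hypothetical sketch, not HotSpot code. The names AllocationPathModel, setsUnknownValue() and runtimeStub() are illustrative, and the model makes one simplifying assumption: the unknown-value mark is the only reason maybe_set_complete() fails. For the byte[] allocation, layoutCon is passed as true (an assumption). Because elem_klass == NULL already satisfies the condition, its value does not change the result.

```java
// Hypothetical, simplified model of the allocation-path decision described in the comment.
public class AllocationPathModel {

    // Mirrors the condition in GraphKit::new_array() that calls init->set_unknown_value().
    static boolean setsUnknownValue(boolean enableValhalla, boolean layoutCon,
                                    boolean elemKlassIsNull, boolean elemIsObject,
                                    boolean klassIsExact) {
        return enableValhalla
                && (!layoutCon || elemKlassIsNull || (elemIsObject && !klassIsExact));
    }

    // Simplification: an unknown value makes maybe_set_complete() fail, so the
    // allocation is not complete_with_arraycopy and the zeroing stub is chosen.
    static String runtimeStub(boolean unknownValue) {
        boolean completeWithArraycopy = !unknownValue;
        return completeWithArraycopy ? "new_array_nozero_Java" : "new_array_Java";
    }

    public static void main(String[] args) {
        // byte[] allocation for the String: elem_klass == NULL (assumed layout_con == true)
        for (boolean valhalla : new boolean[] {false, true}) {
            boolean unknown = setsUnknownValue(valhalla, true, true, false, true);
            System.out.println("EnableValhalla=" + valhalla + " -> " + runtimeStub(unknown));
        }
    }
}
```

Running main prints new_array_nozero_Java for EnableValhalla=false and new_array_Java for EnableValhalla=true. This matches the two cases observed in expand_allocate_array().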
09-08-2018

The source of the regression is the zeroing of the newly allocated String value array (a byte[], per the {type array byte} metadata below):
===================
  0.08%  0x00007f83f0cbe7ac: prefetchw 0xc0(%rdi)
  0.99%  0x00007f83f0cbe7b3: movq   $0x1,(%r8)
  0.01%  0x00007f83f0cbe7ba: prefetchw 0x100(%rdi)
  0.33%  0x00007f83f0cbe7c1: movl   $0x800,0x8(%r8)    ; {metadata({type array byte})}
  0.05%  0x00007f83f0cbe7c9: mov    %r11d,0xc(%r8)
  0.10%  0x00007f83f0cbe7cd: prefetchw 0x140(%rdi)
  0.11%  0x00007f83f0cbe7d4: prefetchw 0x180(%rdi)
  0.13%  0x00007f83f0cbe7db: shr    $0x3,%rcx
  0.06%  0x00007f83f0cbe7df: add    $0xfffffffffffffffe,%rcx
  0.10%  0x00007f83f0cbe7e3: mov    %r8,%rdi
  0.02%  0x00007f83f0cbe7e6: add    $0x10,%rdi
  0.04%  0x00007f83f0cbe7ea: xor    %rax,%rax
  0.04%  0x00007f83f0cbe7ed: cmp    $0x8,%rcx
         0x00007f83f0cbe7f1: jg     0x00007f83f0cbe803
         0x00007f83f0cbe7f3: dec    %rcx
         0x00007f83f0cbe7f6: js     0x00007f83f0cbe80a
         0x00007f83f0cbe7f8: mov    %rax,(%rdi,%rcx,8)
         0x00007f83f0cbe7fc: dec    %rcx
         0x00007f83f0cbe7ff: jge    0x00007f83f0cbe7f8
         0x00007f83f0cbe801: jmp    0x00007f83f0cbe80a
  0.06%  0x00007f83f0cbe803: shl    $0x3,%rcx
 12.40%  0x00007f83f0cbe807: rep rex.W stos %al,%es:(%rdi)
  0.99%  0x00007f83f0cbe80a: lea    0x10(%r12,%r10,8),%rsi
===================
This zeroing code is not generated in the non-Valhalla version.
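The hot zeroing sequence can be read as the following Java model. It is a hypothetical sketch and rests on two assumptions. First, %rcx holds the object size in bytes on entry. Second, the array body starts 16 bytes into the object, as "add $0x10,%rdi" suggests. The simulated heap and the names ZeroingModel, bodyWords() and clearBody() are illustrative only. Take a byte[500] rounded up to 520 bytes (assuming 8-byte object alignment). That leaves 63 body words, i.e. 504 bytes, which are cleared with rep stos.

```java
import java.util.Arrays;

// Hypothetical model of the zeroing sequence in the perfasm listing.
// Assumes %rcx holds the object size in bytes on entry and that the array
// body starts 2 words (16 bytes) after the object start.
public class ZeroingModel {
    static final int HEADER_WORDS = 2;    // assumption, from "add $0x10,%rdi"
    static final long LOOP_LIMIT = 8;     // "cmp $0x8,%rcx ; jg ..."

    // "shr $0x3,%rcx ; add $0xfffffffffffffffe,%rcx": body size in 8-byte words
    static long bodyWords(long objectBytes) {
        return (objectBytes >>> 3) - HEADER_WORDS;
    }

    // Clears the body of an object that starts at heap[objectStartWord].
    static String clearBody(long[] heap, int objectStartWord, long objectBytes) {
        int base = objectStartWord + HEADER_WORDS;  // "mov %r8,%rdi ; add $0x10,%rdi"
        long count = bodyWords(objectBytes);
        if (count > LOOP_LIMIT) {
            // "shl $0x3,%rcx ; rep stos": one bulk store of count * 8 bytes
            Arrays.fill(heap, base, base + (int) count, 0L);
            return "rep stos, " + (count << 3) + " bytes";
        }
        // "dec %rcx ; js ..." then "mov %rax,(%rdi,%rcx,8) ; dec %rcx ; jge ...":
        // store zero words from the highest index down to 0
        for (long i = count - 1; i >= 0; i--) {
            heap[base + (int) i] = 0L;
        }
        return "loop, " + count + " words";
    }

    public static void main(String[] args) {
        long objectBytes = 520;  // byte[500]: 16 + 500 = 516, rounded up to 520 (assumed)
        long[] heap = new long[(int) (objectBytes >>> 3)];
        Arrays.fill(heap, -1L);  // pretend the memory held garbage
        System.out.println(clearBody(heap, 0, objectBytes));  // prints "rep stos, 504 bytes"
    }
}
```

The bytes cleared grow linearly with the array length. This is consistent with the description's note that the regression depends on string length.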
24-07-2018

To reproduce, use this JMH benchmark:
=====================================
import org.openjdk.jmh.annotations.*;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StringMicros3 {

    private char[] carray;

    @Param({"500"})
    int size;

    @Setup
    public void setup() {
        // size/2 random bytes, hex-encoded into size chars
        byte[] barray = new byte[size/2];
        new Random(4711).nextBytes(barray);
        carray = getChars(barray);
    }

    @Benchmark
    public String getString() {
        // the constructor under test
        return new String(carray);
    }

    private static final char DIGITS[] = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

    private static char[] getChars(byte[] barray) {
        char[] res = new char[barray.length * 2];
        int j = 0;
        for (int i = 0; i < barray.length; i++) {
            res[j++] = DIGITS[(barray[i] & 0xF0) >>> 4];
            res[j++] = DIGITS[barray[i] & 0x0F];
        }
        return res;
    }
}
=====================================
baseline options: -XX:-EnableValhalla -XX:-TieredCompilation
lworld options:   -XX:+EnableValhalla
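The sketch below is a JMH-free way to check what getString() actually constructs. It reuses the benchmark's getChars() logic. The class name StringMicros3Input and the makeInput() and allLatin1() helpers are hypothetical. Every generated char is a hex digit, so with compact strings enabled (the JDK 9+ default) the String stores it as Latin-1 in a byte[] of the same length. That matches the {type array byte} allocation seen in the profile.

```java
import java.util.Random;

// Hypothetical helper reproducing the benchmark input outside JMH.
public class StringMicros3Input {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    // Same hex encoding as StringMicros3.getChars()
    static char[] getChars(byte[] barray) {
        char[] res = new char[barray.length * 2];
        int j = 0;
        for (int i = 0; i < barray.length; i++) {
            res[j++] = DIGITS[(barray[i] & 0xF0) >>> 4];
            res[j++] = DIGITS[barray[i] & 0x0F];
        }
        return res;
    }

    // Same setup as the benchmark: size/2 random bytes with seed 4711
    static char[] makeInput(int size) {
        byte[] barray = new byte[size / 2];
        new Random(4711).nextBytes(barray);
        return getChars(barray);
    }

    // True if every char fits in one byte (eligible for Latin-1 storage)
    static boolean allLatin1(char[] cs) {
        for (char c : cs) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] carray = makeInput(500);
        System.out.println(carray.length + " chars, all Latin-1: " + allLatin1(carray));
    }
}
```

With size=500, main prints "500 chars, all Latin-1: true".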
24-07-2018