Bug ID: JDK-8278020 ~13% variation in Renaissance-Scrabble

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 17,18

Priority: P3
Status: Resolved
Resolution: Fixed
OS: linux
CPU: x86

Submitted: 2021-11-30
Updated: 2022-01-25
Resolved: 2022-01-06

JDK 17: 17.0.3-oracle, Fixed
JDK 18: 18, Fixed
JDK 19: 19 b03, Fixed

Renaissance-Scrabble scores vary week to week by ~13% on x64 and by 49% on ARM. The same variability shows up between sequential CI pipeline builds: the score shifts, then flips back again a few builds later.

Running with -Xshare:off makes the high scores and the low scores more or less meet in the middle.

I haven't yet found the earliest time where this started to happen.

Changeset: 4ba980ba Author: Ioi Lam <iklam@openjdk.org> Date: 2021-12-15 20:06:56 +0000 URL: https://git.openjdk.java.net/jdk/commit/4ba980ba439f94a6b5015e64382a6c308476d63f
19-01-2022
Changeset: 967ef0c4 Author: Ioi Lam <iklam@openjdk.org> Date: 2022-01-07 05:30:20 +0000 URL: https://git.openjdk.java.net/jdk/commit/967ef0c48252957f9bec42965fe02414fd2c77cb
10-01-2022
Fix Request (17u): This resolves the recent regression in 17u. Applies cleanly, passes tests.
07-01-2022
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/64 Date: 2022-01-07 11:06:41 +0000
07-01-2022
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk18/pull/87 Date: 2022-01-06 23:57:06 +0000
07-01-2022
[~shade] I've fixed the fix versions according to https://openjdk.java.net/guide/index.html#how-to-fix-an-incorrect-backport-creation-in-jbs
Now this (main) bug is marked as Resolved with FixVersion = 19. I will attempt to do a backport to JDK 18 using the backport issue (JDK-8278867).
06-01-2022
Changeset: 4ba980ba Author: Ioi Lam <iklam@openjdk.org> Date: 2021-12-15 20:06:56 +0000 URL: https://git.openjdk.java.net/jdk/commit/4ba980ba439f94a6b5015e64382a6c308476d63f
06-01-2022
Fix was pushed while main bug was targeted to 18. Reset the main bug to fixed in 19 and copied the Robo Duke entry here.
06-01-2022
I think the commit went to JDK 19, so this issue was not automatically closed. Ioi, wouldn't you like to fix it in JDK 18 as well (thus closing this bug)?
20-12-2021
[~ecaspole] found that in the slow cases, a lot of time is reported by JMH perfasm at the following instruction (with https://github.com/jvm-profiling-tools/async-profiler)

Column 1: cycles (125424 events)
Column 2: l1d_pend_miss.pending_cycles (56716 events)
Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)

     0.08%   0.02%   0.03%  0x00007f488cda2dc8: mov 0x10(%r10),%r11d
    12.26%  16.97%  16.23%  0x00007f488cda2dcc: lea 0x1b8(%r10,%r11,8),%r11 <<<<<<<<< HERE

The "lea" is just an addition of the various operands: r11 = r10 + r11 * 8 + 0x1b8. The memory stall is probably from the previous load: r11 = ((Klass)r10)->_vtable_len.

Since the _vtable_len is only offset 24 from the beginning of the Klass, it may be subject to false sharing. We could swap Klass::_vtable_len with Klass::_modifier_flags, which is at offset 164.

Decoding VtableStub itbl[0]@12
--------------------------------------------------------------------------------
    0x00007f0314b2e780: mov 0x8(%rax),%rbx
    0x00007f0314b2e784: mov (%rax),%rax
    0x00007f0314b2e787: mov 0x8(%rsi),%r10d
    0x00007f0314b2e78b: movabs $0x800000000,%r11
    0x00007f0314b2e795: add %r11,%r10
    0x00007f0314b2e798: mov 0x10(%r10),%r11d // r11 = ((Klass)r10)->_vtable_len
    0x00007f0314b2e79c: lea 0x1b8(%r10,%r11,8),%r11 <<<<<<<<< HERE
    0x00007f0314b2e7a4: mov (%r11),%r10
    0x00007f0314b2e7a7: cmp %r10,%rbx
    0x00007f0314b2e7aa: je 0x00007f0314b2e7c1
    0x00007f0314b2e7ac: test %r10,%r10
    0x00007f0314b2e7af: je 0x00007f0314b2e809
    0x00007f0314b2e7b5: add $0x10,%r11
    0x00007f0314b2e7b9: mov (%r11),%r10
    0x00007f0314b2e7bc: cmp %r10,%rbx
    0x00007f0314b2e7bf: jne 0x00007f0314b2e7ac
    0x00007f0314b2e7c1: mov 0x8(%rsi),%r10d
    0x00007f0314b2e7c5: movabs $0x800000000,%r11
    0x00007f0314b2e7cf: add %r11,%r10
    0x00007f0314b2e7d2: mov 0x10(%r10),%r11d
    0x00007f0314b2e7d6: lea 0x1b8(%r10,%r11,8),%r11
    0x00007f0314b2e7de: lea (%r10),%r10
    0x00007f0314b2e7e1: mov (%r11),%rbx
    0x00007f0314b2e7e4: cmp %rbx,%rax
    0x00007f0314b2e7e7: je 0x00007f0314b2e7fe
    0x00007f0314b2e7e9: test %rbx,%rbx
    0x00007f0314b2e7ec: je 0x00007f0314b2e809
    0x00007f0314b2e7f2: add $0x10,%r11
    0x00007f0314b2e7f6: mov (%r11),%rbx
    0x00007f0314b2e7f9: cmp %rbx,%rax
    0x00007f0314b2e7fc: jne 0x00007f0314b2e7e9
    0x00007f0314b2e7fe: mov 0x8(%r11),%r11d
    0x00007f0314b2e802: mov (%r10,%r11,1),%rbx
    0x00007f0314b2e806: jmpq *0x40(%rbx)
    0x00007f0314b2e809: jmpq 0x00007f03149e5180
15-12-2021
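The address arithmetic described in the perfasm comment above can be sketched in Java as follows. This is only an illustration of the "lea" operands, not HotSpot code: the class name, method name, and constant names are hypothetical, and the values 0x1b8 and 8 are simply read off the disassembly of that particular build. The "lea" itself touches no memory; it needs the value produced by the preceding load, which is why a cache miss on that load would plausibly be attributed to it.

```java
final class ItableStubSketch {
    // Displacement and scale taken from `lea 0x1b8(%r10,%r11,8),%r11` in the
    // comment above. The constant names are hypothetical labels, not HotSpot identifiers.
    static final long BASE_DISPLACEMENT = 0x1b8;
    static final long ENTRY_SCALE = 8;

    /**
     * Computes r11 = r10 + r11 * 8 + 0x1b8, where r10 holds the Klass address
     * and r11 holds the previously loaded _vtable_len. Pure arithmetic, no load.
     */
    static long leaResult(long klassAddress, int vtableLen) {
        return klassAddress + BASE_DISPLACEMENT + (long) vtableLen * ENTRY_SCALE;
    }

    public static void main(String[] args) {
        long klassAddress = 0x800000000L + 0x1000; // hypothetical decoded Klass address
        int vtableLen = 5;                         // hypothetical loaded _vtable_len
        System.out.printf("lea result: 0x%x%n", leaResult(klassAddress, vtableLen));
    }
}
```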
ILW = MHM = P3
07-12-2021
The _vtable_len field was moved by JDK-8238048. It's now at offset 24 in Klass. Modification of another object that sits just below the Klass may cause false sharing of the cache line and result in dcache misses when loading the Klass::_vtable_len field. Moving Klass::_vtable_len to a higher offset should fix the false sharing issue. This seems to be harmless. [~ecaspole] ran all benchmarks and did not observe any negative side effects.
06-12-2021
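The false-sharing argument in the comment above comes down to cache-line arithmetic, which the following Java sketch illustrates. It assumes a 64-byte cache line (a common x86-64 value, not stated in this report), and the class and method names are hypothetical. It checks whether a field at a given offset inside a Klass lands in the same cache line as the byte immediately below the Klass.

```java
final class CacheLineSketch {
    // Assumption: 64-byte cache lines. The report does not state the line size.
    static final long LINE_SIZE = 64;

    /** Index of the cache line that contains the given (non-negative) address. */
    static long lineOf(long address) {
        return address / LINE_SIZE;
    }

    /**
     * True if the field at klassStart + fieldOffset shares a cache line with
     * the last byte of whatever sits directly below the Klass.
     */
    static boolean sharesLineWithNeighborBelow(long klassStart, int fieldOffset) {
        return lineOf(klassStart + fieldOffset) == lineOf(klassStart - 1);
    }

    public static void main(String[] args) {
        long unaligned = 0x1008; // hypothetical Klass start, 8 bytes into a line
        long aligned = 0x1000;   // hypothetical Klass start on a line boundary
        System.out.println(sharesLineWithNeighborBelow(unaligned, 24));
        System.out.println(sharesLineWithNeighborBelow(unaligned, 164));
        System.out.println(sharesLineWithNeighborBelow(aligned, 24));
    }
}
```

Under this assumption, a field at offset 24 shares a line with the neighbor below only for some placements of the Klass, while a field at an offset of 64 or more never does.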
Another possibility is that the compiler fails to devirtualize some critical invokeinterface instructions, so we end up excessively calling the VtableStub.
30-11-2021
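For readers unfamiliar with the devirtualization hypothesis in the comment above, the sketch below shows the kind of call site it refers to. All names are hypothetical and nothing here is taken from the Renaissance-Scrabble benchmark. The call `s.score(word)` compiles to an invokeinterface bytecode; when a JIT compiler sees several receiver types at one site it typically cannot inline the call and dispatches through a stub instead, though the actual decision depends on runtime profiling.

```java
import java.util.List;

final class InterfaceCallSketch {
    interface Scorer {
        int score(String word);
    }

    // Three hypothetical implementations, so the call site below sees three receiver types.
    static final class LengthScorer implements Scorer {
        public int score(String word) { return word.length(); }
    }

    static final class VowelScorer implements Scorer {
        public int score(String word) {
            return (int) word.chars().filter(c -> "aeiou".indexOf(c) >= 0).count();
        }
    }

    static final class ConstantScorer implements Scorer {
        public int score(String word) { return 1; }
    }

    /** Sums the scores; s.score(word) is the interface call site of interest. */
    static int total(List<Scorer> scorers, String word) {
        int sum = 0;
        for (Scorer s : scorers) {
            sum += s.score(word);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Scorer> scorers = List.of(new LengthScorer(), new VowelScorer(), new ConstantScorer());
        System.out.println(total(scorers, "scrabble"));
    }
}
```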