Bug ID: JDK-8224234 compiler/codegen/TestCharVect2.java fails in test

JDK 11	JDK 13	JDK 14
11.0.9 Fixed	13 b25 Fixed	14 Fixed

Fix Request (11u). Simple reliability patch, adds USE info to Intel vector instruction code patterns. Applies cleanly. Review approval: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-June/003340.html
This is the second in a sequence of backports that include:
JDK-8222074: Enhance auto vectorization for x86
JDK-8224234: compiler/codegen/TestCharVect2.java fails in test_mulc
JDK-8226721: Missing intrinsics for Math.ceil, floor, rint
JDK-8231713: x86_32 build failures after JDK-8226721 (Missing intrinsics for Math.ceil, floor, rint)
JDK-8230591: AArch64: Missing intrinsics for Math.ceil, floor, rint
The patch for JDK-8222074 and this backport should be applied at the same time to avoid test failure.
21-07-2020
Patch: http://cr.openjdk.java.net/~sviswanathan/8224234/
Multiplication is implemented as a shift-and-add sequence in certain cases. The problem occurred because the "shift by" register was being overwritten due to a missing effect statement in the shift rules. The problem was introduced as part of https://bugs.openjdk.java.net/browse/JDK-8222074. The patch adds the missing effect statements.
07-06-2019
Spotted on linux-x64 as well, using a command line along the lines of: -Xcomp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:-DoEscapeAnalysis
05-06-2019
If I decode it all correctly, the command line would be something like:
-XX:MaxRAMPercentage=4 -Xcomp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:-DoEscapeAnalysis
Note though that it's intermittent and seems to reproduce very infrequently. [~vlivanov] mentioned in an earlier comment that he was able to reproduce it, so he may be able to help provide a "better" reproducer.
04-06-2019
Hi Mikael, could you please provide the command line to reproduce it on Windows?
04-06-2019
The same failure was spotted on Windows, so I cleared the OS field.
04-06-2019
Unless the fix is right around the corner I'm going to problem list this.
04-06-2019
I was able to reproduce the failure with the following flags (full log attached - test_mulc.log):
-Xcomp -Xbatch -Xmx128m -XX:MaxVectorSize=8 -ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation -XX:-UseCompressedOops -XX:UseAVX=0 -Xbatch

test_mulc code:
$ -XX:CompileCommand=dontinline,::test_mulc -XX:CompileCommand=print,::test_mulc

compiler/codegen/TestCharVect2.test_mulc([C[C)V [0x000000010abc58a0, 0x000000010abc59f8] 344 bytes
[Constants]
# {method} {0x00000001109c4908} 'test_mulc' '([C[C)V' in 'compiler/codegen/TestCharVect2'
# parm0: rsi:rsi = '[C'
# parm1: rdx:rdx = '[C'
# [sp+0x30] (sp of caller)
;; N1: # B1 <- B19 B18 B17 Freq: 1
;; B1: # B19 B2 <- BLOCK HEAD IS JUNK Freq: 1
0x000000010abc58a0: mov %eax,-0x16000(%rsp)
0x000000010abc58a7: push %rbp
0x000000010abc58a8: sub $0x20,%rsp ;synchronization entry ; - compiler.codegen.TestCharVect2::test_mulc@-1 (line 1014)
0x000000010abc58ac: mov 0x10(%rsi),%ebp ;arraylength {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@4 (line 1014) ; implicit exception: dispatches to 0x000000010abc59c6
;; B2: # B17 B3 <- B1 Freq: 0.999999
0x000000010abc58af: test %ebp,%ebp
0x000000010abc58b1: jbe 0x000000010abc59a0 ;if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@5 (line 1014)
;; B3: # B18 B4 <- B2 Freq: 0.499999
0x000000010abc58b7: mov 0x10(%rdx),%r10d ;caload {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@12 (line 1015) ; implicit exception: dispatches to 0x000000010abc59b0
;; B4: # B18 B5 <- B3 Freq: 0.499999
0x000000010abc58bb: test %r10d,%r10d
0x000000010abc58be: jbe 0x000000010abc59b0
;; B5: # B18 B6 <- B4 Freq: 0.499998
0x000000010abc58c4: mov %ebp,%r11d
0x000000010abc58c7: dec %r11d
0x000000010abc58ca: cmp %r10d,%r11d
0x000000010abc58cd: jae 0x000000010abc59b0
;; B6: # B18 B7 <- B5 Freq: 0.499998
0x000000010abc58d3: cmp %ebp,%r11d
0x000000010abc58d6: jae 0x000000010abc59b0
;; B7: # B8 <- B6 Freq: 0.499997
0x000000010abc58dc: mov $0x4,%r10d
0x000000010abc58e2: cmp %r10d,%ebp
0x000000010abc58e5: mov %ebp,%r8d
0x000000010abc58e8: cmovg %r10d,%r8d
0x000000010abc58ec: xor %ecx,%ecx
0x000000010abc58ee: nop
0x000000010abc58ef: nop ;aload_0 {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@8 (line 1015)
;; B8: # B8 B9 <- B7 B8 Loop: B8-B8 inner pre of N204 Freq: 4.99997
0x000000010abc58f0: movzwl 0x18(%rdx,%rcx,2),%r10d ;caload {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@12 (line 1015)
0x000000010abc58f6: mov %r10d,%r12d
0x000000010abc58f9: shl $0x3,%r12d
0x000000010abc58fd: sub %r10d,%r12d
0x000000010abc5900: mov %r12w,0x18(%rsi,%rcx,2) ;castore {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@17 (line 1015)
0x000000010abc5906: inc %ecx ;iinc {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@18 (line 1014)
0x000000010abc5908: cmp %r8d,%ecx
0x000000010abc590b: jl 0x000000010abc58f0 ;if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@5 (line 1014)
;; B9: # B14 B10 <- B8 Freq: 0.499997
0x000000010abc590d: mov %ebp,%r9d
0x000000010abc5910: add $0xfffffffd,%r9d
0x000000010abc5914: mov $0x80000000,%r10d
0x000000010abc591a: cmp %r9d,%r11d
0x000000010abc591d: cmovl %r10d,%r9d
0x000000010abc5921: cmp %r9d,%ecx
0x000000010abc5924: jge 0x000000010abc597f
;; B10: # B11 <- B9 Freq: 0.499997
0x000000010abc5926: mov $0xfa0,%r10d ;goto {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@21 (line 1014)
0x000000010abc592c: mov $0x3,%r8d
0x000000010abc5932: movd %r8d,%xmm0
;; B11: # B12 <- B10 B13 Loop: B11-B13 Freq: 4.99997
0x000000010abc5937: mov %r9d,%r8d
0x000000010abc593a: sub %ecx,%r8d
0x000000010abc593d: cmp %r10d,%r8d
0x000000010abc5940: cmovg %r10d,%r8d
0x000000010abc5944: add %ecx,%r8d
0x000000010abc5947: nop
0x000000010abc5948: nop
0x000000010abc5949: nop
0x000000010abc594a: nop
0x000000010abc594b: nop
0x000000010abc594c: nop
0x000000010abc594d: nop
0x000000010abc594e: nop
0x000000010abc594f: nop ;aload_0 {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@8 (line 1015)
;; B12: # B12 B13 <- B11 B12 Loop: B12-B12 inner main of N101 strip mined Freq: 49.9997
0x000000010abc5950: movq 0x18(%rdx,%rcx,2),%xmm1 ;caload {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@12 (line 1015)
0x000000010abc5956: movapd %xmm1,%xmm2
0x000000010abc595a: psllw %xmm0,%xmm2
0x000000010abc595e: psubw %xmm1,%xmm2
0x000000010abc5962: movq %xmm2,0x18(%rsi,%rcx,2) ;castore {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@17 (line 1015)
0x000000010abc5968: add $0x4,%ecx ;iinc {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@18 (line 1014)
0x000000010abc596b: cmp %r8d,%ecx
0x000000010abc596e: jl 0x000000010abc5950 ;goto {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@21 (line 1014)
;; B13: # B11 B14 <- B12 Freq: 4.99997
0x000000010abc5970: mov 0x128(%r15),%r11 ; ImmutableOopMap{rdx=Oop rsi=Oop } ;goto {reexecute=1 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@21 (line 1014)
0x000000010abc5977: test %eax,(%r11) ;goto {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@21 (line 1014) ; {poll}
0x000000010abc597a: cmp %r9d,%ecx
0x000000010abc597d: jl 0x000000010abc5937
;; B14: # B17 B15 <- B9 B13 Freq: 0.499997
0x000000010abc597f: cmp %ebp,%ecx
0x000000010abc5981: jge 0x000000010abc59a0
;; B15: # B16 <- B14 Freq: 0.249999
0x000000010abc5983: nop ;aload_0 {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@8 (line 1015)
;; B16: # B16 B17 <- B15 B16 Loop: B16-B16 inner post of N204 Freq: 2.49999
0x000000010abc5984: movzwl 0x18(%rdx,%rcx,2),%r10d ;caload {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@12 (line 1015)
0x000000010abc598a: mov %r10d,%r8d
0x000000010abc598d: shl $0x3,%r8d
0x000000010abc5991: sub %r10d,%r8d
0x000000010abc5994: mov %r8w,0x18(%rsi,%rcx,2) ;castore {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@17 (line 1015)
0x000000010abc599a: inc %ecx ;iinc {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@18 (line 1014)
0x000000010abc599c: cmp %ebp,%ecx
0x000000010abc599e: jl 0x000000010abc5984 ;if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - compiler.codegen.TestCharVect2::test_mulc@5 (line 1014)
;; B17: # N1 <- B16 B14 B2 Freq: 0.999997
0x000000010abc59a0: add $0x20,%rsp
0x000000010abc59a4: pop %rbp
0x000000010abc59a5: mov 0x128(%r15),%r10
0x000000010abc59ac: test %eax,(%r10) ; {poll_return}
0x000000010abc59af: retq
23-05-2019
The reason I focused on the shift count and OSR in my previous comment is given below. Analyzing the issue based on the problem description, e.g. the failure in TestCharVect2.java:

    static void test_mulc(char[] a0, char[] a1) {
      for (int i = 0; i < a0.length; i+=1) {
        a0[i] = (char)(a1[i]*VALUE);
      }
    }

Where:
    VALUE = 7;
    a1[i] = (char)(ADD_INIT+i); // ADD_INIT = Character.MAX_VALUE-500 = 65535-500 = 65035

Multiplying a value, say Y, by 7 is implemented by the compiler as (Y << 3) - Y. Per the bug report, the test failure looks as below:
    test_mulc: [4] = 497 != 62057
    test_mulc: [5] = 496 != 62064
    test_mulc: [6] = 495 != 62071
    ...

a1[i] at index 4 is 65039 (0xFE0F) and the multiplier is 7, so the result should be: (char) 0x6F269 = 0xF269 = 62057.
Doing this by shift and subtract should have given:
    Y << 3: (char) (0xFE0F << 3) = 0xF078
    (Y << 3) - Y: (char) (0xF078 - 0xFE0F) = 0xF269 = 62057
What we get per the bug report is 497. Reverse calculating, we can only get this when Y << 3 is 0, as (Y << 3) - Y = 0 - 0xFE0F = 0x01F1 = 497.
So what must be happening is that Y is shifted left by a value larger than 15 instead of by 3.

Shift is implemented with the following instructions.
Load shift count into an xmm register:
    0x00007f6fe42bf1ff: mov $0x3,%r9d
    0x00007f6fe42bf205: movd %r9d,%xmm0
Perform actual shift:
    0x00007f6fe42bf350: movdqu 0x10(%rax,%rbx,2),%xmm2
    0x00007f6fe42bf356: movdqu %xmm2,%xmm4
    0x00007f6fe42bf35a: psllw %xmm0,%xmm4
    0x00007f6fe42bf35e: psubw %xmm2,%xmm4
    0x00007f6fe42bf362: movdqu %xmm4,0x10(%r11,%rbx,2)
    ....

The generated instructions look correct. For the shifted value to become zero, it looks as if the shift count (xmm0) has somehow been corrupted between definition and use.

The shift count code is generated by the following snippet in the x86.ad file:

    instruct vshiftcntimm(vecS dst, immI8 cnt, rRegI tmp) %{
      match(Set dst cnt);
      effect(TEMP tmp);
      format %{ "movl $tmp,$cnt\t"
                "movdl $dst,$tmp\t! load shift count" %}
      ins_encode %{
        __ movl($tmp$$Register, $cnt$$constant);
        __ movdl($dst$$XMMRegister, $tmp$$Register);
      %}
      ins_pipe( pipe_slow );
    %}

The vector shift is implemented by the following code snippet:

    instruct vshift8S(vecX dst, vecX src, vecS shift) %{
      predicate(n->as_Vector()->length() == 8);
      match(Set dst (LShiftVS src shift));
      match(Set dst (RShiftVS src shift));
      match(Set dst (URShiftVS src shift));
      format %{ "vshiftw $dst,$src,$shift\t! shift packed8S" %}
      ins_encode %{
        int opcode = this->as_Mach()->ideal_Opcode();
        if (UseAVX == 0) {
          if ($dst$$XMMRegister != $src$$XMMRegister)
            __ movdqu($dst$$XMMRegister, $src$$XMMRegister);
          __ vshiftw(opcode, $dst$$XMMRegister, $shift$$XMMRegister);
        } else {
          int vector_len = 0;
          __ vshiftw(opcode, $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, vector_len);
        }
      %}
      ins_pipe( pipe_slow );
    %}

The other thing to note is that the problem is reported starting somewhere in the middle, e.g. at index 4 and not at index 0, which can happen in the OSR scenario. The only change in the generated code with and without the patch is in how the shift count is passed. Prior to 8222074, the shift count would have been encoded as an immediate in the shift instruction; now the shift count is in a register.
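The arithmetic in this comment can be checked with a small standalone Java sketch. It is not part of the test or the patch: the class name MulcArithmetic and the helpers psllw and mulBy7 are hypothetical, and psllw only models one 16-bit lane of the PSLLW instruction (a count above 15 clears the lane). The actual corrupted count value is not known from the report, so 16 is used here purely as an illustrative out-of-range count.

```java
public class MulcArithmetic {
    static final int VALUE = 7;
    static final int ADD_INIT = Character.MAX_VALUE - 500; // 65035, as in the test

    // Scalar reference: what test_mulc expects for one element.
    static char expected(char y) {
        return (char) (y * VALUE);
    }

    // Hypothetical model of PSLLW on a single 16-bit lane:
    // any count above 15 produces 0, otherwise a plain left shift truncated to 16 bits.
    static char psllw(char lane, long count) {
        return count > 15 ? (char) 0 : (char) (lane << (int) count);
    }

    // Multiply by 7 as shift-and-subtract, with the shift count supplied by the caller
    // (mirrors psllw followed by psubw in the generated code).
    static char mulBy7(char y, long shiftCount) {
        return (char) (psllw(y, shiftCount) - y);
    }

    public static void main(String[] args) {
        for (int i = 4; i <= 6; i++) {
            char y = (char) (ADD_INIT + i);
            System.out.println("test_mulc: [" + i + "]"
                + " expected=" + (int) expected(y)
                + " count=3 -> " + (int) mulBy7(y, 3)
                + " count=16 (illustrative bad count) -> " + (int) mulBy7(y, 16));
        }
    }
}
```

With a count of 3 the shift-and-subtract form agrees with the scalar multiply, while any count above 15 reproduces the 497, 496, 495 values from the failure report for indices 4 to 6.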
22-05-2019
Is it possible to try without OSR compilation (I think -XX:-CICompileOSR) and see if the failure still happens? I wonder if the OSR compilation is happening in the middle of the test_mulc loop and somehow the shift count is not loaded into the xmm register before control is transferred to the JIT-compiled code. I am not familiar with how OSR loads values into the appropriate registers. Also, this is all a wild guess without being able to reproduce the bug. If that is the case, then we will need to reintroduce the instruct rules in x86.ad with an immediate as the shift count.
22-05-2019
FWIW the failure seems to be intermittent, occurring only about 4 out of 200 times. It looks like it only happens on macOS, and it does not seem to be tied to a specific machine.
22-05-2019
I am unable to reproduce the bug and will need some help. Assembly or Opto assembly for the TestCharVect2.test_mulc and TestCharVect2.test methods on the failing macOS system would help in pinpointing the issue.
22-05-2019
The latest big change to vectorization was JDK-8222074: Enhance auto vectorization for x86. Based on the failure history, this failure first appeared at about the same time. [~sviswanathan], please take a look and re-assign this bug if needed.
21-05-2019
Failure happened on 2 MacPro machines without AVX:
Intel Xeon E5620 2.40GHz: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT AES PCID SYSCALL XD 1GBPAGE EM64T LAHF RDTSCP TSCI
Intel Xeon W3565 3.20GHz: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT SYSCALL XD EM64T LAHF RDTSCP TSCI
21-05-2019
compiler/c2/cr6340864/TestLongVect.java seems to be failing in much the same way:
test_mulc: [2] = -9223372036854775309 != 9223372036854760339
test_mulc: [3] = -9223372036854775310 != 9223372036854760370
test_mulc: [6] = -9223372036854775313 != 9223372036854760463
test_mulc: [7] = -9223372036854775314 != 9223372036854760494
test_mulc: [16] = -9223372036854775323 != 9223372036854760773
test_mulc: [17] = -9223372036854775324 != 9223372036854760804
test_mulc: [18] = -9223372036854775325 != 9223372036854760835
test_mulc: [19] = -9223372036854775326 != 9223372036854760866
test_mulc: [22] = -9223372036854775329 != 9223372036854760959
test_mulc: [23] = -9223372036854775330 != 9223372036854760990
test_mulc: [32] = -9223372036854775339 != 9223372036854761269
test_mulc: [33] = -9223372036854775340 != 9223372036854761300
...
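The TestLongVect numbers can be sanity-checked with a short Java sketch. Two constants here are assumptions, not taken from this report: ADD_INIT = Long.MAX_VALUE - 500 and a multiplier VALUE = 31. They are inferred only because they reproduce the expected values listed above. The class name LongMulcCheck and its helpers are hypothetical. Under those assumptions, each observed value equals 0 - Y, which is what a shift-and-subtract multiply, (Y << 5) - Y, would produce if the shifted term came out as zero.

```java
public class LongMulcCheck {
    // Assumptions inferred from the reported numbers, not confirmed by the report.
    static final long ADD_INIT = Long.MAX_VALUE - 500;
    static final long VALUE = 31;

    // Input element at index i under the assumed initialization.
    static long input(int i) {
        return ADD_INIT + i;
    }

    // Expected result: ordinary wrapping 64-bit multiply.
    static long expected(int i) {
        return input(i) * VALUE;
    }

    // Result if the shifted term of (Y << 5) - Y were zero.
    static long withZeroedShift(int i) {
        return 0L - input(i);
    }

    public static void main(String[] args) {
        int[] indices = {2, 3, 6, 7};
        for (int i : indices) {
            System.out.println("test_mulc: [" + i + "] = " + withZeroedShift(i)
                + " != " + expected(i));
        }
    }
}
```

For the indices shown, the printed pairs match the failure lines above, which is consistent with a zeroed shift result here as well.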
20-05-2019