JDK-8267652 : c2 loop unrolling by 8 results in reading memory past array
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,13,15.0.2,16,17
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: x86
  • Submitted: 2021-05-24
  • Updated: 2025-01-29
  • Resolved: 2021-06-22
Fix Versions
  • JDK 11: 11.0.14-oracle (Fixed)
  • JDK 15: 15.0.5 (Fixed)
  • JDK 17: 17 b28 (Fixed)
  • JDK 18: 18 (Fixed)
Description
I have found a bug in C2 (x86_64/AVX2) that can be reproduced on the latest OpenJDK 17.

Running java with these options:
-XX:UseAVX=2 -XX:LoopMaxUnroll=8
emits the following code:

   vmovq  0x10(%r8,%rdi,1),%xmm0  <-  read 8 bytes from byteArray1(r8)
   vpxor  0x10(%r11,%rdi,1),%xmm0,%xmm0 <- read 16 bytes from byteArray2 (r11) and xor them with xmm0
   vmovq  %xmm0,0x10(%r12,%rdi,1)      ;*bastore {reexecute=0 rethrow=0 return_oop=0}  <- write 8 bytes to byteArray3 (r12)
                                                            ; - repro::xor_array@18 (line 10)
   add    $0x8,%ebx                    ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - repro::xor_array@19 (line 10)
   cmp    %esi,%ebx

The problem is that vpxor reads 16 bytes, not 8 bytes like the vmovq before it.
This may not sound like a big deal, except in one case: when there is no mapped memory after the byte array pointed to by %r11, vpxor will try to access unmapped memory and crash with a segmentation fault.
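
For orientation, a loop of roughly the following shape is what the annotations in the listing (repro::xor_array, bastore, iinc) suggest. This is only a sketch: the attached reproducer is not shown in this report, so the class name, array sizes, and warm-up count below are assumptions, and only the method name is taken from the assembly comments.

```java
// Hypothetical reconstruction; NOT the reproducer attached to the issue.
class Repro {
    // Byte-wise XOR of two arrays into a third. C2 may vectorize this loop;
    // with -XX:LoopMaxUnroll=8 it may use 8-byte vectors on byte elements.
    // Assumes a and b are at least as long as out.
    static void xor_array(byte[] a, byte[] b, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[32];   // length 32 matches the array in the hs_err snippet
        byte[] b = new byte[32];
        byte[] out = new byte[32];
        for (int i = 0; i < a.length; i++) {
            a[i] = (byte) i;
            b[i] = (byte) (i * 3);
        }
        // Assumed warm-up count: call often enough for C2 to compile xor_array.
        for (int iter = 0; iter < 100_000; iter++) {
            xor_array(a, b, out);
        }
        System.out.println(out[31]);
    }
}
```

Running such a class with the options quoted above (-XX:UseAVX=2 -XX:LoopMaxUnroll=8) would be the natural way to look for the vpxor-with-memory-operand pattern, though an actual crash still depends on where the array lands in the heap.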

Attaching a reproducer that generates such assembly code. Making it crash is very hard, as the object has to be located at the very end of a mapped region. But I have seen such a crash in the wild; here is a snippet from such an hs_err file:

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f5d7a000000

RBP=0x00007f5d79ffffc8 is an oop: [B 
{0x00007f5d79ffffc8} - klass: {type array byte}
 - length: 32

CompressedOops disabled, so header is 0x18 bytes

0001 movslq %r11d, %r10
0003 vmovq 0x18(%rcx, %r10, 1), %xmm0
000a vpxor   0x18(%rbp, %r10, 1), %xmm0, %xmm0  <- reading 16 bytes results in a read past the mapped memory region
0011 vmovq %xmm0, 0x18(%r8, %r10, 1)
0018 add $0x8, %r11d
001c cmp $0x19, %r11d
0020 jl 0x00
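
The numbers in the hs_err snippet can be checked with a little address arithmetic. The sketch below uses only values quoted above (oop address, 0x18 header, length 32, loop bound 0x19, step 8); the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; names are hypothetical.
class OverreadCheck {
    // First address after the last array element.
    static long arrayEnd(long oop, long header, int length) {
        return oop + header + length;
    }

    // True if a read of `width` bytes starting at element index i
    // extends beyond the last byte of the array.
    static boolean readsPastEnd(long oop, long header, int length, int i, int width) {
        long readEnd = oop + header + i + width;  // exclusive end of the read
        return readEnd > arrayEnd(oop, header, length);
    }

    public static void main(String[] args) {
        long oop = 0x00007f5d79ffffc8L;  // RBP from the hs_err snippet
        long header = 0x18;              // header size with CompressedOops disabled
        int length = 32;                 // byte array length from the hs_err snippet
        System.out.printf("array end: 0x%016x%n", arrayEnd(oop, header, length));
        // Loop shape from the listing: add $0x8 / cmp $0x19 / jl
        for (int i = 0; i < 0x19; i += 8) {
            System.out.printf("i=%d  8-byte read past end: %b  16-byte read past end: %b%n",
                    i,
                    readsPastEnd(oop, header, length, i, 8),
                    readsPastEnd(oop, header, length, i, 16));
        }
    }
}
```

The array ends at 0x00007f5d7a000000, which is exactly the faulting si_addr. Only the last iteration (i = 24) is affected: an 8-byte read stays inside the array, while a 16-byte read extends 8 bytes past it.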

Comments
Fix Request (11u): This fixes a P2 issue in jdk11u. The issue is intermittent; it happens when an array object is located at the very end of a mapped heap region. The backport doesn't apply cleanly. Tested with the full tier regression set.
14-10-2021

Fix Request (15u): This fixes a P2 issue in jdk15. The issue is intermittent; it happens when an array object is located at the very end of a mapped heap region. The backport doesn't apply cleanly. Tested with tier1 tests on release and fastdebug builds.
20-07-2021

Fix Request (16u): This fixes a P2 issue in jdk16. The issue is intermittent; it happens when an array object is located at the very end of a mapped heap region. The backport applies cleanly.
24-06-2021

Changeset: dc12cb78
Author: Nils Eliasson <neliasso@openjdk.org>
Date: 2021-06-22 16:21:35 +0000
URL: https://git.openjdk.java.net/jdk17/commit/dc12cb78b81f56e9d4b282cf7cad5faa9a9886bf
22-06-2021

For SSE we don't generate op instructions with a memory source operand because of the alignment requirement, so only the AVX rules need to be fixed.
18-06-2021

It is not just AVX; even for SSE the minimum is 16 bytes. x86.ad needs to be fixed with a check that the vector length in bytes is >= 16 for the mem rules.
18-06-2021

[~sviswanathan] and [~jbhateja], please look at this issue and give us advice. Main question: can we have a variant of vxor_mem() for when AVX > 0 but the vector length is <= 8 bytes? It would be really unfortunate if we had to replace one instruction with two by separating out the load from memory.
18-06-2021

Well, it will fix the case when we use XOR operation, but other ops, like OR/AND/NOT and many other possible cases may not be fixed. Tobias's fix sounds more universal.
02-06-2021

Since this only applies to 8 byte vectors, and that isn't a default, a workaround that uses an extra register is ok.
02-06-2021

Or make sure that the vectorized main loop does not access the last element(s) of the array and leave them to the (unvectorized) post loop.
01-06-2021
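
The main-loop/post-loop idea from the comment above can be modelled in plain Java. This is a scalar sketch of the suggestion only, not the change that was eventually made in C2; the class name, method name, and constants are assumptions chosen to mirror the 8-byte vector and 16-byte memory read discussed in this issue.

```java
// Scalar model of the suggestion; hypothetical names, not C2 code.
class MainPostLoop {
    static final int VECTOR_BYTES = 8;  // elements handled per main-loop iteration
    static final int READ_BYTES = 16;   // bytes the memory operand actually touches

    // XORs a and b into out and returns the index where the main loop stopped.
    // Assumes a and b are at least as long as out.
    static int xorWithPostLoop(byte[] a, byte[] b, byte[] out) {
        int n = out.length;
        int i = 0;
        // Main loop: iterate only while a READ_BYTES-wide read at i stays in bounds.
        for (; i + READ_BYTES <= n; i += VECTOR_BYTES) {
            for (int j = 0; j < VECTOR_BYTES; j++) {
                out[i + j] = (byte) (a[i + j] ^ b[i + j]);
            }
        }
        int mainLoopEnd = i;
        // Post loop: the remaining elements are handled one at a time.
        for (; i < n; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return mainLoopEnd;
    }
}
```

For a 32-byte array the main loop in this model stops at index 24, so the final 8 elements, the ones whose 16-byte read would cross the end of the array, go through the scalar post loop.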

As vpxor works only with 128-bit (16-byte) values, the only way seems to be to load the elements from byteArray2 into another xmm register (e.g. xmm1) and then use the register form of vpxor, e.g. vpxor xmm1, xmm0, xmm0 (xmm0 = xmm0 ^ xmm1).
31-05-2021
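
As a rough analogy for the load-then-XOR approach in the comment above, the Java sketch below does two explicit 8-byte loads followed by an XOR on the loaded values, so no operation ever touches more than 8 bytes of either array. It only models the idea at the source level; the class and method names are hypothetical and this is not how the compiler fix is written.

```java
import java.nio.ByteBuffer;

// Source-level analogy only; hypothetical names.
class SplitLoadXor {
    // XORs 8 bytes of a and b starting at index i into out.
    static void xor8(byte[] a, byte[] b, byte[] out, int i) {
        long x0 = ByteBuffer.wrap(a).getLong(i);   // 8-byte load, like vmovq into xmm0
        long x1 = ByteBuffer.wrap(b).getLong(i);   // separate 8-byte load, like vmovq into xmm1
        ByteBuffer.wrap(out).putLong(i, x0 ^ x1);  // XOR on loaded values, then 8-byte store
    }
}
```

The cost is the one raised elsewhere in this thread: an extra load and an extra register in place of a single instruction with a memory operand.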

ILW = Crash due to reading memory past array bounds, intermittent with C2 compiled code, disable loop unrolling / vectorization = HMM = P2
26-05-2021