JDK-8078497 : C2's superword optimization causes unaligned memory accesses
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u60,9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2015-04-23
  • Updated: 2017-08-07
  • Resolved: 2015-05-11
JDK 8: 8u60 (Fixed)
JDK 9: 9 b66 (Fixed)
Related Reports
Blocks :  
Description
C2's superword optimization generates unaligned memory accesses that cause crashes on systems that require proper address alignment (for example, SPARC):

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xffffffff7694984c, pid=2009, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-fastdebug-tohartma_2015_04_14_10_06-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-fastdebug-tohartma_2015_04_14_10_06-b00 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# J 5 C2 Test.copyByteToChar([B[B[CI)V (249 bytes) @ 0xffffffff7694984c [0xffffffff76949700+0x14c]
#

In this case the failing instruction is:

  0xffffffff7694984c: ldd  [ %g5 + %l3 ], %f8

This instruction tries to load a double word (8 bytes) from byte[] src (see attached 'TestVectorizationWithInvariant.java'). The problem is that the address 0x00000005d57b3a3a (0b...1010) is not double word aligned.
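
The alignment claim can be checked with a few lines of arithmetic. The following is a minimal sketch for illustration only: the class and method names are hypothetical, and the only value taken from the report is the faulting address.

```java
final class AlignmentCheck {
    // Returns true if addr is a multiple of width.
    // Assumes width is a power of two, as it is for an 8-byte ldd.
    static boolean isAligned(long addr, int width) {
        return (addr & (width - 1)) == 0;
    }

    public static void main(String[] args) {
        long addr = 0x00000005d57b3a3aL; // faulting address from the report

        System.out.println("addr % 8 = " + (addr & 7));              // prints "addr % 8 = 2"
        System.out.println("8-byte aligned: " + isAligned(addr, 8)); // prints "8-byte aligned: false"
        System.out.println("2-byte aligned: " + isAligned(addr, 2)); // prints "2-byte aligned: true"
    }
}
```

The address is aligned for the original 2-byte char accesses but not for the merged 8-byte load, which is why the crash appears only after vectorization.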


Comments
Fix verified by regression test.
07-08-2017

URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/aec198eb37bc User: lana Date: 2015-05-27 18:31:25 +0000
27-05-2015

URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/aec198eb37bc User: thartmann Date: 2015-05-11 07:25:06 +0000
11-05-2015

Detailed evaluation:

After loop optimizations, C2 tries to vectorize memory operations by finding and merging adjacent memory accesses in the loop body. The superword optimization matches MemNodes whose address input has the following pattern:

  ptr + k*iv + constant [+ invar]

where iv is the loop induction variable, k is a scaling factor that may be 0, and invar is an optional loop-invariant value. C2 then picks the memory operation with the most similar references and tries to align it in the main loop by setting the number of pre-loop iterations accordingly. Other adjacent memory operations that conform to this alignment are merged into packs. After extending and filtering these packs, the final vector operations are emitted.

The problem is that some architectures (for example, SPARC) require memory accesses to be aligned, and in special cases the superword optimization generates code that contains unaligned vector instructions.

Problems:

(1) If two memory operations have different loop-invariant offset values and C2 decides to align the main loop to one of them, we can always set the invariant of the other operation such that the alignment constraint is broken. For example, if we have a store to a byte array a[i+inv1] and a load from a byte array b[i+inv2] (where i is the induction variable), C2 may decide to align the main loop such that (i+inv1) % 8 == 0 and replace the adjacent byte stores by an 8-byte double word store. The byte loads will also be replaced by a double word load. If we then set inv2 = inv1 + 1 at runtime, the load will fail with a SIGBUS because the access is not 8-byte aligned, i.e., (i+inv2) % 8 != 0. Vectorization should not take place in this case.

(2) If C2 decides to align the main loop to a memory operation, the necessary adjustment of the induction variable by the pre-loop is computed such that the resulting offset in the main loop is aligned to the vector width (see 'SuperWord::get_iv_adjustment').
If the offset of a memory operation is independent of the loop induction variable (i.e., scale k is 0), the iv adjustment should be 0.

(3) If the loop span is greater than the vector width 'vw' of a memory operation, the superword optimization assumes that the pre-loop is never able to align this operation in the main loop (see 'SuperWord::ref_is_alignable'). This is wrong: if the loop span is a multiple of vw, then depending on the initial offset it may well be possible to align the operation to vw.

These problems originally showed up only in the string density code base (JDK-8054307), where a putChar intrinsic writes a char value to two entries of a byte array. To reproduce the issue with JDK 9, we have to make sure that the memory operations
  (i) are independent,
  (ii) have different invariants,
  (iii) are not too complex (because then vectorization will not take place).

To guarantee (i), we need either two different arrays (for example, byte[] and char[]) for the load and store operations, or the same array with different offsets. Since the offsets should include a loop-invariant part (ii), the superword optimization will not be able to determine that the runtime offset values do not overlap. We therefore use Unsafe.getChar/putChar to read/write a char value from/to a byte/char array and thereby guarantee independence. I came up with the following test (see 'TestVectorizationWithInvariant.java'):

  byte[] src = new byte[1000];
  char[] dst = new char[1000];
  for (int i = (int) CHAR_ARRAY_OFFSET; i < 100; i = i + 8) {
    // Copy 8 chars from src to dst
    unsafe.putChar(dst, i + 0, unsafe.getChar(src, off + 0));
    [...]
    unsafe.putChar(dst, i + 14, unsafe.getChar(src, off + 14));
  }

Problem (1) shows up because the main loop will be aligned to the StoreC[i + 0], which has no invariant, whereas the LoadUS[off + 0] has the loop invariant 'off'. Setting off to BYTE_ARRAY_OFFSET + 2 will break the alignment of the emitted double word load and result in a crash.
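
Problem (1) comes down to modular arithmetic on byte offsets. The sketch below is a simplified model, not HotSpot code: it assumes both array bases are 8-byte aligned so that only the index matters, and the names (InvariantMisalignment, firstAlignedIndex, misalignment) and the value inv1 = 3 are hypothetical. It picks the first i that aligns the store a[i+inv1] and then checks the load b[i+inv2] at the same i.

```java
final class InvariantMisalignment {
    static final int VECTOR_WIDTH = 8; // bytes in a double word

    // Smallest i >= 0 with (i + inv) % VECTOR_WIDTH == 0.
    // Models what the pre-loop is meant to achieve for the chosen access.
    static int firstAlignedIndex(int inv) {
        return Math.floorMod(-inv, VECTOR_WIDTH);
    }

    // Distance in bytes of the access [i + inv] from the previous 8-byte boundary.
    static int misalignment(int i, int inv) {
        return Math.floorMod(i + inv, VECTOR_WIDTH);
    }

    public static void main(String[] args) {
        int inv1 = 3;        // invariant of the store (illustrative value)
        int inv2 = inv1 + 1; // invariant of the load, chosen at runtime

        int i = firstAlignedIndex(inv1);
        System.out.println("store misalignment: " + misalignment(i, inv1)); // prints "store misalignment: 0"
        System.out.println("load misalignment: " + misalignment(i, inv2));  // prints "load misalignment: 1"

        // Advancing i by a multiple of the vector width never repairs the load.
        System.out.println("load misalignment after i += 8: " + misalignment(i + 8, inv2));
    }
}
```

Because both accesses share the same i, no choice of pre-loop iteration count can align both when inv1 and inv2 differ by a value that is not a multiple of 8.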
The LoadUS[off + 0] in the test above is independent of the loop induction variable and therefore has a scale of 0. Because of problem (2), the iv adjustment is nevertheless computed as -4 (see 'SuperWord::get_iv_adjustment'):

  offset = 16
  iv_adjust = -4
  elt_size = 2
  scale = 0
  iv_stride = 16
  vect_size = 8

The regression test contains additional test cases that also trigger problem (3).
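
The claim behind problem (3) can be checked with plain modular arithmetic. The sketch below is a hypothetical model and does not reproduce the logic of 'SuperWord::ref_is_alignable': it assumes the byte offset of an access after p pre-loop iterations is initOffset + p * span, and it searches for a p that makes this offset a multiple of vw. All names and input values are chosen for illustration.

```java
final class LoopSpanAlignment {
    // Returns the smallest p in [0, vw) such that (initOffset + p * span) % vw == 0,
    // or -1 if no number of pre-loop iterations can align the access.
    // Checking p < vw is sufficient because the residues repeat after at most vw steps.
    static int preLoopIterationsToAlign(int initOffset, int span, int vw) {
        for (int p = 0; p < vw; p++) {
            if (Math.floorMod(initOffset + p * span, vw) == 0) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // span (16) > vw (8) and a multiple of it, aligned initial offset: alignable.
        System.out.println(preLoopIterationsToAlign(16, 16, 8)); // prints "0"
        // Same span, but the initial offset is off by 2: never alignable.
        System.out.println(preLoopIterationsToAlign(2, 16, 8));  // prints "-1"
        // span (2) < vw (8): one pre-loop iteration moves offset 6 to 8.
        System.out.println(preLoopIterationsToAlign(6, 2, 8));   // prints "1"
    }
}
```

The first two cases show that a loop span larger than vw does not by itself decide alignability; the initial offset does.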
29-04-2015

ILW = crash, simple reproducer available, workaround exists (-XX:-UseSuperWord) = HHL = P2
23-04-2015