JDK-8044082 : Efficient array comparison intrinsics
  • Type: JEP
  • Component: core-libs
  • Priority: P3
  • Status: Draft
  • Resolution: Unresolved
  • Submitted: 2014-05-27
  • Updated: 2016-03-04
Related Reports
Blocks :  
Blocks :  
Relates :  
Relates :  
Description
___(rough draft; needs formatting; see <http://cr.openjdk.java.net/~mr/jep/jep-2.0-02.html>)___

There are efficient operations (both safe and unsafe) for copying arrays, but none for comparing them.  Let's fix this so users don't have to roll their own.

At a minimum, the following intrinsic would allow suitable higher-level comparison operations to be implemented:

    /** Find the index of a mismatch between src and dest, or -1 if none. */
    System.arrayMismatch((Object src, int srcPos,
        Object dest, int destPos, int length, boolean fromEnd);


* Returns an index in src of an element which differs from the corresponding element in dest, if such an element exists.
* If no such element exists, returns `-1`.
* If fromEnd is true, returns the index of the last differing element, else returns the first.
* Correspondences between elements are as for System.arraycopy.
* The result will be either `-1` or a value between `srcPos` and `srcPos+length-1`, inclusive.
* Exceptions are as for `System.arraycopy`.
* The arrays must have compatible element types, either both the
same primitive type or both reference types.
* Comparisons are done as if by the Java `==` operator.
* In particular references to different objects differ, even if `Object.equals` would return true.
* Also, floating point `NaN`s always differ, even from themselves.

This mechanism is easy to code and easy to vectorize.
Most higher-level array comparison operations can readily be built on top of it.

Basic comparison operations (at least lexicographic `compareTo`) using it should be added to `java.util.Arrays`.

`String.compareTo` can be re-implemented on top of this intrinsic.
Methods of the form `String.mismatchTo` should be added.

As a possible follow-up, `System.arrayIndexOf` could be added to perform a vectorized non-anchored search of one array inside another.  This loop is less interesting, however, since it would be too narrow for some use cases involving `NaN`s, structural object comparisons, or case-insensitive searches.
Comments
In progress webrevs to be applied in order: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8033148-Arrays-lexico-compare/webrev/ http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/webrev/
22-09-2015

Here are some simple implementations to further investigate: static int vectoredLongMismatchUnaligned(Object a64, long aOffset, Object b64, long bOffset, int words) { int i = 0; for (; i < words; i++) { // @@@ Update to using getLongUnaligned if (U.getLong(a64, aOffset + (i << LONG_ASHIFT)) != U.getLong(b64, bOffset + (i << LONG_ASHIFT))) break; } return i; } static int fastMismatchByte(byte[] a, byte[] b, int length) { int wi = vectoredLongMismatchUnaligned( a, Unsafe.ARRAY_BYTE_BASE_OFFSET, b, Unsafe.ARRAY_BYTE_BASE_OFFSET, length / Long.BYTES); for (int i = wi * Long.BYTES; i < length; i++) { if (a[i] != b[i]) return i; } return -1; } static int fastMismatchByteRange(byte[] a, int aFromIndex, byte[] b, int bFromIndex, int length) { int wi = vectoredLongMismatchUnaligned( a, Unsafe.ARRAY_BYTE_BASE_OFFSET + aFromIndex, b, Unsafe.ARRAY_BYTE_BASE_OFFSET + bFromIndex, length / Long.BYTES); for (int i = wi * Long.BYTES; i < length; i++) { if (a[aFromIndex + i] != b[bFromIndex + i]) return aFromIndex + i; } return -1; } static int fastMismatchCharRange(char[] a, int aFromIndex, char[] b, int bFromIndex, int length) { int wi = vectoredLongMismatchUnaligned( a, Unsafe.ARRAY_CHAR_BASE_OFFSET + aFromIndex * Character.BYTES, b, Unsafe.ARRAY_CHAR_BASE_OFFSET + bFromIndex * Character.BYTES, length * Character.BYTES / Long.BYTES); for (int i = wi * Long.BYTES / Character.BYTES; i < length; i++) { if (a[aFromIndex + i] != b[bFromIndex + i]) return aFromIndex + i; } return -1; } This assumes there will be unaligned access support. There is no explicit loop unrolling to see how hotspot does. For say: byte[] a = new byte[1024 * 64]; byte[] b = new byte[1024 * 64]; b[1024 * 64 - 2] = 1; for (int i = 0; i < 100000; i++) { s += fastMismatchByteRange(a, 0, b, 0, b.length); s += fastMismatchByte(a, b, b.length); } When base offsets are not constant, as with fastMismatchByteRange, generated unrolled loop blocks are like: 0x0000000108727a7a: mov %edx,%esi 0x0000000108727a7c: shl $0x3,%esi 0x0000000108727a7f: mov %esi,%ebx 0x0000000108727a81: add $0x8,%ebx ;*ishl ; - arrays.ArrayMismatch::longMismatchUnaligned@15 (line 75) ; - arrays.ArrayMismatch::fastMismatchByteRange@19 (line 103) 0x0000000108727a84: movslq %ebx,%rax ;*i2l ; - arrays.ArrayMismatch::longMismatchUnaligned@16 (line 75) ; - arrays.ArrayMismatch::fastMismatchByteRange@19 (line 103) 0x0000000108727a87: mov %rax,%r13 0x0000000108727a8a: add %rdi,%r13 0x0000000108727a8d: mov (%r14,%r13,1),%r13 0x0000000108727a91: add %r11,%rax 0x0000000108727a94: cmp (%rcx,%rax,1),%r13 0x0000000108727a98: jne 0x0000000108727b6e ;*ifeq ; - arrays.ArrayMismatch::longMismatchUnaligned@43 (line 77) ; - arrays.ArrayMismatch::fastMismatchByteRange@19 (line 103) When the base offsets are constant, as with fastMismatchByte, generated unrolled loop blocks are like: 0x000000010872addb: mov %r9d,%r11d 0x000000010872adde: add $0x18,%r11d 0x000000010872ade2: movslq %r11d,%r11 ;*i2l ; - arrays.ArrayMismatch::longMismatchAligned@44 (line 61) ; - arrays.ArrayMismatch::fastMismatchByte@11 (line 87) 0x000000010872ade5: mov (%rsi,%r11,1),%rdi 0x000000010872ade9: cmp (%rdx,%r11,1),%rdi 0x000000010872aded: jne 0x000000010872ae98 ;*ifeq ; - arrays.ArrayMismatch::longMismatchAligned@49 (line 62) ; - arrays.ArrayMismatch::fastMismatchByte@11 (line 87)
11-03-2015

This can be coded using Unsafe to perform a vectorized (64-bit-wide) search for the first block of 64 bits containing a mismatch. Pseudocode sketch follows: int vectorizedAlignedMismatch(long[] a64, int i, long[] b64, int j, int count) { // Note: a64/b64 are fictions; what is really needed is an (Object,long) pair for each. // The reads a64[i] are really calls to Unsafe.getLong. final int UNROLL = ...; while (count >= UNROLL) { if (a64[i+0] != b64[j+0]) return i+0; ... if (a64[i+3] != b64[j+3]) return i+3; ... count -= UNROLL; i += UNROLL; j += UNROLL; } while (count >= 1) { if (a64[i] != b64[j]) break; count -= 1; i += 1; j += 1; } return i; } // FIXME: Add misaligned load ops (Unsafe.getLongMisaligned) and use on SPARC or ARM to handle refs to b64 (assuming a64 is aligned). int fastMismatchChar(char[] a, int i, char[] b, int j, int count) { final int EBASE = Unsafe.ARRAY_CHAR_BASE_OFFSET; final int ESIZE = Unsafe.ARRAY_CHAR_INDEX_SCALE; final int VSIZE = Unsafe.ARRAY_LONG_INDEX_SCALE; if (!Unsafe.ANY_LOAD_CAN_BE_MISALIGNED) { if ((((i - j) * ESIZE) % VSIZE) != 0) { return 0; // FIXME: Add Unsafe.getLongMisaligned to avoid this cutout. } while (((i * ESIZE + EBASE) % VSIZE) != 0) { if (a[i] != b[j]) return i; i++; count���; } } // now i and j are aligned if necessary int vcount = count / (VSIZE / ESIZE); int vindex = vectorizedAlignedMismatch((long[])a, i / (VSIZE / ESIZE), (long[])b, j / (VSIZE / ESIZE), count); // Note: index scaling on previous line is bogus, since a64 is a fiction, etc. return vindex / ESIZE; } // FIXME: Recode fastMismatchChar, etc., as generic code using raw offsets and unsafe. One for each size 8/16/32/64.
11-02-2015