JDK-8286679 : more range check elimination for hash tables
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 19
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2022-05-12
  • Updated: 2024-10-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Description
There are several hash table lookup idioms which are easily provable NOT to need a range check.  They should all be covered by C2.

Given an array A (or other indexed item such as a list or memory segment), and a quasi-random int or long value X, and an int or long array size L = A.length (or A.size(), etc.), the lookup idiom computes the value I = X mod L (using unsigned arithmetic if needed).   It then immediately uses the reduced value I to index into the array A.  Usually, no range check is needed to ensure that the array (or other aggregate) is correctly indexed.

Case 0.  Native mod operator, int type:   var x = A[X % A.length].  In this case, X must not be negative.  This case is covered by the C2 function CmpUNode::is_index_range_check.  We probably also cover the case of A being a list or other int-indexed aggregate (this should be checked).

Case 1.  Native mod operator, long type:  var x = A.getElement(X % A.elementCount()).  Nowadays we also range-check longs in many places, and should do so here as well.  An additional challenge may appear if A has an internally unscaled bound (such as a byte-wise size for a memory segment) but the array-like access is scald to a multi-byte element.

Case 2.  Strength-reduced mod of power of two:  var x  = A[X & A.length-1].  It is not clear how well this is covered by existing C2 code.  It is a hot path in HashMap, with a native array and int index.  The compiler must (in general) insert a speculative predicate that the length of the array is a non-zero, since (only in this case) the hack will fail (and the range check must always fail).  Oddly enough, the JIT does NOT need to check that the array length is a power of two:  That can be left to the programmer, because of the mathematical identity that an arbitrary number masked by a non-negative number (such as A.length-1) is always numerically less than or equal to the mask, so we can conclude 0 <= (X&(L-1)) < L as long as L>0 (whether or not L is some 2^M).  This should be covered for ints and longs, and all kinds of aggregates.  Many modern hashing structures will benefit from this particular case being well optimized.

Case 3.  Unsigned mod method:  var x  = A[Integer.remainderUnsigned(X, A.length)].In the case of explicit unsigned arithmetic, the RCE can be removed without any predication at all, as long the subsequent range check uses compatible unsigned arithmetic (int or long).  The problem here is recognizing the unsigned remainder operation for what it is (since it is not an intrinsic, and has no special IR node).  This can be done easily in the 32-bit case, since it is coded using 64-bit signed arithmetic.  The implementation of Integer.remainderUnsigned(X,L) is problematic for X<0, so people are not likely to be using it.  Still, some thought should be give to at least preparing to optimize this case.