JDK-8042997 : Make intrinsic some or all check index/range methods
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-05-13
  • Updated: 2020-11-23
  • Resolved: 2015-11-16
Fixed in: JDK 9 b96
Description
Consider the following code:

   static void setVolatile(ArrayRefHandle handle, Object[] array, int index,
           Object value) {
       if (index < 0 || index >= array.length) // bounds and null check
           throw new ArrayIndexOutOfBoundsException();
       UNSAFE.putObjectVolatile(array,
                                (((long) index) << handle.ashift) + handle.abase,
                                castReference(handle.componentType, value));
   }

For such a user-written range test, the compiler does not fully recognize that an array access is going on, so certain optimizations tend not to kick in, such as removing or strength-reducing range checks, treating "index" as an unsigned rather than a signed value, or coalescing write barriers when loop unrolling.

In addition to having the compiler detect and optimize such user-written range tests, it would be useful to have intrinsic methods such as:

  rangeCheck(Object[]/int[]/long[]... array, int index)

and the optimizer should generate similar code in both cases. 

Ideally the same form of machine code (minus the differences in form of access) should be generated as if an aaload/aastore access was performed.
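
For illustration, a minimal sketch of how the setVolatile example could look if such a method existed (the name rangeCheck and its placement are assumptions here; the checkIndex shape discussed in the comments below would serve equally well):

   static void setVolatile(ArrayRefHandle handle, Object[] array, int index,
           Object value) {
       // One well-known call replaces the hand-written bounds/null check,
       // giving the compiler a shape it can recognize, strength-reduce or
       // eliminate, just as it does for an aaload/aastore.
       rangeCheck(array, index);
       UNSAFE.putObjectVolatile(array,
                                (((long) index) << handle.ashift) + handle.abase,
                                castReference(handle.componentType, value));
   }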
Comments
No regression test: this affects code generation and so is hard to observe from a Java test case.
16-11-2015

The additional API points checkStartEndRange and checkOffsetSizeRange seem very useful just from the perspective of an API, since it is really easy to make trivial errors when writing and duplicating such checks. These may be less critical than checkIndex in terms of providing an intrinsic, as they are often used to check bounds before performing a bulk operation; however, such intrinsics may be useful to hint to the compiler that bounds, represented as signed ints, should be processed as if unsigned, thereby potentially enabling better loop optimization for a bulk operation.

The use-case for checkStartEndRange would be applicable in a number of areas within Arrays, Spliterators and String, but each has its own idiosyncrasies when reporting errors.

In java.util.Arrays:

    private static void rangeCheck(int arrayLength, int fromIndex, int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                    "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > arrayLength) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
    }

In java.util.Spliterators:

    private static void checkFromToBounds(int arrayLength, int origin, int fence) {
        if (origin > fence) {
            throw new ArrayIndexOutOfBoundsException(
                    "origin(" + origin + ") > fence(" + fence + ")");
        }
        if (origin < 0) {
            throw new ArrayIndexOutOfBoundsException(origin);
        }
        if (fence > arrayLength) {
            throw new ArrayIndexOutOfBoundsException(fence);
        }
    }

In String:

    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
        }
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }

It should be possible to support such idiosyncrasies with a BiFunction that accepts the start and end values. From checking inequalities of those values it can be determined which exception and message to use (the case of end > array.length is implied if the two other inequalities are not applicable). Even the order in which exceptions are thrown can be preserved.

However, there are use-cases in AbstractStringBuilder where it would not be possible to preserve the exception messages, for example:

    public AbstractStringBuilder insert(int index, char[] str, int offset, int len) {
        if ((index < 0) || (index > length()))
            throw new StringIndexOutOfBoundsException(index);
        if ((offset < 0) || (len < 0) || (offset > str.length - len))
            throw new StringIndexOutOfBoundsException(
                    "offset " + offset + ", len " + len + ", str.length " + str.length);
        ensureCapacityInternal(count + len);
        System.arraycopy(value, index, value, index + len, count - index);
        System.arraycopy(str, offset, value, index, len);
        count += len;
        return this;
    }

In this case the exception message also contains the character array length.

The use-case for checkOffsetSizeRange is applicable to NIO buffers for bulk operations:

    static void checkBounds(int off, int len, int size) { // package-private
        if ((off | len | (off + len) | (size - (off + len))) < 0)
            throw new IndexOutOfBoundsException();
    }

    public ByteBuffer get(byte[] dst, int offset, int length) {
        checkBounds(offset, length, dst.length);
        ...
Another identified use-case is in java.nio.Buffer:

    final int checkIndex(int i, int nb) { // package-private
        if ((i < 0) || (nb > limit - i))
            throw new IndexOutOfBoundsException();
        return i;
    }

which can use Arrays.checkIndex:

    final int checkIndex(int i, int nb) { // package-private
        return Arrays.checkIndex(i, limit - nb + 1, EF);
    }

since it is known that limit is non-negative and nb is positive.
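
To make the BiFunction idea concrete, here is a sketch of how the java.util.Arrays case above might be retrofitted; checkStartEndRange and its BiFunction<Integer, Integer, RuntimeException> factory parameter are the proposal under discussion, not an existing API:

    private static void rangeCheck(int arrayLength, int fromIndex, int toIndex) {
        // Hypothetical: throws whatever the factory produces for (start, end)
        // when !(0 <= fromIndex <= toIndex <= arrayLength).
        checkStartEndRange(fromIndex, toIndex, arrayLength, (start, end) -> {
            // The original idiosyncratic exceptions, and their order, can be
            // reproduced from the inequalities alone; end > arrayLength is
            // implied when the other two checks do not apply.
            if (start > end)
                return new IllegalArgumentException(
                        "fromIndex(" + start + ") > toIndex(" + end + ")");
            if (start < 0)
                return new ArrayIndexOutOfBoundsException(start);
            return new ArrayIndexOutOfBoundsException(end);
        });
    }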
08-09-2015

JDK-8074383 seems to be related to this, visible on String.charAt. Please check if any fix for this issue fixes JDK-8074383 as well.
04-03-2015

Additional API points may be needed for subrange checking. This is sometimes hard to get right.

    public static int checkStartEndRange(int start, int end, int length) {
        if (start < 0 || start > end || end > length)
            throw ...
        return start;
    }

    public static int checkOffsetSizeRange(int offset, int size, int length) {
        if (offset < 0 || size < 0 || (long)offset + size >= length)
            throw ...
        return offset;
    }

Note the 64-bit arithmetic required by the last test, which defends against 32-bit overflow and wrap-to-negative. This doesn't happen in practice, but could happen with maliciously concocted inputs. The last test should be replaced by 32-bit arithmetic, so that the algorithm will generalize cleanly to long indexes. Here's another cut:

    public static int checkOffsetSizeRange(int offset, int size, int length) {
        assert(length >= 0);
        if ((offset | size) < 0 || offset > length - size)
            throw ...
        return offset;
    }

Such tests are very difficult to get right the first time, and to optimize correctly. That is one reason why we should have an API rather than relying on the hand-crafted logic written by the writers of sequence types.
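
To make the wrap-to-negative hazard concrete, a small self-contained illustration (the values are invented; naiveInBounds is the broken 32-bit formulation, safeInBounds mirrors the second checkOffsetSizeRange cut above):

    public class OverflowDemo {
        // Broken: offset + size can wrap to a negative int and slip past the test.
        static boolean naiveInBounds(int offset, int size, int length) {
            return offset >= 0 && size >= 0 && offset + size <= length;
        }

        // Overflow-safe shape (assumes length >= 0): once size is known to be
        // non-negative, length - size cannot wrap, so no 64-bit math is needed.
        static boolean safeInBounds(int offset, int size, int length) {
            return (offset | size) >= 0 && offset <= length - size;
        }

        public static void main(String[] args) {
            int offset = 2_000_000_000, size = 2_000_000_000, length = 10;
            // offset + size wraps to -294_967_296, so the naive test accepts it.
            System.out.println(naiveInBounds(offset, size, length)); // true  (wrong)
            System.out.println(safeInBounds(offset, size, length));  // false (right)
        }
    }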
17-02-2015

An exception factory which captures no variables (usual case) will just be an LDC (or equivalent INDY), so I don't think there is a performance problem there. I agree that boolean versions of the tests are attractive. I am proposing the "test and throw" all-in-one versions in order to minimize the resulting code size of clients. Bytecode-size is (unfortunately) inversely correlated with optimization quality, until we revamp inlining intrinsics. Also relieving the client of writing explicit control flow makes for less buggy client code. Of course, the API needs to prove its worth by allowing retrofitting of a large fraction of existing range-check code.
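
As a sketch of the "non-capturing factory is just a constant" point (checkIndex here is a plain-Java stand-in for the proposed intrinsic, not an existing JDK method):

    import java.util.function.BiFunction;

    class RangeChecks {
        // Stand-in for the proposed test-and-throw intrinsic.
        static int checkIndex(int index, int length,
                BiFunction<Integer, Integer, ? extends IndexOutOfBoundsException> factory) {
            if (index < 0 || index >= length)
                throw factory.apply(index, length);
            return index;
        }

        // Non-capturing factory: the lambda captures no state, so it is
        // produced once (the LDC-like / constant-producing invokedynamic
        // behaviour mentioned above) and reused; there is no per-call
        // allocation or capture cost at the use sites.
        static final BiFunction<Integer, Integer, IndexOutOfBoundsException> AIOOBE =
                (index, length) -> new ArrayIndexOutOfBoundsException(index);

        static int get(int[] a, int i) {
            return a[checkIndex(i, a.length, AIOOBE)];
        }
    }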
17-02-2015

I would be a little concerned about the exception-factory approach, since this functionality is likely to be used in performance-sensitive areas (perhaps even in cases where there are dependency cycles with lambdas). While one could use static final instances, it might be easy to forget and they are a hassle. Some constants could be provided to mitigate that hassle, e.g. perhaps on ArrayIndexOutOfBoundsException and IndexOutOfBoundsException.

Why not start with the more fundamental:

    public static boolean check(int index, int length) {
        return index < 0 || index >= length;
    }

and build up functionality from that? Or am I missing something fundamental about how the JIT might optimize?

Existing code can easily be transformed from:

    if ((i < 0) || (i >= limit))
        throw new IndexOutOfBoundsException();

to:

    if (check(i, limit))
        throw new IndexOutOfBoundsException();
13-02-2015

An intrinsic might look like this:

    public static int checkIndex(int index, int length) {
        if (index < 0 || index >= length)
            throw checkIndexFailed(index, length);
        return index;
    }

    private static RuntimeException checkIndexFailed(int index, int length) {
        return new ArrayIndexOutOfBoundsException(index);
    }

It could go in Unsafe (easy for the JVM devs) or, with a CCC request, it could be proposed for java.util.Arrays. There should be a long overloading also, since Arrays 2.0 arrays will support long indexes. Having it return the index makes it easier to use (in an expression) and perhaps easier for the JIT to track the constrained type of the result.

The type of the exception (and its message) is likely to be problematic. To address this, supply an exception factory:

    public static int checkIndex(int index, int length,
            BiFunction<Long, Long, IndexOutOfBoundsException> exceptionFactory) {
        if (index < 0 || index >= length)
            throw exceptionFactory.apply((long) index, (long) length);
        return index;
    }

Note that the "intrinsic" has a single failure block which has no successors. It's not enough just to call a utility function that throws; the block has to terminate with an "athrow" bytecode.
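
A short sketch of the two points above, i.e. using the returned index directly in an expression and providing a long overload (names and the choice of exception are illustrative only):

    class LongChecks {
        // Hypothetical long overload, for long-indexed "Arrays 2.0" arrays.
        static long checkIndex(long index, long length) {
            if (index < 0L || index >= length)
                throw new IndexOutOfBoundsException("Index out of range: " + index);
            return index;
        }

        // Returning the index lets the check sit inside the access expression,
        // where the JIT can keep track of the proven [0, length) constraint.
        static int read(int[] a, long i) {
            return a[(int) checkIndex(i, a.length)];
        }
    }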
17-01-2015

The webrev looks reasonable, and may allow us to make progress even without an intrinsic.

The change in parse2.cpp is kind of hack-y, and might interfere with other uses of branch profiling. The goal is to preserve equality of control paths, by allowing a multiple-predecessor block to be parsed even if the in-edges to it are of very low (or zero) frequency. Example:

    B1: if (val < 0) goto L_slowpath;      // profile says never taken
    B2: if (val >= limit) goto L_slowpath; // profile says never taken
    L_fastpath:
    B3: do_fast_computation(val); return;
    L_slowpath:
    B4: raise_an_error();

Currently, we emit two uncommon traps inside B1 and B2. The uncommon traps are different; each "blames" the trap on the "if" statement, rather than on the target block B4, and replays the "if" statement. It would be better to branch to B4, and then emit a single uncommon trap (or whatever else makes sense for a low-frequency block). The reason it is better is that the code in IfNode::fold_compares will only turn the two CmpI from B1 and B2 into a single CmpU if it sees a common RegionNode for B4, accepting both branches. Tweaking the branch probabilities in Vladimir's prototype gets this effect.

Another way to get the same effect, without perturbing the branch frequencies, would be to "blame" the uncommon trap on the successor block B4. In other words, rather than replay the "if" statement, advance the BCI (and keep the if-arguments popped) and enter the interpreter at B4. The parsing of B1 would create an uncommon trap associated with B4, which would be recorded (somehow). Next, the parsing of B2 would try to reuse the uncommon trap. Reuse would fail if any phi needed to be created. Note that the logic in IfNode stipulates that the region (B4) be free of phis. Sometimes we create phis for different versions of CastII nodes stemming from the original value; this could happen with B1, which proves that val is non-negative. We would have to discard this refined type information when choosing whether to reuse an uncommon trap at B4, since in that case B4 is receiving different versions of 'val'. It probably helps if 'val' is dead at B4, but that might not be the case; 'val' might be passed to an error-reporting utility.

Summary: I suggest that we investigate reusing uncommon traps, especially those that arise from low-level safety checks like range checks.
17-01-2015

Webrev for above diffs: http://cr.openjdk.java.net/~kvn/8042997/webrev/
02-12-2014

Yes, intrinsics are good. John R. also proposed to use intrinsics for range checks in Java.
02-12-2014

In addition to detecting the "if (index < 0 || index >= array.length)" pattern, would it still be appropriate to, say, intrinsify Integer/Long.compareUnsigned? For example, consider the following method in Buffer:

    final int checkIndex(int i) { // package-private
        if ((i < 0) || (i >= limit))
            throw new IndexOutOfBoundsException();
        return i;
    }

Can the compiler also detect this case? If not, changing the bounds check to:

    // limit is a positive number
    if (Integer.compareUnsigned(i, limit) >= 0)
        throw new IndexOutOfBoundsException();

might work with corresponding intrinsification.
16-10-2014

Next changes fold checks guarding a throw into a range check:

    030   B2: #   B4 B3 <- B1  Freq: 0.999999
    030     cmpl    RCX, R10    # unsigned
    033     jnb,us  B4  P=0.000001 C=6700.000000
    033

diff -r f111958ca117 src/share/vm/ci/ciTypeFlow.hpp
--- a/src/share/vm/ci/ciTypeFlow.hpp    Fri Sep 19 17:14:13 2014 +0200
+++ b/src/share/vm/ci/ciTypeFlow.hpp    Thu Oct 09 17:15:11 2014 -0700
@@ -582,6 +582,7 @@
     int start() const              { return _ciblock->start_bci(); }
     int limit() const              { return _ciblock->limit_bci(); }
     int control() const            { return _ciblock->control_bci(); }
+    bool may_throw() const         { return _ciblock->may_throw(); }
     JsrSet* jsrs() const           { return _jsrs; }
     bool is_backedge_copy() const  { return _backedge_copy; }

diff -r f111958ca117 src/share/vm/opto/ifnode.cpp
--- a/src/share/vm/opto/ifnode.cpp      Fri Sep 19 17:14:13 2014 +0200
+++ b/src/share/vm/opto/ifnode.cpp      Thu Oct 09 17:15:11 2014 -0700
@@ -677,11 +677,11 @@
   Node* this_cmp = in(1)->in(1);
   if (this_cmp != NULL && this_cmp->Opcode() == Op_CmpI &&
-      this_cmp->in(2)->is_Con() && this_cmp->in(2) != phase->C->top()) {
+      this_cmp->in(2) != phase->C->top()) {
     Node* ctrl = in(0);
     BoolNode* this_bool = in(1)->as_Bool();
     Node* n = this_cmp->in(1);
-    int hi = this_cmp->in(2)->get_int();
+    Node* hi = this_cmp->in(2);
     if (ctrl != NULL && ctrl->is_Proj() && ctrl->outcnt() == 1 &&
         ctrl->in(0)->is_If() &&
         ctrl->in(0)->outcnt() == 2 &&
@@ -724,6 +724,19 @@
       if (failtype != NULL &&
           dom_bool->_test._test != BoolTest::ne && dom_bool->_test._test != BoolTest::eq) {
         int bound = failtype->_hi - failtype->_lo + 1;
+        int hi_lo = phase->type(hi)->is_int()->_lo;
+        if (!hi->is_Con() && lo >= 0 && hi_lo >= lo && bound > 1) {
+          // Merge the two compares into a single unsigned compare by building (CmpU (n - lo) hi)
+          BoolTest::mask cond = fail->as_Proj()->_con ? BoolTest::lt : BoolTest::ge;
+          Node* adjusted_val = phase->transform(new SubINode(n, phase->intcon(lo)));
+          Node* adjusted_lim = phase->transform(new SubINode(hi, phase->intcon(lo)));
+          Node* newcmp = phase->transform(new CmpUNode(adjusted_val, adjusted_lim));
+          Node* newbool = phase->transform(new BoolNode(newcmp, cond));
+          phase->is_IterGVN()->replace_input_of(dom_iff, 1, phase->intcon(ctrl->as_Proj()->_con));
+          phase->hash_delete(this);
+          set_req(1, newbool);
+          return this;
+        }
-        if (failtype->_hi != max_jint && failtype->_lo != min_jint && bound > 1) {
+        if (hi->is_Con() && failtype->_hi != max_jint && failtype->_lo != min_jint && bound > 1) {
           // Merge the two compares into a single unsigned compare by building (CmpU (n - lo) hi)

diff -r f111958ca117 src/share/vm/opto/parse.hpp
--- a/src/share/vm/opto/parse.hpp       Fri Sep 19 17:14:13 2014 +0200
+++ b/src/share/vm/opto/parse.hpp       Thu Oct 09 17:15:11 2014 -0700
@@ -244,6 +244,9 @@
     bool has_trap_at(int bci) const     { return flow()->has_trap() && flow()->trap_bci() == bci; }
+    bool has_trap() const               { return flow()->has_trap(); }
+    bool may_throw() const              { return flow()->may_throw(); }
+
     // Call this just before parsing a block.
     void mark_parsed() {
       assert(!_is_parsed, "must parse each block exactly once");

diff -r f111958ca117 src/share/vm/opto/parse2.cpp
@@ -1016,6 +1019,23 @@
   float prob = branch_prediction(cnt, btest, target_bci);
   float untaken_prob = 1.0 - prob;
+  // Generate branch to block which trows exception or traps
+  // instead of generating separate uncommon traps for each
+  // such branch.
+  // It may help to convert several checks into one range check.
+  if ((branch_block->may_throw() || branch_block->has_trap()) &&
+      (branch_block->num_successors() == 0) &&
+      (prob < PROB_MIN)) { // exit block
+    prob = PROB_MIN;
+    untaken_prob = 1.0 - prob;
+  }
+  if ((next_block->may_throw() || next_block->has_trap()) &&
+      (next_block->num_successors() == 0) &&
+      (untaken_prob < PROB_MIN)) { // exit block
+    untaken_prob = PROB_MIN;
+    prob = 1.0 - untaken_prob;
+  }
+
   if (prob == PROB_UNKNOWN) {
 #ifndef PRODUCT
     if (PrintOpto && Verbose)
10-10-2014

Unfortunately the generated ideal graph does not match the pattern:

    //        If
    //       /  |
    //      /   |
    //     /    |
    //    If    |
    //    /\    |
    //   /  \   |
    //  /    \  |
    // /      Region

We generate separate uncommon traps instead of one 'throw'. As a result we don't have a merge point (Region node).
09-10-2014

And we don't optimize checks as the bug said:

    02c     testl   RCX, RCX
    02e     jl,s    B5  P=0.000000 C=6700.000000
    02e
    030   B2: #   B7 B3 <- B1  Freq: 1
    030     movl    RBP, [RDX + #12 (8-bit)]  # range
    033     NullCheck RDX
    033
    033   B3: #   B6 B4 <- B2  Freq: 0.999999
    033     cmpl    RCX, RBP
    035     jge,s   B6  P=0.000000 C=6700.000000
    035
    037   B4: #   N63 <- B3  Freq: 0.999998
    037     movslq  R10, RCX  # i2l
    03a     movl    RAX, [RDX + #16 + R10 << #2]  # int

I think it is because it is separated by null check (converted to implicit) for array's length load. But even when I move the load up it still does not work:

    int m = a.length;
    if (index < 0 || index >= m)

    02c     movl    R10, [RDX + #12 (8-bit)]  # range
    030     NullCheck RDX
    030
    030   B2: #   B5 B3 <- B1  Freq: 0.999999
    030     testl   RCX, RCX
    032     jl,s    B5  P=0.000000 C=6700.000000
    032
    034   B3: #   B6 B4 <- B2  Freq: 0.999999
    034     cmpl    RCX, R10
    037     jge,s   B6  P=0.000000 C=6700.000000
    037
09-10-2014

The "if (index < 0 || index >= array.length)" pattern should be optimized to "if (!(index u< array.length))". The code which does such an optimization is the IfNode::fold_compares() method.
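
To spell out why the single unsigned compare is equivalent, a small illustration using Integer.compareUnsigned for the "u<" notation (array lengths are always non-negative, which is what makes the folding valid):

    class UnsignedCheckDemo {
        // Two signed tests: the pattern from the bug description.
        static boolean signedPair(int index, int length) {
            return index < 0 || index >= length;
        }

        // One unsigned test: a negative index reinterpreted as unsigned is
        // >= 2^31, so it can never be unsigned-less-than a non-negative length.
        static boolean unsignedSingle(int index, int length) {
            return Integer.compareUnsigned(index, length) >= 0; // !(index u< length)
        }

        public static void main(String[] args) {
            int[] samples = { -1, 0, 5, 9, 10, 11, Integer.MIN_VALUE, Integer.MAX_VALUE };
            for (int i : samples) {
                // Both predicates agree for every index when length >= 0.
                System.out.println(i + ": " + (signedPair(i, 10) == unsignedSingle(i, 10)));
            }
        }
    }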
08-10-2014

Relevant discussion on hotspot-compiler-dev: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-May/014365.html
13-05-2014