The main problem in the new RCE code was the case where stride and scale have different signs. In that case the new main-loop limit was calculated as min(old_main_limit, X), where X = (min_int+1 - (offset+1))/scale. With offset == 0 and scale == -1, the numerator is min_int+1 - 1 == min_int, and X is min_int because min_int/-1 overflows back to min_int. So the new main-loop limit is min_int. The new unrolling code handles this case correctly by skipping the main loop. The old code computed (limit - init)/stride with stride == 1 and init == 1 (the value after the increment in the pre-loop), so the new limit for the unrolled loop became (min_int - 1) == max_int. I solved this by replacing a positive (offset+1) with 0, which also avoids underflow in the X calculation. This is the same approach the current code already uses for pre_limit when scale and stride have the same sign.
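The arithmetic above can be checked with a small Java sketch. Java int operations wrap the same way, and in Java Integer.MIN_VALUE / -1 evaluates to Integer.MIN_VALUE rather than trapping. The class and method names, the sample old_main_limit of 1000, and the exact shape of xNew are hypothetical. xNew only follows the description of the fix (a positive offset+1 replaced with 0) and is not the C2 code itself.

```java
// Hypothetical illustration of the 32-bit main-loop limit arithmetic.
// Java int arithmetic wraps on overflow (two's complement).
public class RceMainLimitDemo {
    static final int MIN_INT = Integer.MIN_VALUE;

    // Problematic form: X = (min_int+1 - (offset+1)) / scale
    static int xOld(int offset, int scale) {
        return (MIN_INT + 1 - (offset + 1)) / scale;
    }

    // Sketch of the fix (assumed shape): a positive (offset+1) is replaced
    // with 0, so the numerator never drops below min_int+1.
    static int xNew(int offset, int scale) {
        int offPlusOne = offset + 1;
        int adj = offPlusOne > 0 ? 0 : offPlusOne;
        return (MIN_INT + 1 - adj) / scale;
    }

    // stride > 0 in this example, so the main-loop limit is clamped with min().
    static int mainLimit(int oldMainLimit, int x) {
        return Math.min(oldMainLimit, x);
    }

    // Old unrolling computation: (limit - init) / stride.
    static int oldUnrollSpan(int limit, int init, int stride) {
        return (limit - init) / stride;
    }

    public static void main(String[] args) {
        int offset = 0, scale = -1, stride = 1, init = 1, oldMainLimit = 1000;

        int xo = xOld(offset, scale);          // MIN_VALUE / -1 wraps to MIN_VALUE
        int limOld = mainLimit(oldMainLimit, xo);
        System.out.println("old X = " + xo + ", main limit = " + limOld
                + ", (limit - init)/stride = " + oldUnrollSpan(limOld, init, stride));

        int xn = xNew(offset, scale);          // (MIN_VALUE + 1) / -1 == MAX_VALUE
        System.out.println("new X = " + xn + ", main limit = " + mainLimit(oldMainLimit, xn));
    }
}
```

With the fix, X becomes max_int in this example, so min(old_main_limit, X) leaves the original main-loop limit unchanged. Without it, (min_int - 1)/1 wraps to max_int.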
Another problem with the new code is that it does not generate the adjusted pre-loop limit when RangeLimitCheck is off; the old RCE code did.
After thinking more about the pre_limit guard "min(pre_limit, orig_limit)" in do_range_check(), I decided to always generate it, to be safe. The offset and scale in a range-check (RC) condition can have values for which X = (low_limit - offset)/scale lies outside the loop's iteration range, so max(pre_limit, X) could be greater than orig_limit.
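Here is a worked example of why the guard matters, as a Java sketch. It assumes stride > 0 and scale > 0, so the pre-loop limit is raised with max and clamped with min. The loop is like for (int i = 0; i < 10; i++) a[i - 100] = 0;, which gives offset = -100, scale = 1 and low_limit = 0. The initial pre_limit of 1 and all names are hypothetical.

```java
// Hypothetical sketch of the min(pre_limit, orig_limit) guard.
// Assumes stride > 0 and scale > 0; X = (low_limit - offset) / scale as in the text.
public class PreLimitGuardDemo {
    // Pre-loop limit without the guard: max(pre_limit, X).
    static int unguarded(int preLimit, int lowLimit, int offset, int scale) {
        int x = (lowLimit - offset) / scale;
        return Math.max(preLimit, x);
    }

    // Pre-loop limit with the guard: it never exceeds the original loop limit.
    static int guarded(int preLimit, int origLimit, int lowLimit, int offset, int scale) {
        return Math.min(unguarded(preLimit, lowLimit, offset, scale), origLimit);
    }

    public static void main(String[] args) {
        // for (int i = 0; i < 10; i++) a[i - 100] = 0;  => offset = -100, scale = 1
        int origLimit = 10, preLimit = 1, lowLimit = 0, offset = -100, scale = 1;
        System.out.println("unguarded = " + unguarded(preLimit, lowLimit, offset, scale));
        System.out.println("guarded   = " + guarded(preLimit, origLimit, lowLimit, offset, scale));
    }
}
```

Here X = (0 - (-100))/1 = 100, so the unguarded pre-loop limit is 100, well past orig_limit = 10. The guard clamps it back to 10.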
I also factored out common code and replaced (stride*scale) with logical expressions to avoid overflow.
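The sign test can be sketched in Java to show why the product is unsafe. The method names are hypothetical. The logical form assumes neither stride nor scale is 0.

```java
// Hypothetical sketch: testing "stride and scale have the same sign".
public class SignTestDemo {
    // Overflow-prone: the 32-bit product can wrap and flip its sign.
    static boolean sameSignByProduct(int stride, int scale) {
        return stride * scale > 0;
    }

    // Overflow-free logical form (assumes stride != 0 and scale != 0).
    static boolean sameSignByLogic(int stride, int scale) {
        return (stride > 0) == (scale > 0);
    }

    public static void main(String[] args) {
        int stride = 2, scale = 0x40000000; // both positive
        // 2 * 2^30 wraps to Integer.MIN_VALUE, so the product test says "different signs".
        System.out.println(sameSignByProduct(stride, scale));
        System.out.println(sameSignByLogic(stride, scale));
    }
}
```

The logical form compares the two signs directly, so no intermediate value can overflow.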
Tested with and without UnrollLimitCheck: CTW, JPRT, nsk.stress, and the 5091921 tests.