Exposed a number of issues:
1) loop unswitching is needed
2) array length and class pointers are invariant (as far as
the optimizer cares)
3) removal of a number of barriers to iterative application
4) better inlining decisions
It's probably better to wait on 4) inlining until after
the tiered compilation system is on by default, since this
will change the profile information seen by C2.
1) Make load_klass and load_range immutable
LoadKlass and LoadRange operations are not being fully
optimized because their memory input is unnecessarily
The fix is to use immutable memory (memory edge from StartNode)
for LoadKlass and LoadRange from an object.
2) Fix blockers to optimization
A number of issues in the optimizer were found that throttle
Fix bug in split down of cmp-bool which will infinitely clone
the cmp if the bool has only one use which is not
in the same block as the bool.
Fix bug in sanity check of users' calling convention.
Compute new control for operations created during secondary induction
variable (IV) removal to prevent inhibiting range check elimination (RCE).
Mark loops that are candidates for RCE to inhibit split-thru-phi
from creating a graph unrecognizable to RCE. And in split-thru-phi
delay splitting through a phi for a marked loop until a later loop
During build_loop_late, try not to place operations on loop entry
control edges because this might inhibit RCE.
Force another round of loop opts if a loop node is
created because a loop node allows more phi node optimizations
which may allow more loop opts.
Reassociate add/sub based on loop invariants.
After peeling, igvn the entire loop since peeling
exposes loop invariant operations.
Enhance iv expression recognition for range check elimination
to include lshift (for scaling) and an offset of: invariant + constant.
Use same RCE pattern matcher for both policy and transform.
Enhance PhiNode Identity check for unnecessary phi merging in the
presence of constraint casts.
Created a LoopTreeIterator for more readable loop visitations.
Print name of field when dumping ideal memory nodes.
Added unique_ctrl_out to return the unique control
output edge if there is one and only one.
Check in MergeMem Ideal if PhiNode::Ideal's "Split phis through memory merges"
transform should be attempted. Look for this->phi->this cycle.
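
Two of the items above, reassociation on loop invariants and the wider
index shape accepted for RCE, can be pictured at source level. The
following is only a sketch under stated assumptions: the class and method
names are hypothetical, and the hand-rewritten form shows the intent of
the transform rather than what C2 emits.

```java
// Hypothetical illustration; none of these names come from the changeset.
final class LoopInvariantSketch {

    // Before reassociation: the loop-varying i sits between two invariants,
    // so (i + inv1) + inv2 costs two adds on every iteration.
    static long sumNaive(int[] a, int inv1, int inv2) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (long) a[i] * ((i + inv1) + inv2);
        }
        return sum;
    }

    // After reassociation: i + (inv1 + inv2). The invariant sum can be
    // computed once outside the loop. Java int addition wraps, so both
    // groupings give the same result.
    static long sumReassociated(int[] a, int inv1, int inv2) {
        int inv = inv1 + inv2; // loop invariant, hoisted
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (long) a[i] * (i + inv);
        }
        return sum;
    }

    // Index of the form (iv << scale) + (invariant + constant), the shape
    // the enhanced RCE recognition is described as accepting.
    static int sumScaled(int[] a, int base, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += a[(i << 1) + (base + 3)];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        System.out.println(sumNaive(a, 5, 7) == sumReassociated(a, 5, 7));
        System.out.println(sumScaled(a, 0, 4)); // reads indices 3, 5, 7, 9
    }
}
```

In sumScaled the index grows monotonically with i, which is what lets a
compiler check the first and last index once instead of on every access.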
3) Implement loop unswitching
Need to implement "Loop Unswitching" in order to optimize
the byte buffer loops.
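  Normal:                     Unswitched:
  loop                        if (invariant-test) then
    stmt1                       loop
    if (invariant-test) then      stmt1
      stmt2                       stmt2
    else                          stmt4
      stmt3                     endloop
    endif                     else
    stmt4                       loop [clone]
  endloop                         stmt1 [clone]
                                  stmt3
                                  stmt4 [clone]
                                endloop
                              endif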
Note: the "else" clause may be empty
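
As a source-level picture of the transform, here is a minimal sketch
assuming a boolean flag that does not change inside the loop. The class
and method names are hypothetical, and the second method is a hand-written
equivalent of what unswitching aims for, not C2 output.

```java
// Hypothetical illustration of loop unswitching on a byte-buffer-style loop.
final class UnswitchSketch {

    // Original shape: the invariant test is evaluated on every iteration.
    static int original(int[] a, boolean flag) {
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i];        // stmt1
            if (flag) {         // invariant-test
                acc += 1;       // "then" body
            } else {
                acc -= 1;       // "else" body (may be empty)
            }
            acc ^= i;           // stmt4
        }
        return acc;
    }

    // Unswitched shape: the test is hoisted and the loop is cloned, so each
    // copy of the loop body is free of the branch.
    static int unswitched(int[] a, boolean flag) {
        int acc = 0;
        if (flag) {
            for (int i = 0; i < a.length; i++) {
                acc += a[i];    // stmt1
                acc += 1;       // "then" body
                acc ^= i;       // stmt4
            }
        } else {
            for (int i = 0; i < a.length; i++) {
                acc += a[i];    // stmt1 [clone]
                acc -= 1;       // "else" body
                acc ^= i;       // stmt4 [clone]
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        System.out.println(original(a, true) == unswitched(a, true));
        System.out.println(original(a, false) == unswitched(a, false));
    }
}
```

The cost is code size, since the loop body is duplicated; the gain is that
each clone is a straight-line body that later loop opts can work on.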