JDK-6385730 : optimizations for ByteBuffer.put()/get() not nearly as good as byte[] assignment
Type:Bug
Component:hotspot
Sub-Component:compiler
Affected Version:5.0
Priority:P3
Status:Resolved
Resolution:Fixed
OS:generic
CPU:generic
Submitted:2006-02-14
Updated:2013-11-01
Resolved:2006-11-14
ByteBuffer (heap or direct) put() and get(), along with the low-level primitives for getting and putting int, char, long, byte, etc., are not optimized nearly as well as their equivalents using byte[].
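The kind of loop pair being compared can be sketched as follows. This is only an illustration of the two code shapes, not the submitter's test case; the class and method names are made up for the example, and it checks equivalence rather than measuring speed.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the same fill/sum work written against byte[] and ByteBuffer.
final class BufferVsArraySketch {
    // Baseline: plain array stores.
    static void fillArray(byte[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = (byte) i;
        }
    }

    // Same work through absolute put(int, byte); accepts heap or direct buffers.
    static void fillBuffer(ByteBuffer dst) {
        for (int i = 0; i < dst.limit(); i++) {
            dst.put(i, (byte) i);
        }
    }

    // Baseline: plain array loads.
    static long sumArray(byte[] src) {
        long sum = 0;
        for (int i = 0; i < src.length; i++) {
            sum += src[i];
        }
        return sum;
    }

    // Same work through absolute get(int).
    static long sumBuffer(ByteBuffer src) {
        long sum = 0;
        for (int i = 0; i < src.limit(); i++) {
            sum += src.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] array = new byte[1024];
        ByteBuffer heap = ByteBuffer.allocate(1024);
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        fillArray(array);
        fillBuffer(heap);
        fillBuffer(direct);
        // All three variants should compute the same sum.
        System.out.println(sumArray(array) == sumBuffer(heap)
                && sumArray(array) == sumBuffer(direct));
    }
}
```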
Comments
EVALUATION
Fixes
1) Make load_klass and load_range immutable
LoadKlass and LoadRange operations are not being fully
optimized because their memory input is unnecessarily
constrained.
The fix is to use immutable memory (the memory edge from the StartNode)
for a LoadKlass or LoadRange from an object.
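A source-level illustration of why this matters, assuming only that a Java array's length is fixed at allocation: the stores in the loop body cannot change either length, so the length loads need not be ordered after them. The second method is the hand-hoisted form that such a load makes legal. The class and method names are hypothetical, and this is not the compiler's internal representation.

```java
// Hypothetical sketch: array length loads are independent of stores in the loop.
final class InvariantLengthSketch {
    // Lengths are re-read in the loop condition on every iteration.
    static int copyPrefix(int[] src, int[] dst) {
        int copied = 0;
        for (int i = 0; i < src.length && i < dst.length; i++) {
            dst[i] = src[i]; // a store to an element; it cannot alter any array's length
            copied++;
        }
        return copied;
    }

    // Hand-hoisted equivalent: both lengths are read once before the loop.
    static int copyPrefixHoisted(int[] src, int[] dst) {
        int limit = Math.min(src.length, dst.length);
        for (int i = 0; i < limit; i++) {
            dst[i] = src[i];
        }
        return limit;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        int[] a = new int[3];
        int[] b = new int[3];
        System.out.println(copyPrefix(src, a) == copyPrefixHoisted(src, b)
                && java.util.Arrays.equals(a, b));
    }
}
```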
2) Fix blockers to optimization
A number of issues in the optimizer were found that throttle
existing optimizations.
Fix a bug in the split-down of cmp-bool that clones the cmp
indefinitely when the bool has only one use and that use is not
in the same block as the bool.
Fix bug in sanity check of users' calling convention.
Compute new control for operations created during secondary
induction variable (IV) removal, so that they do not inhibit
range check elimination (RCE).
Mark loops that are candidates for RCE to inhibit split-thru-phi
from creating a graph unrecognizable to RCE. In split-thru-phi,
delay splitting through a phi for a marked loop until a later loop
optimization pass.
During build_loop_late, try not to place operations on loop entry
control edges because this might inhibit RCE.
Force another round of loop opts if a loop node is created, because
a loop node allows more phi node optimizations, which may allow
more loop opts.
Reassociate add/sub based on loop invariants.
After peeling, run igvn over the entire loop, since peeling
exposes loop-invariant operations.
Enhance iv expression recognition for range check elimination
to include lshift (for scaling) and an offset of: invariant + constant.
Use same RCE pattern matcher for both policy and transform.
Enhance PhiNode Identity check for unnecessary phi merging in the
presence of constraint casts.
Created a LoopTreeIterator for more readable loop visitations.
Print name of field when dumping ideal memory nodes.
Added unique_ctrl_out to return the unique control
output edge if there is one and only one.
Check in MergeMem Ideal if PhiNode::Ideal's "Split phis through memory merges"
transform should be attempted. Look for this->phi->this cycle.
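For the enhanced iv expression recognition mentioned above, the index shape in question (an lshift for scaling plus an offset of invariant + constant) might look like the loop below at the source level. The names are hypothetical. The allInRange helper is only a hand-written approximation of the kind of up-front bound a range check eliminator can reason about; it is not a description of what C2 actually emits.

```java
// Hypothetical sketch: index of the form (i << shift) + (invariant + constant).
final class ScaledIndexSketch {
    // Sums the second byte of each 4-byte group starting at 'base'.
    // 'i << 2' is the scaled induction variable; 'base + 1' is invariant + constant.
    static int sumSecondBytes(byte[] buf, int base, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf[(i << 2) + (base + 1)];
        }
        return sum;
    }

    // True when every index used by sumSecondBytes falls inside [0, length).
    // Because the index grows monotonically with i, checking the first and
    // last index is sufficient. Uses long arithmetic to avoid int overflow.
    static boolean allInRange(int length, int base, int count) {
        if (count <= 0) {
            return true; // the loop body never runs
        }
        long first = (long) base + 1;
        long last = ((long) (count - 1) << 2) + base + 1;
        return first >= 0 && last < length;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) i;
        }
        if (allInRange(buf.length, 2, 3)) {
            System.out.println(sumSecondBytes(buf, 2, 3));
        }
    }
}
```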
3) Implement loop unswitching
Loop unswitching is needed in order to optimize
the byte buffer loops.
orig:                          transformed:
loop                           if (invariant-test) then
  stmt1                          loop
  if (invariant-test) then         stmt1
    stmt2                          stmt2
  else                             stmt4
    stmt3                        endloop
  endif                        else
  stmt4                          loop [clone]
endloop                            stmt1 [clone]
                                   stmt3
                                   stmt4 [clone]
                                 endloop
                               endif
Note: the "else" clause may be empty
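The same transformation written out by hand in Java, as a sketch only: the statement bodies are arbitrary placeholders standing in for stmt1 through stmt4, and the class and method names are hypothetical. The compiler performs this on its internal graph, not on source code.

```java
// Hypothetical sketch: loop unswitching applied by hand.
final class LoopUnswitchSketch {
    // Original shape: the loop-invariant test on 'flag' runs every iteration.
    static long original(int[] data, boolean flag) {
        long acc = 0;
        for (int i = 0; i < data.length; i++) {
            acc += data[i];          // stmt1
            if (flag) {              // invariant-test
                acc += 1;            // stmt2
            } else {
                acc -= 1;            // stmt3
            }
            acc ^= i;                // stmt4
        }
        return acc;
    }

    // Unswitched shape: the test is hoisted and the loop is cloned per branch.
    static long unswitched(int[] data, boolean flag) {
        long acc = 0;
        if (flag) {                  // invariant-test, evaluated once
            for (int i = 0; i < data.length; i++) {
                acc += data[i];      // stmt1
                acc += 1;            // stmt2
                acc ^= i;            // stmt4
            }
        } else {
            for (int i = 0; i < data.length; i++) { // loop [clone]
                acc += data[i];      // stmt1 [clone]
                acc -= 1;            // stmt3
                acc ^= i;            // stmt4 [clone]
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = {5, -3, 8, 0, 12};
        System.out.println(original(data, true) == unswitched(data, true)
                && original(data, false) == unswitched(data, false));
    }
}
```

Each cloned loop body is free of the branch, which is what lets later loop optimizations see a simpler body.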
EVALUATION
Exposed a number of issues:
1) loop unswitching is needed
2) array length and class pointers are invariant (as far as
the optimizer cares)
3) removal of a number of barriers to iterative application
of optimizations
4) better inlining decisions
It's probably better to wait on 4) inlining until after
the tiered compilation system is on by default, since this
will change the profile information seen by C2.