Bug ID: JDK-6385730 optimizations for ByteBuffer.put()/get() not nearly as good as byte[] assignment

Details
Type:
Bug
Submit Date:
2006-02-14
Status:
Resolved
Updated Date:
2010-05-11
Project Name:
JDK
Resolved Date:
2006-11-14
Component:
hotspot
OS:
generic
Sub-Component:
compiler
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
5.0
Fixed Versions:
hs10 (b03)


Description
ByteBuffer (heap or direct) put() and get(), along with the low-level primitives for getting and putting int, char, long, byte, etc., are not optimized nearly as well as the equivalent operations on byte[].
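
For illustration, the kind of loop pair being compared might look like the sketch below. This is not taken from the bug report; the class name BufferVsArray and both method names are hypothetical, and the code only shows the two source-level shapes (plain array indexing versus absolute ByteBuffer.get(int)). It makes no claim about timings on any particular JVM.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: the same reduction written against byte[]
// and against ByteBuffer, so the two compiled loops can be compared.
final class BufferVsArray {
    // Sums bytes through plain array indexing.
    static long sumArray(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Sums the same bytes through absolute ByteBuffer.get(int).
    static long sumBuffer(ByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.limit(); i++) {
            sum += buf.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 16];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteBuffer heap = ByteBuffer.wrap(data);
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip(); // limit = bytes written, position = 0
        System.out.println(sumArray(data) + " " + sumBuffer(heap) + " " + sumBuffer(direct));
    }
}
```

All three calls compute the same value; only the access path differs, which is what makes the pair useful for inspecting generated code.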

                                    

Comments
SUGGESTED FIX

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20060830112419.nips.immute/workspace/webrevs/webrev-2006.08.30/index.html
                                     
2006-08-30
SUGGESTED FIX

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/2006/20060830121702.nips.blockers/workspace/webrevs/webrev-2006.08.30/index.html
                                     
2006-08-30
EVALUATION

Exposed a number of issues:
  1) loop unswitching is needed
  2) array length and class pointers are invariant (as far as
     the optimizer cares)
  3) removal of a number of barriers to iterative application
     of optimizations
  4) better inlining decisions

It's probably better to wait on 4) inlining until after
the tiered compilation system is on by default, since this
will change the profile information seen by C2.
                                     
2006-08-30
SUGGESTED FIX

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20060905093858.nips.blockers/workspace/webrevs/webrev-2006.09.05/index.html
                                     
2006-09-05
SUGGESTED FIX

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20060913073259.nips.unswitch/workspace/webrevs/webrev-2006.09.13/index.html
                                     
2006-09-13
EVALUATION

Fixes
1) Make load_klass and load_range immutable
   LoadKlass and LoadRange operations are not being fully
   optimized because their memory input is unnecessarily
   constrained.
   Fix is to use immutable memory (memory edge from StartNode)
   for LoadKlass and LoadRange from an object.

2) Fix blockers to optimization
   A number of issues in the optimizer were found that throttle
   existing optimizations.
   Fix bug in split down of cmp-bool which will infinitely clone
     the cmp if the bool has only one use which is not
     in the same block as the bool.
   Fix bug in sanity check of users' calling convention.
   Compute new control for operations created during secondary induction
     variable (IV) removal to prevent inhibiting range check
     elimination (RCE).
   Mark loops that are candidates for RCE to inhibit split-thru-phi
     from creating a graph unrecognizable to RCE.  And in split-thru-phi
     delay splitting through a phi for a marked loop until a later loop
     optimization pass.
   During build_loop_late, try not to place operations on loop entry
     control edges because this might inhibit RCE.
   Force another round of loop opts if a loop node is
     created because a loop node allows more phi node optimizations
     which may allow more loop opts.
   Reassociate add/sub based on loop invariants.
   After peeling, run IGVN over the entire loop since peeling
     exposes loop invariant operations.
   Enhance iv expression recognition for range check elimination
     to include lshift (for scaling) and an offset of: invariant + constant.
   Use same RCE pattern matcher for both policy and transform.
   Enhance PhiNode Identity check for unnecessary phi merging in the
     presence of constraint casts.
   Created a LoopTreeIterator for more readable loop visitations.
   Print name of field when dumping ideal memory nodes.
   Added unique_ctrl_out to return the unique control
     output edge if there is one and only one.
   Check in MergeMem Ideal if PhiNode::Ideal's "Split phis through memory merges"
     transform should be attempted. Look for this->phi->this cycle.
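
As a source-level illustration of the index shape named in the RCE item above (an lshift for scaling plus an offset of invariant + constant), a hand-written big-endian int read over a byte[] might look like the following sketch. The class ScaledIndex and its method are hypothetical and not part of the fix; whether a given compiler build actually removes the range checks here is not something this example asserts.

```java
// Hypothetical illustration of array indices of the form
// (iv << shift) + invariant + constant inside a counted loop.
final class ScaledIndex {
    // Sums 'count' big-endian ints stored in 'b' starting at byte offset 'base'.
    static long sumInts(byte[] b, int base, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            int p = (i << 2) + base;              // scaled IV + loop invariant
            int v = ((b[p] & 0xff) << 24)
                  | ((b[p + 1] & 0xff) << 16)     // invariant + constant offset
                  | ((b[p + 2] & 0xff) << 8)
                  |  (b[p + 3] & 0xff);
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] b = {0, 0, 0, 0, 1, 0, 0, 0, 2};
        System.out.println(sumInts(b, 1, 2));
    }
}
```

Each of the four array accesses carries its own bounds check in the source semantics, so recognizing this index form is what lets a single pre-loop test cover all of them.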

3) Implement loop unswitching

   Loop unswitching is needed in order to optimize
   the byte buffer loops.

   orig:                       transformed:

    loop                         if (invariant-test) then
      stmt1                        loop
      if (invariant-test) then       stmt1
        stmt2                        stmt2
      else                           stmt4
        stmt3                      endloop
      endif                      else
      stmt4                        loop [clone]
    endloop                          stmt1 [clone]
                                     stmt3
                                     stmt4 [clone]
                                   endloop
                                 endif

   Note: the "else" clause may be empty
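
The transformation in the diagram can be sketched by hand in Java as below. This is only a source-level analogy under assumed names (the class Unswitch, its methods, and the stand-in statements are hypothetical); the actual fix performs the rewrite on the compiler's internal graph, not on source code.

```java
// Hypothetical illustration of loop unswitching done manually.
final class Unswitch {
    // Original shape: the loop-invariant test runs on every iteration.
    static int original(int[] a, boolean flag) {
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i];      // stmt1
            if (flag) {       // invariant-test
                acc += 1;     // stmt2
            } else {
                acc -= 1;     // stmt3
            }
            acc ^= i;         // stmt4
        }
        return acc;
    }

    // Unswitched shape: the test is hoisted and the loop body is cloned.
    static int unswitched(int[] a, boolean flag) {
        int acc = 0;
        if (flag) {
            for (int i = 0; i < a.length; i++) {
                acc += a[i];  // stmt1
                acc += 1;     // stmt2
                acc ^= i;     // stmt4
            }
        } else {
            for (int i = 0; i < a.length; i++) {
                acc += a[i];  // stmt1 [clone]
                acc -= 1;     // stmt3
                acc ^= i;     // stmt4 [clone]
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(original(a, true) == unswitched(a, true));
        System.out.println(original(a, false) == unswitched(a, false));
    }
}
```

Each cloned loop has a branch-free body, which is the property that lets later loop optimizations such as range check elimination apply to it.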
                                     
2006-09-20


