Bug ID: JDK-7133038 G1: Some small profile based optimizations

Details
Type: Enhancement
Submit Date: 2012-01-24
Status: Closed
Updated Date: 2012-03-24
Project Name: JDK
Resolved Date: 2012-03-24
Component: hotspot
OS: generic
Sub-Component: gc
CPU: generic
Priority: P4
Resolution: Fixed
Affected Versions: 7u4
Fixed Versions: hs23 (b12)

Related Reports
Backport:
Backport:

Description
While looking over some collect/analyze profiles measuring data cache (DC) misses, branches, and branch mispredicts, high counts for these metrics were identified in the following routines:

HeapRegion::oops_on_card_seq_iterate_careful()
* High DC misses when attempting to read the klass of the current object in both loops.
* High number of branch mispredicts in the body of the second loop.

instanceKlass::oop_oop_iterate_[*]_nv()
* High number of DC misses while iterating over and dereferencing the reference fields in an object.

G1BlockOffsetArray::forward_to_block_containing_addr_slow()
* High number of DC misses while dereferencing objects during BOT walking.

FilterOutOfRegionClosure::do_oop_nv()
* High number of branches and branch mispredicts.

G1ParCopyHelper::copy_to_survivor_space()
* High number of mispredicts when calculating the object size (coming from size_given_klass).

Proposed changes:

HeapRegion::oops_on_card_seq_iterate_careful()
* High DC misses when attempting to read the klass of the current object in both loops.
  -> Add a prefetch of the next object right after we obtain the size of the current
     object. The second loop looks like the better candidate for such a prefetch; in the
     first loop I don't think there is enough of a code window between the prefetch in
     iteration n and the use in iteration n+1.

* High number of branch mispredicts in the body of the second loop.
  -> The body of the second loop is a 3-way if-statement, and two of its clauses have
     the same body. Merging those two clauses makes the conditional "less" branchy, which
     should reduce the number of mispredicts.
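The two proposed changes to the second loop can be sketched together as follows. This is a minimal illustration under stated assumptions, not the actual HotSpot change: it uses a hypothetical flat heap of ObjHeader records (size in slots plus an array flag), uses the GCC/Clang builtin __builtin_prefetch as a stand-in for the VM's own prefetch helper, leaves out the dead-object check, and assumes that the two clauses with identical bodies are the "object ends before the boundary" and "object is not an array" cases.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical object header (not HotSpot's layout): the object's size in
// ObjHeader-sized slots, including the header, and whether it is an array.
struct ObjHeader {
  std::size_t size_in_slots;
  bool is_array;
};

// Counters standing in for the two closure applications in the loop body.
struct IterateCounts {
  int whole = 0;    // stands in for obj->oop_iterate(cl)
  int bounded = 0;  // stands in for obj->oop_iterate(cl, mr)
};

// Visits the objects starting in [cur, end). Objects are assumed to be laid
// out back to back, each beginning with an ObjHeader.
inline const ObjHeader* walk_card(const ObjHeader* cur, const ObjHeader* end,
                                  IterateCounts& counts) {
  while (cur < end) {
    assert(cur->size_in_slots > 0);  // a zero size would loop forever
    const ObjHeader* next = cur + cur->size_in_slots;
    // Prefetch the next object as soon as its address is known, so that
    // reading its header in the next iteration is more likely to hit.
    __builtin_prefetch(next);
    // Two-way test replacing an (assumed) three-way if / else if / else
    // whose first two clauses had the same body.
    if (next < end || !cur->is_array) {
      ++counts.whole;    // object ends before end, or is not an array
    } else {
      ++counts.bounded;  // array reaching end: stop at the boundary
    }
    cur = next;
  }
  return cur;
}
```

For example, with objects of 2, 1 and 3 slots starting at heap[0], heap[2] and heap[3] and end = heap + 5, the first two objects take the "whole" path and the last one, an array crossing end, takes the bounded path.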

instanceKlass::oop_oop_iterate_[*]_nv()
* High number of DC misses while iterating over and dereferencing the oop maps associated with the reference fields in an object.
  -> Simple: prefetch the next oop map entry.
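A minimal sketch of prefetching the next oop map entry while the fields described by the current entry are visited. MapEntry and iterate_ref_fields are hypothetical names; the entry layout (byte offset plus field count) is an assumption modeled on the general idea of an oop map block, and __builtin_prefetch (GCC/Clang) stands in for the VM's prefetch primitive.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical oop map entry: `count` consecutive reference fields starting
// `offset` bytes from the start of the object.
struct MapEntry {
  std::size_t offset;
  std::size_t count;
};

// Applies `visit` to the address of every reference field described by
// `map`. Before visiting the fields of entry i, entry i + 1 is prefetched so
// that its load in the next iteration is less likely to miss.
template <typename Visitor>
void iterate_ref_fields(char* obj, const MapEntry* map, std::size_t map_len,
                        Visitor&& visit) {
  for (std::size_t i = 0; i < map_len; ++i) {
    if (i + 1 < map_len) {
      __builtin_prefetch(&map[i + 1]);
    }
    void** field = reinterpret_cast<void**>(obj + map[i].offset);
    for (std::size_t j = 0; j < map[i].count; ++j) {
      visit(field + j);
    }
  }
}
```

Whether this helps in practice depends on how many entries share a cache line; with small, contiguous entries the prefetch mostly pays off for objects with many oop map entries.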

G1BlockOffsetArray::forward_to_block_containing_addr_slow()
* High number of DC misses while dereferencing objects during BOT walking.
  -> Adding prefetching to these loops is a little more tricky. We can't add a prefetch after we obtain the size of the current block, because there is not enough of a code window between the prefetch and the subsequent use. Instead, if we use a fixed prefetch distance and issue the prefetch before reading the block size, we might get a large enough code window.
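The fixed-distance idea can be sketched as below. Block, forward_to_block_containing and kPrefetchBytes are hypothetical, and the distance value is purely illustrative; a real value would need tuning. The point of the sketch is the ordering: the prefetch address depends only on the current block's address, not on the size about to be loaded, so the prefetch can be issued before that load rather than after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical block header: each block starts with its size in Block-sized
// slots. Not the real block offset table layout.
struct Block {
  std::size_t size_in_slots;
};

// Hypothetical fixed prefetch distance in bytes; illustrative, not tuned.
constexpr std::uintptr_t kPrefetchBytes = 256;

// Walks forward from `cur` (a block start <= addr) to the block containing
// `addr`, where addr < end.
inline const Block* forward_to_block_containing(const Block* cur,
                                                const Block* end,
                                                const Block* addr) {
  for (;;) {
    // Issued before the size load. Integer arithmetic avoids forming an
    // out-of-range pointer; a prefetch of an invalid address does not fault.
    __builtin_prefetch(reinterpret_cast<const void*>(
        reinterpret_cast<std::uintptr_t>(cur) + kPrefetchBytes));
    assert(cur->size_in_slots > 0);
    const Block* next = cur + cur->size_in_slots;
    if (next > addr || next >= end) {
      return cur;  // given addr < end, addr lies in [cur, next)
    }
    cur = next;
  }
}
```

Because block sizes vary, a fixed distance may land before, on, or past the next block; the bet is that it covers a later block often enough to hide some misses.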

FilterOutOfRegionClosure::do_oop_nv()
* High number of branches and branch mispredicts.
  -> Most of these come from the concurrent refinement path and result from calling the virtual do_oop() routine of the closure(s) applied by FilterOutOfRegionClosure. Using specialization so that the non-virtual _nv version of these closures' do_oop() is called instead should help.
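A minimal sketch of the specialization idea, using hypothetical names throughout: FilterVirtual calls the wrapped closure through the virtual do_oop(), while FilterSpecialized takes the closure type as a template parameter and calls its non-virtual do_oop_nv() directly, which the compiler can inline. The filter condition (forward only non-null references outside [bottom, end)) is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>

using Ref = const int*;  // hypothetical stand-in for an object reference
using RefSlot = Ref*;    // a field holding a reference

// Virtual closure interface, used when the concrete type is not known.
struct OopClosure {
  virtual ~OopClosure() = default;
  virtual void do_oop(RefSlot p) = 0;
};

// A concrete closure that also exposes a non-virtual do_oop_nv().
struct CountingClosure : OopClosure {
  int count = 0;
  void do_oop_nv(RefSlot) { ++count; }
  void do_oop(RefSlot p) override { do_oop_nv(p); }
};

// True if `target` is non-null and lies outside [bottom, end). std::less
// gives a total order even for pointers into different objects.
inline bool is_outside_region(Ref target, Ref bottom, Ref end) {
  std::less<Ref> lt;
  return target != nullptr && (lt(target, bottom) || !lt(target, end));
}

// Unspecialized filter: every forwarded reference costs a virtual call.
struct FilterVirtual {
  Ref bottom;
  Ref end;
  OopClosure* inner;
  void do_oop_nv(RefSlot p) {
    if (is_outside_region(*p, bottom, end)) inner->do_oop(p);
  }
};

// Specialized filter: the inner closure's type is known at compile time,
// so do_oop_nv() is called directly and can be inlined.
template <typename Inner>
struct FilterSpecialized {
  Ref bottom;
  Ref end;
  Inner* inner;
  void do_oop_nv(RefSlot p) {
    if (is_outside_region(*p, bottom, end)) inner->do_oop_nv(p);
  }
};
```

Both filters forward exactly the same references; only the call mechanism differs, which is where the branch and mispredict savings would come from.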

G1ParCopyHelper::copy_to_survivor_space()
* High number of mispredicts when calculating the object size (coming from size_given_klass).
  -> It was thought that refactoring and flattening the if-statement in the routine might give some improvement. However, after performing such a refactoring and inspecting the generated assembly, I don't see any difference in the branches in the generated code.

                                    

Comments
EVALUATION

See description.
                                     
2012-01-24
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/b4ebad3520bb
                                     
2012-01-27
EVALUATION

http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/b4ebad3520bb
                                     
2012-03-22